R Language Use anonymous functions with apply


apply is used to evaluate a function (maybe an anonymous one) over the margins of an array or matrix.

Let's use the iris dataset to illustrate this idea. The iris dataset has measurements of 150 flowers from 3 species. Let's see how this dataset is structured:

> head(iris)

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1         5.1          3.5          1.4         0.2  setosa
2         4.9          3.0          1.4         0.2  setosa
3         4.7          3.2          1.3         0.2  setosa
4         4.6          3.1          1.5         0.2  setosa
5         5.0          3.6          1.4         0.2  setosa
6         5.4          3.9          1.7         0.4  setosa

Now, imagine that you want to know the mean of each of these variables. One way to solve this might be to use a for loop, but R programmers will often prefer to use apply (for reasons why, see Remarks):

> apply(iris[1:4], 2, mean)

Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
    5.843333     3.057333     3.758000     1.199333
  • In the first parameter, we subset iris to include only the first 4 columns, because mean only works on numeric data.
  • The second parameter value of 2 indicates that we want to work on the columns only (the second subscript of the r×c array); 1 would give the row means.

In the same way we can calculate more meaningful values:

# standard deviation
apply(iris[1:4], 2, sd)
# variance
apply(iris[1:4], 2, var)

Caveat: R has some built-in functions which are better for calculating column and row sums and means: colMeans and rowMeans.

Now, let's do a different and more meaningful task: let's calculate the mean only for those values which are bigger than 0.5. For that, we will create our own mean function.

> our.mean.function <- function(x) { mean(x[x > 0.5]) }
> apply(iris[1:4], 2, our.mean.function)

Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
    5.843333     3.057333     3.758000     1.665347

(Note the difference in the mean of Petal.Width)

But, what if we don't want to use this function in the rest of our code? Then, we can use an anonymous function, and write our code like this:

apply(iris[1:4], 2, function(x) { mean(x[x > 0.5]) })

So, as we have seen, we can use apply to execute the same operation on columns or rows of a dataset using only one line.

Caveat: Since apply returns very different kinds of output depending on the length of the results of the specified function, it may not be the best choice in cases where you are not working interactively. Some of the other *apply family functions are a bit more predictable (see Remarks).