`apply` is used to evaluate a function (possibly an anonymous one) over the margins of an array or matrix.

Let's use the `iris` dataset to illustrate this idea. The `iris` dataset has measurements of 150 flowers from 3 species. Let's see how this dataset is structured:

```
> head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
```

Now, imagine that you want to know the mean of *each* of these variables. One way to solve this might be to use a `for` loop, but R programmers will often prefer to use `apply` (for reasons why, see Remarks):

```
> apply(iris[1:4], 2, mean)
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
    5.843333     3.057333     3.758000     1.199333
```

- In the first parameter, we subset `iris` to include only the first 4 columns, because `mean` only works on numeric data (`Species` is a factor).
- The second parameter value of `2` indicates that we want to work on the columns only (the second subscript of the r×c array); `1` would give the row means.
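To make the two margin values concrete, here is a minimal sketch on a small matrix. The matrix `m` and the variable names are made up for illustration and are not part of the `iris` example. It also writes out the column case as an explicit `for` loop, to show what `apply` replaces.

```r
# A small 2 x 3 matrix makes the two margins easy to tell apart.
m <- matrix(1:6, nrow = 2)        # columns are (1, 2), (3, 4), (5, 6)

row_means <- apply(m, 1, mean)    # MARGIN = 1: one value per row
col_means <- apply(m, 2, mean)    # MARGIN = 2: one value per column

# The same column means written as an explicit for loop
loop_means <- numeric(ncol(m))
for (j in seq_len(ncol(m))) {
  loop_means[j] <- mean(m[, j])
}

print(row_means)    # 3 4
print(col_means)    # 1.5 3.5 5.5
print(loop_means)   # 1.5 3.5 5.5
```

The loop needs a pre-allocated result vector and an index variable, while the `apply` call states the same intent in one expression.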

In the same way, we can calculate other summary statistics:

```
# standard deviation
apply(iris[1:4], 2, sd)
# variance
apply(iris[1:4], 2, var)
```

**Caveat**: R has built-in functions that are better suited to calculating column and row sums and means: `colSums`, `rowSums`, `colMeans` and `rowMeans`.
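As a quick check, the sketch below compares the `apply` result from above with the dedicated base R function. The variable names are chosen here only for illustration.

```r
num <- iris[1:4]                      # numeric columns only

via_apply    <- apply(num, 2, mean)   # general-purpose approach
via_colmeans <- colMeans(num)         # dedicated function for column means

# Both give the same named vector of column means
print(all.equal(via_apply, via_colmeans))   # TRUE

# rowMeans works the same way across rows: one mean per flower
row_avgs <- rowMeans(num)
print(length(row_avgs))                     # 150
```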

Now, let's do a different and more meaningful task: let's calculate the mean *only* for those values which are greater than `0.5`. For that, we will create our own `mean` function.

```
> our.mean.function <- function(x) { mean(x[x > 0.5]) }
> apply(iris[1:4], 2, our.mean.function)
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
    5.843333     3.057333     3.758000     1.665347
```

*(Note the difference in the mean of Petal.Width)*

But, what if we don't want to use this function in the rest of our code? Then, we can use an anonymous function, and write our code like this:

```
apply(iris[1:4], 2, function(x) { mean(x[x > 0.5]) })
```
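If the cutoff should not be hard-coded, one possible variation is to pass it as an extra argument: `apply` forwards any additional arguments to the function it calls. The name `mean_above` and its `threshold` parameter are hypothetical and chosen here for illustration only.

```r
# Hypothetical helper: mean of the values strictly greater than `threshold`
mean_above <- function(x, threshold) {
  mean(x[x > threshold])
}

# Arguments after the function are forwarded to it by apply()
res <- apply(iris[1:4], 2, mean_above, threshold = 0.5)
print(res)
```

With `threshold = 0.5` this reproduces the result of the anonymous function above; a different cutoff only requires changing the argument.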

So, as we have seen, we can use `apply` to execute the same operation on columns or rows of a dataset using only one line.

**Caveat**: Since `apply` returns very different kinds of output depending on the length of the results of the specified function, it may not be the best choice in cases where you are not working interactively. Some of the other `*apply` family functions are a bit more predictable (see Remarks).
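The following sketch illustrates how the shape of the result changes with the function supplied. The variable names are illustrative, and `vapply` is shown only as one example of a more predictable alternative, not necessarily the one the Remarks have in mind.

```r
num <- iris[1:4]

# Each call returns one value per column -> named numeric vector
one_value <- apply(num, 2, mean)

# Each call returns two values -> 2 x 4 matrix (one column per variable)
two_values <- apply(num, 2, range)

# Calls return results of different lengths -> list
ragged <- apply(num, 2, function(x) x[x > 5])

print(class(one_value))   # "numeric"
print(dim(two_values))    # 2 4
print(class(ragged))      # "list"

# vapply declares the expected result type and length up front and
# stops with an error if a result does not match
fixed <- vapply(num, mean, numeric(1))
```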

This modified text is an extract of the original Stack Overflow Documentation created by following contributors and released under CC BY-SA 3.0
