# R Language Optimal Construction of a For Loop

## Example

To illustrate the effect of good for loop construction, we will calculate the mean of each column in four different ways:

1. Using a poorly optimized for loop
2. Using a well optimized for loop
3. Using a function from the `*apply` family
4. Using the `colMeans` function

Each option is shown in code, followed by a comparison of the execution time of each option and a discussion of the differences.

## Poorly optimized for loop

``````
column_mean_poor <- NULL
for (i in 1:length(mtcars)){
  column_mean_poor[i] <- mean(mtcars[[i]])
}
``````

## Well optimized for loop

``````
# Allocate the full-length numeric vector once, then fill it by index
column_mean_optimal <- vector("numeric", length(mtcars))
for (i in seq_along(mtcars)){
  column_mean_optimal[i] <- mean(mtcars[[i]])
}
``````

## `vapply` Function

``````
column_mean_vapply <- vapply(mtcars, mean, numeric(1))
``````

## `colMeans` Function

``````
column_mean_colMeans <- colMeans(mtcars)
``````

## Efficiency comparison

The results of benchmarking these four approaches are shown below (code not displayed):

``````
Unit: microseconds
     expr     min       lq     mean   median       uq     max neval  cld
     poor 240.986 262.0820 287.1125 275.8160 307.2485 442.609   100    d
  optimal 220.313 237.4455 258.8426 247.0735 280.9130 362.469   100   c
   vapply 107.042 109.7320 124.4715 113.4130 132.6695 202.473   100 a
 colMeans 155.183 161.6955 180.2067 175.0045 194.2605 259.958   100  b
``````
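The benchmark code itself is not part of the original text. A table in this format could be produced along the following lines; this is only a sketch, assuming the `microbenchmark` package is installed. The wrapper function names (`mean_poor`, `mean_optimal`, `mean_vapply`, `mean_colMeans`) are hypothetical and introduced here for illustration.

```r
# Hypothetical wrappers around the four approaches from the sections above.
mean_poor <- function(df) {
  out <- NULL
  for (i in 1:length(df)) out[i] <- mean(df[[i]])
  out
}
mean_optimal <- function(df) {
  out <- vector("numeric", length(df))
  for (i in seq_along(df)) out[i] <- mean(df[[i]])
  out
}
mean_vapply   <- function(df) vapply(df, mean, numeric(1))
mean_colMeans <- function(df) colMeans(df)

# Assumption: the microbenchmark package is available; the timing is skipped otherwise.
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  print(microbenchmark::microbenchmark(
    poor     = mean_poor(mtcars),
    optimal  = mean_optimal(mtcars),
    vapply   = mean_vapply(mtcars),
    colMeans = mean_colMeans(mtcars),
    times    = 100
  ))
}
```

Timings depend on the machine and the R version, so the numbers from a fresh run will not match the table above exactly.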

Notice that the optimized `for` loop edged out the poorly constructed `for` loop. The poorly constructed loop increases the length of the output object on every iteration, and at each change of length R has to resize the object and reevaluate its class.

The optimized `for` loop removes some of this overhead by declaring the type and length of the output object before the loop starts.
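With only 11 columns in `mtcars`, the gap between the two loops is small. The sketch below repeats the comparison on a wider, made-up data frame so the two allocation strategies can be timed side by side. The data frame `wide` and the functions `grow_means` and `prealloc_means` are hypothetical names introduced here, and the size of any difference will depend on the R version.

```r
# Hypothetical wide data frame: 2,000 numeric columns of 50 values each.
set.seed(1)
wide <- as.data.frame(matrix(rnorm(2000 * 50), nrow = 50))

# Starts from NULL and grows the result by one element per iteration.
grow_means <- function(df) {
  out <- NULL
  for (i in seq_along(df)) {
    out[i] <- mean(df[[i]])
  }
  out
}

# Allocates the full numeric vector once, then fills it by index.
prealloc_means <- function(df) {
  out <- vector("numeric", length(df))
  for (i in seq_along(df)) {
    out[i] <- mean(df[[i]])
  }
  out
}

# Both strategies return the same values; only the allocation pattern differs.
stopifnot(identical(grow_means(wide), prealloc_means(wide)))

# Timings vary by machine and R version, so no expected output is shown.
system.time(for (k in 1:20) grow_means(wide))
system.time(for (k in 1:20) prealloc_means(wide))
```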

In this example, however, using `vapply` roughly halves the run time compared with the loops (a median of about 113 microseconds versus about 247 for the optimized loop), largely because we told R that the result had to be numeric (if any one result were not numeric, an error would be returned).
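The type check performed by `vapply` can be seen with a small experiment. In this sketch, `mean_as_text` is a hypothetical helper that deliberately returns a character string, so it no longer matches the `numeric(1)` template.

```r
# Each call to mean() returns a single double, matching the numeric(1) template.
ok <- vapply(mtcars, mean, numeric(1))

# Hypothetical helper that returns a character string instead of a number.
mean_as_text <- function(x) as.character(mean(x))

# vapply() raises an error because the results do not match numeric(1);
# tryCatch() captures the condition object so the script keeps running.
res <- tryCatch(
  vapply(mtcars, mean_as_text, numeric(1)),
  error = function(e) e
)
inherits(res, "error")

# sapply() performs no such check and simply returns a character vector.
class(sapply(mtcars, mean_as_text))
```

Declaring the template up front means a surprising result fails immediately rather than silently changing the type of the output.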

Use of the `colMeans` function is a touch slower than the `vapply` function. This difference is attributable to some error checks performed in `colMeans` and, mainly, to the `as.matrix` conversion (needed because `mtcars` is a `data.frame`); neither step is performed by `vapply`.
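The conversion step can be made explicit by converting the data frame once and passing the matrix to `colMeans`. The sketch below only checks that the three routes agree; the variable names are introduced here for illustration, and whether converting ahead of time pays off will depend on how often the means are recomputed.

```r
# Convert the data.frame to a numeric matrix once, up front.
mtcars_matrix <- as.matrix(mtcars)

from_data_frame <- colMeans(mtcars)         # converts to a matrix internally
from_matrix     <- colMeans(mtcars_matrix)  # no conversion needed
from_vapply     <- vapply(mtcars, mean, numeric(1))

# All three routes give the same named vector of column means
# (compared with all.equal() to allow for floating-point rounding).
all.equal(from_data_frame, from_matrix)
all.equal(from_data_frame, from_vapply)
```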