R Language Optimal Construction of a For Loop


Example

To illustrate the effect of good for loop construction, we will calculate the mean of each column in four different ways:

  1. Using a poorly optimized for loop
  2. Using a well optimized for for loop
  3. Using an *apply family of functions
  4. Using the colMeans function

Each of these options will be shown in code; a comparison of the computational time to execute each option will be shown; and lastly a discussion of the differences will be given.

Poorly optimized for loop

column_mean_poor <- NULL
for (i in 1:length(mtcars)){
  column_mean_poor[i] <- mean(mtcars[[i]])
}

Well optimized for loop

column_mean_optimal <- vector("numeric", length(mtcars))
for (i in seq_along(mtcars)){
  column_mean_optimal <- mean(mtcars[[i]])
}

vapply Function

column_mean_vapply <- vapply(mtcars, mean, numeric(1))

colMeans Function

column_mean_colMeans <- colMeans(mtcars)

Efficiency comparison

The results of benchmarking these four approaches is shown below (code not displayed)

Unit: microseconds
     expr     min       lq     mean   median       uq     max neval  cld
     poor 240.986 262.0820 287.1125 275.8160 307.2485 442.609   100    d
  optimal 220.313 237.4455 258.8426 247.0735 280.9130 362.469   100   c 
   vapply 107.042 109.7320 124.4715 113.4130 132.6695 202.473   100 a   
 colMeans 155.183 161.6955 180.2067 175.0045 194.2605 259.958   100  b

Notice that the optimized for loop edged out the poorly constructed for loop. The poorly constructed for loop is constantly increasing the length of the output object, and at each change of the length, R is reevaluating the class of the object.

Some of this overhead burden is removed by the optimized for loop by declaring the type of output object and its length before starting the loop.

In this example, however, the use of an vapply function doubles the computational efficiency, largely because we told R that the result had to be numeric (if any one result were not numeric, an error would be returned).

Use of the colMeans function is a touch slower than the vapply function. This difference is attributable to some error checks performed in colMeans and mainly to the as.matrix conversion (because mtcars is a data.frame) that weren't performed in the vapply function.