For loops are a flow control method for repeating a task or set of tasks over a domain. The core structure of a for loop is
for ([index] in [domain]) {
  [body]
}
where
[index] is a name that takes exactly one value of [domain] on each iteration of the loop.
[domain] is a vector of values over which to iterate.
[body] is the set of instructions to apply on each iteration.
As a trivial example, consider the use of a for loop to obtain the cumulative sum of a vector of values.
x <- 1:4
cumulative_sum <- 0
for (i in x) {
  # i already holds a value of x, so add it directly rather than indexing x by it
  cumulative_sum <- cumulative_sum + i
}
cumulative_sum
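Because [index] takes the values of [domain] rather than their positions, it matters whether a loop iterates over the vector itself or over its indices. The following is a minimal sketch, assuming a hypothetical vector y whose values differ from their positions; it shows both styles arriving at the same total.

```r
# y is a hypothetical vector whose values differ from their positions
y <- c(10, 20, 30)

# Style 1: iterate over the values themselves
sum_by_value <- 0
for (value in y) {
  sum_by_value <- sum_by_value + value
}

# Style 2: iterate over the positions and look each value up
sum_by_position <- 0
for (i in seq_along(y)) {
  sum_by_position <- sum_by_position + y[i]
}

sum_by_value     # 60
sum_by_position  # 60
```

Mixing the two styles, for example writing y[value] inside the first loop, would try to read positions 10, 20 and 30 of a three-element vector and return NA.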
For loops can be useful for conceptualizing and executing tasks to repeat. If not constructed carefully, however, they can be very slow to execute compared to the preferred use of the apply family of functions. Nonetheless, there are a handful of elements you can include in your for loop construction to optimize the loop. In many cases, good construction of the for loop will yield computational efficiency very close to that of an apply function.
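One common way a loop ends up slow is by extending its output on every iteration. The sketch below is illustrative only (the names grown and filled and the size n are assumptions, not part of the original text); it contrasts growing a vector with c() against filling a vector whose length is fixed up front.

```r
n <- 1e4  # illustrative size, chosen arbitrarily

# Pattern to avoid: the result is extended on every iteration,
# so R has to allocate and copy a longer vector each time
grown <- c()
for (i in seq_len(n)) {
  grown <- c(grown, i^2)
}

# Preferred: create the full-length result first, then fill it in place
filled <- numeric(n)
for (i in seq_len(n)) {
  filled[i] <- i^2
}

identical(grown, filled)  # TRUE
```

Both loops produce the same values; the difference is in how much copying happens along the way, which becomes more noticeable as n grows.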
A 'properly constructed' for loop builds on the core structure and includes a statement declaring the object that will capture each iteration of the loop. This object should have both a class and a length declared.
[output] <- [vector_of_length]
for ([index] in [length_safe_domain]) {
  [output][index] <- [body]
}
To illustrate, let us write a loop to square each value in a numeric vector. (This is a trivial example for illustration only; the 'correct' way of completing this task would be x_squared <- x^2.)
x <- 1:100
x_squared <- vector("numeric", length = length(x))
for (i in seq_along(x)) {
  x_squared[i] <- x[i]^2
}
Again, notice that we first declared a receptacle for the output, x_squared, and gave it the class "numeric" with the same length as x. Additionally, we declared a "length safe domain" using the seq_along function. seq_along generates a vector of indices for an object that is suited for use in for loops. While it seems intuitive to use for (i in 1:length(x)), if x has length 0 the loop will iterate over the domain 1:0, which is the vector c(1, 0) rather than an empty one. The body then runs twice with indices that do not exist in x, producing incorrect results or an error, whereas seq_along(x) returns an empty vector and the loop is skipped entirely.
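As a small check of what each domain looks like for an empty vector, the sketch below counts how many times the loop body runs. The counter names n_colon and n_seq_along are illustrative assumptions.

```r
x <- numeric(0)  # a zero-length vector

1:length(x)   # 1 0  (the colon operator counts down from 1 to 0)
seq_along(x)  # integer(0)

# Count how many times each loop body runs
n_colon <- 0
for (i in 1:length(x)) {
  n_colon <- n_colon + 1
}

n_seq_along <- 0
for (i in seq_along(x)) {
  n_seq_along <- n_seq_along + 1
}

n_colon      # 2
n_seq_along  # 0
```

With 1:length(x) the body runs twice even though there is nothing to process; with seq_along(x) it does not run at all.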
Receptacle objects and length safe domains are handled internally by the apply family of functions, and users are encouraged to adopt the apply approach in place of for loops as much as possible. However, if properly constructed, a for loop may occasionally provide greater code clarity with minimal loss of efficiency.
For loops can often be a useful tool in conceptualizing the tasks that need to be completed within each iteration. When the loop is completely developed and conceptualized, there may be advantages to turning the loop into a function.
In this example, we will develop a for loop to calculate the mean of each column in the mtcars dataset (again, a trivial example, as it could be accomplished via the colMeans function).
column_mean_loop <- vector("numeric", length(mtcars))
for (k in seq_along(mtcars)) {
  column_mean_loop[k] <- mean(mtcars[[k]])
}
The for loop can be converted to an apply function by rewriting the body of the loop as a function.
col_mean_fn <- function(x) mean(x)
column_mean_apply <- vapply(mtcars, col_mean_fn, numeric(1))
And to compare the results:
# vapply adds names to the elements; remove them for the comparison
identical(column_mean_loop,
          unname(column_mean_apply))
The advantage of the apply form is that we were able to eliminate a few lines of code. The mechanics of determining the length and type of the output object and iterating over a length safe domain are handled for us by the apply function. Additionally, the apply function is often a little faster than the loop, although the difference in speed is usually negligible in human terms, depending on the number of iterations and the complexity of the body.
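To get a rough feel for the speed difference on your own machine, one option is base R's system.time. The sketch below wraps each approach in a function and times it on a hypothetical wide data frame; the names wide_df, loop_means and apply_means, the data dimensions, and the repetition count are all illustrative assumptions. Timings vary between machines and runs, so none are shown.

```r
# Hypothetical data: 2000 numeric columns of 100 values each
set.seed(1)
wide_df <- as.data.frame(matrix(rnorm(100 * 2000), nrow = 100))

# The for loop, wrapped in a function
loop_means <- function(df) {
  out <- vector("numeric", length(df))
  for (k in seq_along(df)) {
    out[k] <- mean(df[[k]])
  }
  out
}

# The vapply equivalent, with names dropped to match the loop output
apply_means <- function(df) {
  unname(vapply(df, mean, numeric(1)))
}

# Confirm that both approaches agree before timing them
stopifnot(identical(loop_means(wide_df), apply_means(wide_df)))

# Repeat each call a few times so the elapsed time is measurable
system.time(for (r in 1:20) loop_means(wide_df))
system.time(for (r in 1:20) apply_means(wide_df))
```

For more careful measurements a dedicated benchmarking package would be a better fit than system.time, but this is enough to see whether the difference matters for a given task.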