The base package parallel allows parallel computation through forking and sockets, and provides support for parallel random-number generation.
Detect the number of cores present on the localhost:
parallel::detectCores(all.tests = FALSE, logical = TRUE)
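detectCores() can return NA when the number of cores cannot be determined, and claiming every core may leave the machine unresponsive. A minimal sketch, assuming it is acceptable to leave one core free; the names n_cores and n_workers are introduced here for illustration only:

```r
# Number of logical cores; may be NA on platforms where detection fails.
n_cores <- parallel::detectCores(logical = TRUE)

# Hypothetical policy: leave one core free, but always use at least one worker.
n_workers <- if (is.na(n_cores)) 1L else max(1L, n_cores - 1L)
n_workers
```

The resulting n_workers could be passed to parallel::makeCluster() in place of the raw core count.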
Create a cluster with one worker for each core on the localhost:
parallelCluster <- parallel::makeCluster(parallel::detectCores())
First, a function appropriate for parallelization must be created. Consider the mtcars dataset. A regression on mpg could be improved by creating a separate regression model for each level of cyl.
data <- mtcars
yfactor <- 'cyl'
zlevels <- sort(unique(data[[yfactor]]))  # distinct values of cyl
datay <- data[,1]     # response: mpg
dataz <- data[,2]     # grouping variable: cyl
datax <- data[,3:11]  # predictors: all remaining columns
fitmodel <- function(zlevel, datax, datay, dataz) {
  glm.fit(x = datax[dataz == zlevel,], y = datay[dataz == zlevel])
}
Create a loop that iterates over every value in zlevels. This still runs in serial, but it is an important step because it determines the exact process that will be parallelized.
fitmodel <- function(zlevel, datax, datay, dataz) {
  glm.fit(x = datax[dataz == zlevel,], y = datay[dataz == zlevel])
}
for (zlevel in zlevels) {
  print("*****")
  print(zlevel)
  print(fitmodel(zlevel, datax, datay, dataz))
}
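The same serial loop can also be written with lapply(), which collects the fitted models in a list and has the same call shape as parLapply(). A sketch that repeats the setup so it runs on its own; the name serial_models is an assumption for illustration:

```r
data <- mtcars
zlevels <- sort(unique(data[["cyl"]]))
datay <- data[, 1]      # mpg
dataz <- data[, 2]      # cyl
datax <- data[, 3:11]   # remaining columns as predictors

fitmodel <- function(zlevel, datax, datay, dataz) {
  glm.fit(x = datax[dataz == zlevel, ], y = datay[dataz == zlevel])
}

# Arguments given after the function are passed on to every call.
serial_models <- lapply(zlevels, fitmodel,
                        datax = datax, datay = datay, dataz = dataz)
names(serial_models) <- zlevels
length(serial_models)
```

Keeping the results in a list, rather than printing them, makes it easier to compare the serial output with the parallel output afterwards.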
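Curry this function so that it takes only zlevel and picks up the remaining arguments from the surrounding environment:
worker <- function(zlevel) {
  fitmodel(zlevel, datax, datay, dataz)
}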
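Worker processes created by makeCluster() are separate R sessions and cannot see objects in the global environment of the calling session. Luckily, each function call creates a local environment, and a function defined inside that environment is sent to the workers together with it. Creating a wrapper function therefore allows for parallelization. The function to be applied also needs to be defined within that environment.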
wrapper <- function(datax, datay, dataz) {
  # force evaluation of all parameters not supplied by the parallel apply
  force(datax)
  force(datay)
  force(dataz)
  # these variables are now in an environment accessible by the parallel function
  # the function to be applied is also defined in this environment
  fitmodel <- function(zlevel, datax, datay, dataz) {
    glm.fit(x = datax[dataz == zlevel,], y = datay[dataz == zlevel])
  }
  # called in this environment, iterating over the single parameter zlevel
  worker <- function(zlevel) {
    fitmodel(zlevel, datax, datay, dataz)
  }
  return(worker)
}
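To see why this pattern works, it can help to inspect the environment attached to a function returned from another function. The sketch below uses a reduced, hypothetical stand-in (make_worker, with toy data) rather than the wrapper itself, so that it runs on its own:

```r
make_worker <- function(offsets) {
  force(offsets)                       # evaluate the argument now, not lazily
  add_offset <- function(i, offsets) i + offsets[i]
  worker <- function(i) add_offset(i, offsets)
  worker
}

w <- make_worker(c(10, 20, 30))

# The returned function carries the environment it was created in.
captured <- ls(environment(w))
captured   # names w can reach without touching the global environment
w(2)       # 2 + 20
```

Everything listed in captured travels with the function, which is why the data and the helper must be defined inside the enclosing function rather than in the global environment.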
Now create a cluster and run the wrapper function.
parallelcluster <- parallel::makeCluster(parallel::detectCores())
models <- parallel::parLapply(parallelcluster, zlevels,
                              wrapper(datax, datay, dataz))
Always stop the cluster when finished.
parallel::stopCluster(parallelcluster)
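One way to make sure the cluster is stopped even when a worker call fails is to wrap the work in tryCatch() with a finally clause. The sketch below also shows clusterExport(), an alternative to the wrapper approach that copies named objects into each worker's global environment. The two-worker cluster and the toy objects multiplier and scaled are assumptions for illustration:

```r
cl <- parallel::makeCluster(2)   # small fixed size, for illustration only

multiplier <- 10
scaled <- tryCatch({
  # Copy `multiplier` from this session into every worker's global environment.
  parallel::clusterExport(cl, varlist = "multiplier", envir = environment())
  parallel::parLapply(cl, 1:3, function(i) i * multiplier)
}, finally = {
  # Runs whether or not the code above raised an error.
  parallel::stopCluster(cl)
})

unlist(scaled)
```

clusterExport() is simpler for a handful of objects, while the wrapper approach keeps everything the worker needs bundled with the function itself.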
The parallel package includes parallel counterparts of several apply() family functions, prefixed with par, such as parLapply(), parSapply(), and parApply().
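For example, parSapply() mirrors sapply() and simplifies the result to a vector where possible. A minimal sketch on a two-worker cluster; the cluster size and the squaring function are illustrative assumptions:

```r
cl <- parallel::makeCluster(2)

# Same call shape as sapply(1:4, function(i) i^2), with the cluster as first argument.
squares <- parallel::parSapply(cl, 1:4, function(i) i^2)

parallel::stopCluster(cl)
squares
```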