R Language Preprocessing


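Pre-processing in caret is done through the preProcess() function. Given a matrix or data frame x, preProcess() estimates the parameters of the requested transformations from the training data. Calling predict() on the resulting object then applies those same transformations to the training set, the test set, or any new data.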

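The heart of the preProcess() function is the method argument, a character vector naming the transformations to perform. Regardless of the order in which the methods are listed, the operations are applied in this order: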

  1. Zero-variance filter
  2. Near-zero variance filter
  3. Box-Cox/Yeo-Johnson/exponential transformation
  4. Centering
  5. Scaling
  6. Range
  7. Imputation
  8. PCA
  9. ICA
  10. Spatial Sign
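
The fixed order has practical consequences. For instance, the Box-Cox transformation is only defined for strictly positive values, so it has to run before centering, which produces negative values. The following is a minimal base-R sketch of that point, not caret's implementation: it uses log() (the Box-Cox transformation with lambda = 0) as a stand-in and does not call preProcess() at all.

```r
x <- mtcars$mpg                      # strictly positive values

# Transform first, then center: every value stays finite
log_then_center <- log(x) - mean(log(x))

# Center first, then transform: centered data contains negative values,
# and log() of a negative number is NaN
centered <- x - mean(x)
center_then_log <- suppressWarnings(log(centered))

any(is.nan(log_then_center))         # FALSE
any(is.nan(center_then_log))         # TRUE
```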

Below, we split the mtcars data set into training and test sets, estimate centering, scaling, and a spatial sign transform on the training set, and then apply them to both sets.

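library(caret)

set.seed(123)  # arbitrary seed; createDataPartition() samples at random

# Keep roughly 80% of the rows for training, sampled across the range of mpg
auto_index <- createDataPartition(mtcars$mpg, p = .8,
                                  list = FALSE,
                                  times = 1)

mt_train <- mtcars[auto_index, ]
mt_test <- mtcars[-auto_index, ]

# Estimate the centering and scaling parameters from the training set only
process_mtcars <- preProcess(mt_train, method = c("center", "scale", "spatialSign"))

# Apply the same fitted transformations to both sets
mtcars_train_transf <- predict(process_mtcars, mt_train)
mtcars_test_transf <- predict(process_mtcars, mt_test)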
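
To make the three steps concrete, here is a minimal base-R sketch of what centering, scaling, and a spatial sign transform do when the parameters come from the training set only. It is an illustration under stated assumptions, not caret's internal code: the split uses sample() rather than createDataPartition(), and the names train_rows, standardize, spatial_sign, manual_train, and manual_test are hypothetical helpers introduced here.

```r
set.seed(42)                                   # arbitrary seed for reproducibility
train_rows <- sample(nrow(mtcars), 26)         # hypothetical split (about 80% of 32 rows)
mt_train <- mtcars[train_rows, ]
mt_test  <- mtcars[-train_rows, ]

# Estimate the parameters from the training set only
train_means <- colMeans(mt_train)
train_sds   <- apply(mt_train, 2, sd)

# Center and scale any data set with the *training* means and standard deviations
standardize <- function(x) {
  scale(as.matrix(x), center = train_means, scale = train_sds)
}

# Spatial sign: divide each row by its Euclidean norm,
# which projects the observation onto the unit sphere
spatial_sign <- function(m) m / sqrt(rowSums(m^2))

manual_train <- spatial_sign(standardize(mt_train))
manual_test  <- spatial_sign(standardize(mt_test))

# Every row of both sets now has norm 1
range(sqrt(rowSums(manual_train^2)))
range(sqrt(rowSums(manual_test^2)))
```

The test rows are standardized with train_means and train_sds rather than their own statistics, which mirrors the reason for fitting preProcess() on mt_train and reusing the fitted object for mt_test.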