Closely correlated features may inflate the variance of your model's estimates, and removing one feature of a correlated pair might help reduce that. There are lots of ways to detect correlation. Here's one:
    library(purrr) # in order to use keep()

    # select correlatable vars
    toCorrelate <- mtcars %>% keep(is.numeric)

    # calculate correlation matrix
    correlationMatrix <- cor(toCorrelate)

    # pick only one out of each highly correlated pair's mirror image
    correlationMatrix[upper.tri(correlationMatrix)] <- 0

    # and I don't remove the highly-correlated-with-itself group
    diag(correlationMatrix) <- 0

    # find features that are highly correlated with another feature at the +- 0.85 level
    apply(correlationMatrix, 2, function(x) any(abs(x) >= 0.85))

      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
     TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
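Because the upper triangle was zeroed, a TRUE means the feature is correlated at that level with at least one feature that comes after it in the column order. I'll want to look at what mpg is so strongly correlated with, and decide what to keep and what to toss. Same for cyl and disp. Alternatively, I might need to combine some strongly correlated features.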
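To see which pairs are actually behind those flags, one option is to pull the row and column positions of the large entries out of the matrix. This is only a sketch in base R: the names `numericVars`, `cutoff` and `highPairs` are mine, and it rebuilds the matrix so it can be run on its own.

```r
# Rebuild the lower-triangle correlation matrix (base R only)
numericVars <- mtcars[sapply(mtcars, is.numeric)]
correlationMatrix <- cor(numericVars)

# Zero the upper triangle and the diagonal, so each pair appears once
correlationMatrix[upper.tri(correlationMatrix, diag = TRUE)] <- 0

# Row/column positions of every entry at or beyond the cutoff
cutoff <- 0.85
hits <- which(abs(correlationMatrix) >= cutoff, arr.ind = TRUE)

# One row per correlated pair, with the signed correlation
highPairs <- data.frame(
  feature1    = colnames(correlationMatrix)[hits[, "col"]],
  feature2    = rownames(correlationMatrix)[hits[, "row"]],
  correlation = correlationMatrix[hits]
)

# Strongest pairs first
highPairs[order(-abs(highPairs$correlation)), ]
```

Each row names a flagged feature (`feature1`) and the feature it is paired with (`feature2`), which is the information needed to decide which of the two to drop or whether to combine them.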