# R Language Linear Models (Regression) Using the 'predict' function

## Example

Once a model is built, `predict` is the main function for applying it to new data. Our example uses the built-in `mtcars` dataset to regress miles per gallon (`mpg`) on displacement (`disp`):

```r
my_mdl <- lm(mpg ~ disp, data = mtcars)
my_mdl

# Call:
# lm(formula = mpg ~ disp, data = mtcars)
#
# Coefficients:
# (Intercept)         disp
#    29.59985     -0.04122
```

If I had a new data source with displacement values, I could use the model to estimate the corresponding miles per gallon.

```r
set.seed(1234)
newdata <- sample(mtcars$disp, 5)
newdata
# [1] 258.0  71.1  75.7 145.0 400.0
# (the values drawn depend on the R version; these are from the original run)

newdf <- data.frame(disp = newdata)
predict(my_mdl, newdf)
#        1        2        3        4        5
# 18.96635 26.66946 26.47987 23.62366 13.11381
```

The most important part of the process is to create a new data frame with the same column names as the original data. In this case, the original data had a column labeled `disp`, so I made sure to give the column in the new data frame that same name.
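To make the role of the `disp` column concrete, the predictions can be reproduced by hand from the fitted coefficients. This is only an illustrative sketch: the displacement values are the ones sampled above, typed in directly, and the names `b`, `disp_new`, `manual`, and `via_predict` are introduced here for illustration.

```r
# Refit the model from the example so this block is self-contained
my_mdl <- lm(mpg ~ disp, data = mtcars)

# Displacement values from the example above, entered by hand
disp_new <- c(258.0, 71.1, 75.7, 145.0, 400.0)

# For this one-predictor model: intercept + slope * disp
b <- coef(my_mdl)
manual <- unname(b["(Intercept)"] + b["disp"] * disp_new)

# predict() looks up `disp` by name in the new data frame
via_predict <- unname(predict(my_mdl, data.frame(disp = disp_new)))

# Compare the two results up to floating-point tolerance
all.equal(manual, via_predict)
```

The two vectors should agree up to floating-point tolerance, which is why the column name matters: `predict` finds the predictor by name, not by position.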

### Caution

Let's look at a few common pitfalls:

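1. Not passing a data frame as the new data:

```r
predict(my_mdl, newdata)
# Error in eval(predvars, data, env) :
#   numeric 'envir' arg not of length one
```

2. Not using the same column names in the new data frame:

```r
newdf2 <- data.frame(newdata)   # the only column is named `newdata`, not `disp`
predict(my_mdl, newdf2)
```

Here `predict` cannot find `disp` in `newdf2` and, unless an object called `disp` happens to exist elsewhere in scope, stops with an "object not found" error.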
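Both pitfalls can be caught before calling `predict` by comparing the predictor names the model expects with the columns actually supplied. The following is a rough sketch rather than an established idiom: `check_newdata` is a hypothetical helper name, and it assumes a model whose formula uses plain column names, as `mpg ~ disp` does.

```r
# Hypothetical helper: returns the predictor names missing from `new`
check_newdata <- function(model, new) {
  if (!is.data.frame(new)) {
    stop("new data must be a data frame, not a ", class(new)[1])
  }
  # Variables on the right-hand side of the model formula
  needed <- all.vars(delete.response(terms(model)))
  setdiff(needed, names(new))
}

my_mdl <- lm(mpg ~ disp, data = mtcars)

good <- data.frame(disp = c(100, 200))     # column name matches the model
bad  <- data.frame(newdata = c(100, 200))  # column name does not match

check_newdata(my_mdl, good)  # nothing missing
check_newdata(my_mdl, bad)   # reports that `disp` is missing
```

Passing a bare numeric vector to `check_newdata` stops with a readable message instead of the less obvious `'envir'` error.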

### Accuracy

To check the accuracy of the predictions, you need the actual response values for the new data. In this example, `newdf` needs both an `mpg` column and a `disp` column.

```r
newdf <- data.frame(mpg = mtcars$mpg[1:10], disp = mtcars$disp[1:10])
newdf
#     mpg  disp
# 1  21.0 160.0
# 2  21.0 160.0
# 3  22.8 108.0
# 4  21.4 258.0
# 5  18.7 360.0
# 6  18.1 225.0
# 7  14.3 360.0
# 8  24.4 146.7
# 9  22.8 140.8
# 10 19.2 167.6

p <- predict(my_mdl, newdf)

# root mean square error
sqrt(mean((p - newdf$mpg)^2, na.rm = TRUE))
# [1] 2.325148
```
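
Because the ten rows above were also used to fit `my_mdl`, this RMSE is an in-sample figure. A common alternative is to hold some rows out before fitting and score the model only on those. The sketch below assumes an arbitrary split of 22 training rows and 10 test rows; the seed, the split size, and the names `rmse`, `train`, `test`, and `holdout_mdl` are illustrative choices, not part of the original example.

```r
# Illustrative helper for root mean square error
rmse <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2, na.rm = TRUE))
}

set.seed(42)                            # arbitrary seed, for repeatability
test_rows <- sample(nrow(mtcars), 10)   # 10 row indices held out for testing
train <- mtcars[-test_rows, ]
test  <- mtcars[test_rows, ]

# Fit on the training rows only
holdout_mdl <- lm(mpg ~ disp, data = train)

# Score on rows the model has not seen
p_test <- predict(holdout_mdl, newdata = test)
rmse(test$mpg, p_test)
```

With only 32 rows in `mtcars`, the result will vary noticeably from one split to another, so a single hold-out figure should be read as a rough indication.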
