R Language Linear Models (Regression) Checking for nonlinearity with polynomial regression


Example

Sometimes when working with linear regression we need to check for non-linearity in the data. One way to do this is to fit a polynomial model and check whether it fits the data better than a linear model. There are other reasons, such as theoretical, that indicate to fit a quadratic or higher order model because it is believed that the variables relationship is inherently polynomial in nature.

Let's fit a quadratic model for the mtcars dataset. For a linear model see Linear regression on the mtcars dataset.

First we make a scatter plot of the variables mpg (Miles/gallon), disp (Displacement (cu.in.)), and wt (Weight (1000 lbs)). The relationship among mpg and disp appears non-linear.

plot(mtcars[,c("mpg","disp","wt")])

enter image description here

A linear fit will show that disp is not significant.

fit0 = lm(mpg ~ wt+disp, mtcars)
summary(fit0)

# Coefficients:
#            Estimate Std. Error t value Pr(>|t|)    
#(Intercept) 34.96055    2.16454  16.151 4.91e-16 ***
#wt          -3.35082    1.16413  -2.878  0.00743 ** 
#disp        -0.01773    0.00919  -1.929  0.06362 .  
#---
#Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#Residual standard error: 2.917 on 29 degrees of freedom
#Multiple R-squared:  0.7809,    Adjusted R-squared:  0.7658

Then, to get the result of a quadratic model, we added I(disp^2). The new model appears better when looking at R^2 and all variables are significant.

fit1 = lm(mpg ~ wt+disp+I(disp^2), mtcars)
summary(fit1)

# Coefficients:
#              Estimate Std. Error t value Pr(>|t|)    
#(Intercept) 41.4019837  2.4266906  17.061  2.5e-16 ***
#wt          -3.4179165  0.9545642  -3.581 0.001278 ** 
#disp        -0.0823950  0.0182460  -4.516 0.000104 ***
#I(disp^2)    0.0001277  0.0000328   3.892 0.000561 ***
#---
#Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#Residual standard error: 2.391 on 28 degrees of freedom
#Multiple R-squared:  0.8578,    Adjusted R-squared:  0.8426 

As we have three variables, the fitted model is a surface represented by:

mpg = 41.4020-3.4179*wt-0.0824*disp+0.0001277*disp^2

Another way to specify polynomial regression is using poly with parameter raw=TRUE, otherwise orthogonal polynomials will be considered (see the help(ploy) for more information). We get the same result using:

summary(lm(mpg ~ wt+poly(disp, 2, raw=TRUE),mtcars))

Finally, what if we need to show a plot of the estimated surface? Well there are many options to make 3D plots in R. Here we use Fit3d from p3dpackage.

library(p3d)
Init3d(family="serif", cex = 1)
Plot3d(mpg ~ disp+wt, mtcars)
Axes3d()
Fit3d(fit1)

enter image description here