Sometimes when working with linear regression we need to check for non-linearity in the data. One way to do this is to fit a polynomial model and check whether it fits the data better than a linear model. There are other reasons, such as theoretical, that indicate to fit a quadratic or higher order model because it is believed that the variables relationship is inherently polynomial in nature.
Let's fit a quadratic model for the
mtcars dataset. For a linear model see Linear regression on the mtcars dataset.
First we make a scatter plot of the variables
disp (Displacement (cu.in.)), and
wt (Weight (1000 lbs)). The relationship among
disp appears non-linear.
A linear fit will show that
disp is not significant.
fit0 = lm(mpg ~ wt+disp, mtcars) summary(fit0) # Coefficients: # Estimate Std. Error t value Pr(>|t|) #(Intercept) 34.96055 2.16454 16.151 4.91e-16 *** #wt -3.35082 1.16413 -2.878 0.00743 ** #disp -0.01773 0.00919 -1.929 0.06362 . #--- #Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 #Residual standard error: 2.917 on 29 degrees of freedom #Multiple R-squared: 0.7809, Adjusted R-squared: 0.7658
Then, to get the result of a quadratic model, we added
I(disp^2). The new model appears better when looking at
R^2 and all variables are significant.
fit1 = lm(mpg ~ wt+disp+I(disp^2), mtcars) summary(fit1) # Coefficients: # Estimate Std. Error t value Pr(>|t|) #(Intercept) 41.4019837 2.4266906 17.061 2.5e-16 *** #wt -3.4179165 0.9545642 -3.581 0.001278 ** #disp -0.0823950 0.0182460 -4.516 0.000104 *** #I(disp^2) 0.0001277 0.0000328 3.892 0.000561 *** #--- #Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 #Residual standard error: 2.391 on 28 degrees of freedom #Multiple R-squared: 0.8578, Adjusted R-squared: 0.8426
As we have three variables, the fitted model is a surface represented by:
mpg = 41.4020-3.4179*wt-0.0824*disp+0.0001277*disp^2
Another way to specify polynomial regression is using
poly with parameter
raw=TRUE, otherwise orthogonal polynomials will be considered (see the
help(ploy) for more information). We get the same result using:
summary(lm(mpg ~ wt+poly(disp, 2, raw=TRUE),mtcars))
Finally, what if we need to show a plot of the estimated surface? Well there are many options to make
3D plots in
R. Here we use
library(p3d) Init3d(family="serif", cex = 1) Plot3d(mpg ~ disp+wt, mtcars) Axes3d() Fit3d(fit1)