R Language ANOVA Basic usage of aov()


Analysis of Variance (aov) is used to determine if the means of two or more groups differ significantly from each other. Responses are assumed to be independent of each other, Normally distributed (within each group), and the within-group variances are assumed equal.

In order to complete the analysis data must be in long format (see reshaping data topic). aov() is a wrapper around the lm() function, using Wilkinson-Rogers formula notation y~f where y is the response (independent) variable and f is a factor (categorical) variable representing group membership. If f is numeric rather than a factor variable, aov() will report the results of a linear regression in ANOVA format, which may surprise inexperienced users.

The aov() function uses Type I (sequential) Sum of Squares. This type of Sum of Squares tests all of the (main and interaction) effects sequentially. The result is that the first effect tested is also assigned shared variance between it and other effects in the model. For the results from such a model to be reliable, data should be balanced (all groups are of the same size).

When the assumptions for Type I Sum of Squares do not hold, Type II or Type III Sum of Squares may be applicable. Type II Sum of Squares test each main effect after every other main effect, and thus controls for any overlapping variance. However, Type II Sum of Squares assumes no interaction between the main effects.

Lastly, Type III Sum of Squares tests each main effect after every other main effect and every interaction. This makes Type III Sum of Squares a necessity when an interaction is present.

Type II and Type III Sums of Squares are implemented in the Anova() function.

Using the mtcars data set as an example.

mtCarsAnovaModel <- aov(wt ~ factor(cyl), data=mtcars)

To view summary of ANOVA model:


One can also extract the coefficients of the underlying lm() model: