Factors are one way to represent categorical variables in R. Given a vector x whose values can be converted to characters using as.character(), factor() with its default arguments (and likewise as.factor()) assigns an integer code to each distinct element of the vector and stores the distinct values in a levels attribute. Levels are the values x can possibly take; labels are the names displayed for those levels, and can either be the elements themselves or be chosen by the user.
To illustrate how factors work, we will create a factor with default attributes, then one with custom levels, and then one with custom levels and labels.
    # standard
    > factor(c(1,1,2,2,3,3))
    [1] 1 1 2 2 3 3
    Levels: 1 2 3
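To make the split between integer codes and levels concrete, the following minimal sketch inspects a default factor. The input values 10, 20 and 30 and the variable names are chosen here purely for illustration, so that the codes visibly differ from the original values.

```r
# Illustrative input (not from the text above): values differ from their codes
f <- factor(c(10, 10, 20, 30))

# Integer codes: the position of each element within the levels
codes <- as.integer(f)
print(codes)          # 1 1 2 3

# Levels: the distinct values, stored as character strings
lv <- levels(f)
print(lv)             # "10" "20" "30"

# Recover the original numbers by going through the character representation
original <- as.numeric(as.character(f))
print(original)       # 10 10 20 30
```

Note that as.numeric(f) on its own would return the codes rather than the original numbers, which is why the conversion goes through as.character() first.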
Sometimes the user knows that a factor can take on more values than those currently present in the vector. In that case we assign the levels ourselves with the levels argument of factor():
    > factor(c(1,1,2,2,3,3), levels = c(1,2,3,4,5))
    [1] 1 1 2 2 3 3
    Levels: 1 2 3 4 5
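One practical effect of declaring extra levels is that they show up in summaries even though no element uses them. The sketch below reuses the factor above and relies only on the base functions table(), nlevels() and droplevels(); the variable names are illustrative.

```r
f <- factor(c(1, 1, 2, 2, 3, 3), levels = c(1, 2, 3, 4, 5))

# table() reports every declared level, including those with zero occurrences
counts <- table(f)
print(counts)

# Number of declared levels, used or not
n_all <- nlevels(f)
print(n_all)          # 5

# droplevels() removes the levels that no element uses
g <- droplevels(f)
n_used <- nlevels(g)
print(n_used)         # 3
```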
For style purposes the user may wish to assign labels to each level. By default, labels are the character representation of the levels. Here we assign labels for each of the possible levels in the factor.
    > factor(c(1,1,2,2,3,3), levels = c(1,2,3,4,5), labels = c("Fox","Dog","Cow","Brick","Dolphin"))
    [1] Fox Fox Dog Dog Cow Cow
    Levels: Fox Dog Cow Brick Dolphin
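The labelled factor can be inspected in the same way. This short sketch, with illustrative variable names, shows that the labels become the levels of the resulting factor while the underlying integer codes are unaffected.

```r
f <- factor(c(1, 1, 2, 2, 3, 3),
            levels = c(1, 2, 3, 4, 5),
            labels = c("Fox", "Dog", "Cow", "Brick", "Dolphin"))

# After construction, the labels are what levels() returns
lv <- levels(f)
print(lv)             # "Fox" "Dog" "Cow" "Brick" "Dolphin"

# The integer codes still refer to positions 1, 2 and 3
codes <- as.integer(f)
print(codes)          # 1 1 2 2 3 3

# Converting to character yields the labels, not the original numbers
chr <- as.character(f)
print(chr)            # "Fox" "Fox" "Dog" "Dog" "Cow" "Cow"
```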
Normally, factors can only be compared using == and !=, and only if the factors have the same levels. The following comparison of factors fails even though they appear equal, because the factors have different levels.
    > factor(c(1,1,2,2,3,3),levels = c(1,2,3)) == factor(c(1,1,2,2,3,3),levels = c(1,2,3,4,5))
    Error in Ops.factor(factor(c(1, 1, 2, 2, 3, 3), levels = c(1, 2, 3)), :
      level sets of factors are different
This makes sense as the extra levels in the RHS mean that R does not have enough information about each factor to compare them in a meaningful way.
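If the comparison is still wanted, there are two common workarounds. The sketch below is one possible approach rather than a prescribed method: either compare the character representations, or rebuild both factors on a shared set of levels. The names a, b and all_levels are illustrative.

```r
a <- factor(c(1, 1, 2, 2, 3, 3), levels = c(1, 2, 3))
b <- factor(c(1, 1, 2, 2, 3, 3), levels = c(1, 2, 3, 4, 5))

# Option 1: compare the values as character strings, ignoring the level sets
same_chr <- as.character(a) == as.character(b)
print(same_chr)

# Option 2: rebuild both factors on the union of their levels, then compare
all_levels <- union(levels(a), levels(b))
a2 <- factor(a, levels = all_levels)
b2 <- factor(b, levels = all_levels)
same_fac <- a2 == b2
print(same_fac)
```

Option 2 keeps the result a comparison between factors, which matters if the harmonised factors are reused later, for example when combining data sets.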
The operators <, <=, > and >= are only usable for ordered factors. These can represent categorical values which still have a linear order. An ordered factor can be created by providing the ordered = TRUE argument to the factor() function or just by using the ordered() function:
    > x <- factor(1:3, labels = c('low', 'medium', 'high'), ordered = TRUE)
    > print(x)
    [1] low medium high
    Levels: low < medium < high
    > y <- ordered(3:1, labels = c('low', 'medium', 'high'))
    > print(y)
    [1] high medium low
    Levels: low < medium < high
    > x < y
    [1] TRUE FALSE FALSE
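A few more operations follow from the ordering. The sketch below uses an illustrative ordered factor named sizes and contrasts it with an unordered factor u, for which relational operators return NA with a warning (suppressed here).

```r
sizes <- factor(c("medium", "low", "high"),
                levels = c("low", "medium", "high"),
                ordered = TRUE)

# Check that the factor carries an ordering
ord <- is.ordered(sizes)
print(ord)            # TRUE

# An ordered factor can be compared against one of its level names
at_least_medium <- sizes >= "medium"
print(at_least_medium)  # TRUE FALSE TRUE

# Sorting and max() follow the level order, not alphabetical order
sorted <- as.character(sort(sizes))
largest <- as.character(max(sizes))
print(sorted)         # "low" "medium" "high"
print(largest)        # "high"

# For an unordered factor the same operators are not meaningful
u <- factor(c("low", "high"))
unordered_cmp <- suppressWarnings(u < u)
print(unordered_cmp)  # NA NA
```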
For more information, see the Factor documentation.