When a factor is created with the defaults, its levels are the sorted unique values of the input, converted with as.character. For character input the sort is alphabetical, following the collation order of the current locale.
charvar <- rep(c("W", "n", "c"), times=c(17,20,14))
f <- factor(charvar)
levels(f)
# [1] "c" "n" "W"
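The default can be reproduced by hand. The sketch below uses only base R and checks that the default levels equal the sorted unique input values; because the sort follows the session's collation rules, the position of the upper-case "W" may differ in a C locale. It also shows one way to keep the order of first appearance instead (the name f_seen is just for illustration).

```r
charvar <- rep(c("W", "n", "c"), times = c(17, 20, 14))
f <- factor(charvar)

# Default levels are the sorted unique values of the input.
identical(levels(f), sort(unique(charvar)))
# [1] TRUE

# Keep the order of first appearance instead of the sorted order.
f_seen <- factor(charvar, levels = unique(charvar))
levels(f_seen)
# [1] "W" "n" "c"
```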
In some situations the default ordering of levels (alphabetical/lexical order) is acceptable. For example, if one just wants to plot the frequencies, the following is enough:
plot(f,col=1:length(levels(f)))
But if we want a different ordering of the levels, we need to specify it through the levels or labels parameter (note that "order" here means the sequence in which the levels are stored, which is different from ordered factors; see below). There are several ways to accomplish this, depending on the situation.
1. Redefine the factor
When possible, we can recreate the factor, passing the levels parameter in the order we want.
ff <- factor(charvar, levels = c("n", "W", "c"))
levels(ff)
# [1] "n" "W" "c"
gg <- factor(charvar, levels = c("W", "c", "n"))
levels(gg)
# [1] "W" "c" "n"
When the input values differ from the desired output levels, we also use the labels parameter. The levels parameter then acts as a "filter" for acceptable input values (anything not listed becomes NA), and the values given in labels, matched to levels by position, become the final levels of the factor:
fm <- factor(as.numeric(f),levels = c(2,3,1),
labels = c("nn", "WW", "cc"))
levels(fm)
# [1] "nn" "WW" "cc"
fm <- factor(LETTERS[1:6], levels = LETTERS[1:4], # only 'A'-'D' as input
labels = letters[1:4]) # but assigned to 'a'-'d'
fm
#[1] a b c d <NA> <NA>
#Levels: a b c d
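The same levels/labels pairing is common with numerically coded data. As a minimal sketch, the vector codes and its label names below are made up for illustration: levels lists the accepted codes, labels names them in the same positions, and the unlisted code 9 becomes NA.

```r
# Hypothetical survey codes; 9 stands for a value we do not want to keep.
codes <- c(2, 1, 3, 2, 9)

sev <- factor(codes,
              levels = c(1, 2, 3),               # accepted input values
              labels = c("low", "mid", "high"))  # resulting level names
levels(sev)
# [1] "low"  "mid"  "high"
as.character(sev)
# [1] "mid"  "low"  "high" "mid"  NA
```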
2. Use the relevel function
When one specific level needs to come first, we can use relevel. This happens, for example, in statistical analysis, when a base (reference) category is needed for hypothesis testing.
g<-relevel(f, "n") # moves n to be the first level
levels(g)
# [1] "n" "c" "W"
As can be verified, f and g hold the same values; only the order of their levels differs:
all.equal(f, g)
# [1] "Attributes: < Component “levels”: 2 string mismatches >"
all.equal(f, g, check.attributes = FALSE)
# [1] TRUE
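To see what actually changes, the sketch below fixes the starting level order explicitly (so it does not depend on the locale) and compares the values, the underlying integer codes, and the columns of the contrast matrix. It assumes the default options("contrasts") setting (treatment contrasts for unordered factors), under which the first level is the baseline.

```r
charvar <- rep(c("W", "n", "c"), times = c(17, 20, 14))
f <- factor(charvar, levels = c("c", "n", "W"))
g <- relevel(f, ref = "n")

# The values are unchanged ...
all(as.character(f) == as.character(g))
# [1] TRUE

# ... but the integer codes behind them are not.
identical(as.integer(f), as.integer(g))
# [1] FALSE

# With treatment contrasts the first level is the baseline,
# so it is the one without a column of its own.
colnames(contrasts(f))
# [1] "n" "W"
colnames(contrasts(g))
# [1] "c" "W"
```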
3. Reordering factors
There are cases when we need to reorder the levels based on a number, a partial result, a computed statistic, or previous calculations. Let's reorder based on the frequencies of the levels:
table(g)
# g
# n c W
# 20 14 17
The reorder function is generic (see help(reorder)), but in this context it needs: x, in this case the factor; X, a numeric vector of the same length as x; and FUN, a function applied to the values of X within each level of x. The results of FUN determine the order of the levels, increasing by default. The result is the same factor with its levels reordered.
g.ord <- reorder(g,rep(1,length(g)), FUN=sum) #increasing
levels(g.ord)
# [1] "c" "W" "n"
To get the decreasing order we use negative values (-1):
g.ord.d <- reorder(g,rep(-1,length(g)), FUN=sum)
levels(g.ord.d)
# [1] "n" "W" "c"
Again, the values are the same as in the other factors; only the order of the levels differs.
data.frame(f,g,g.ord,g.ord.d)[seq(1,length(g),by=5),] # just some rows
# f g g.ord g.ord.d
# 1 W W W W
# 6 W W W W
# 11 W W W W
# 16 W W W W
# 21 n n n n
# 26 n n n n
# 31 n n n n
# 36 n n n n
# 41 c c c c
# 46 c c c c
# 51 c c c c
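When the goal is simply to order the levels by frequency, a shorter route (an alternative to the reorder calls above, not a replacement for them) is to sort the frequency table and pass its names to levels. The sketch fixes the starting level order explicitly so that it does not depend on the locale; the names freq and g_freq are just for illustration.

```r
charvar <- rep(c("W", "n", "c"), times = c(17, 20, 14))
g <- factor(charvar, levels = c("n", "c", "W"))

# Sort the counts, then reuse the names as the new level order.
freq <- sort(table(g), decreasing = TRUE)
g_freq <- factor(g, levels = names(freq))
levels(g_freq)
# [1] "n" "W" "c"

# The values themselves are untouched.
all(as.character(g_freq) == charvar)
# [1] TRUE
```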
When there is a quantitative variable related to the factor, we can use it to reorder the levels. Let's take the iris data (see help("iris") for more information) and reorder the Species factor by its mean Sepal.Width. When FUN is not given, reorder uses mean; negating Sepal.Width gives decreasing order.
miris <- iris #help("iris") # copy the data
with(miris, tapply(Sepal.Width,Species,mean))
# setosa versicolor virginica
# 3.428 2.770 2.974
miris$Species.o<-with(miris,reorder(Species,-Sepal.Width))
levels(miris$Species.o)
# [1] "setosa" "virginica" "versicolor"
The usual boxplot (say: with(miris, boxplot(Petal.Width~Species))) shows the species in this order: setosa, versicolor, and virginica. But using the reordered factor we get the species ordered by decreasing mean Sepal.Width:
boxplot(Petal.Width~Species.o, data = miris,
xlab = "Species", ylab = "Petal Width",
main = "Iris Data, ordered by mean sepal width", varwidth = TRUE,
col = 2:4)
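Any summary function can drive the ordering. As a sketch, the call below orders Species by the median Sepal.Width in increasing order and inspects the "scores" attribute in which reorder stores the per-level results; the object name by_median is just for illustration.

```r
# Order the species by median sepal width (increasing).
by_median <- with(iris, reorder(Species, Sepal.Width, FUN = median))
levels(by_median)
# [1] "versicolor" "virginica"  "setosa"

# reorder keeps the per-level summaries in the "scores" attribute.
attr(by_median, "scores")
```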
Additionally, it is also possible to change the names of the levels, combine them into groups, or add new levels. For that we use the replacement form of the levels function, levels(x) <- value.
f1<-f
levels(f1)
# [1] "c" "n" "W"
levels(f1) <- c("upper","upper","CAP") #rename and grouping
levels(f1)
# [1] "upper" "CAP"
f2<-f1
levels(f2) <- c("upper","CAP", "Number") #add Number level, which is empty
levels(f2)
# [1] "upper" "CAP" "Number"
f2[(length(f2)+1):(length(f2)+6)]<-"Number" # append 6 cases for the new level (starting at length(f2) would overwrite the last case)
table(f2)
# f2
# upper CAP Number
# 34 17 6
f3<-f1
levels(f3) <- list(G1 = "upper", G2 = "CAP", G3 = "Number") # The same using list
levels(f3)
# [1] "G1" "G2" "G3"
f3[(length(f3)+1):(length(f3)+7)]<-"G3" # append 7 cases for the new level (starting at length(f3) would overwrite the last case)
table(f3)
# f3
# G1 G2 G3
# 34 17 7
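The level has to exist before a value can be assigned to it. The small sketch below (the vectors x and y are made up for illustration) shows that assigning an unknown value produces NA with a warning, that appending to levels(x) first avoids this, and that droplevels removes levels that end up unused.

```r
x <- factor(c("a", "b", "a"))

# Assigning a value that is not a level gives NA (with a warning).
y <- x
suppressWarnings(y[2] <- "z")
is.na(y[2])
# [1] TRUE

# Add the level first, then assign.
levels(x) <- c(levels(x), "z")
x[2] <- "z"
table(x)
# x
# a b z
# 2 0 1

# Drop levels that no longer occur in the data.
levels(droplevels(x))
# [1] "a" "z"
```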
- Ordered factors
Finally, ordered factors are different from plain factors: the former represent ordinal data, the latter nominal data. In principle it does not make sense to change the order of the levels of an ordered factor, but we can change its labels.
ordvar<-rep(c("Low", "Medium", "High"), times=c(7,2,4))
of<-ordered(ordvar,levels=c("Low", "Medium", "High"))
levels(of)
# [1] "Low" "Medium" "High"
of1<-of
levels(of1)<- c("LOW", "MEDIUM", "HIGH")
levels(of1)
# [1] "LOW" "MEDIUM" "HIGH"
is.ordered(of1)
# [1] TRUE
of1
# [1] LOW LOW LOW LOW LOW LOW LOW MEDIUM MEDIUM HIGH HIGH HIGH HIGH
# Levels: LOW < MEDIUM < HIGH
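What the order of an ordered factor buys is comparison. A brief sketch using the same data: relational operators and max work on ordered factors and follow the level order, not the alphabet, while the same comparison on a plain factor only yields NA with a warning.

```r
ordvar <- rep(c("Low", "Medium", "High"), times = c(7, 2, 4))
of <- ordered(ordvar, levels = c("Low", "Medium", "High"))

# Comparisons follow the level order Low < Medium < High.
sum(of >= "Medium")
# [1] 6
as.character(max(of))
# [1] "High"

# The same comparison is not meaningful for a plain factor.
suppressWarnings(factor(ordvar) >= "Medium")[1]
# [1] NA
```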