R Language Factors Changing and reordering factors


Example

When factors are created with defaults, levels are formed by as.character applied to the inputs and are ordered alphabetically.

charvar <- rep(c("W", "n", "c"), times=c(17,20,14))
f <- factor(charvar)
levels(f)
# [1] "c" "n" "W"

In some situations the treatment of the default ordering of levels (alphabetic/lexical order) will be acceptable. For example, if one justs want to plot the frequencies, this will be the result:

plot(f,col=1:length(levels(f)))

enter image description here

But if we want a different ordering of levels, we need to specify this in the levels or labels parameter (taking care that the meaning of "order" here is different from ordered factors, see below). There are many alternatives to accomplish that task depending on the situation.

1. Redefine the factor

When it is possible, we can recreate the factor using the levels parameter with the order we want.

ff <- factor(charvar, levels = c("n", "W", "c"))
levels(ff)
# [1] "n" "W" "c"
 
gg <- factor(charvar, levels = c("W", "c", "n"))
levels(gg)
# [1] "W" "c" "n"

When the input levels are different than the desired output levels, we use the labels parameter which causes the levels parameter to become a "filter" for acceptable input values, but leaves the final values of "levels" for the factor vector as the argument to labels:

fm <- factor(as.numeric(f),levels = c(2,3,1),
             labels = c("nn", "WW", "cc"))
levels(fm)
# [1] "nn" "WW" "cc"

fm <- factor(LETTERS[1:6], levels = LETTERS[1:4],  # only 'A'-'D' as input
                 labels = letters[1:4])            # but assigned to 'a'-'d'
fm
#[1] a    b    c    d    <NA> <NA>
#Levels: a b c d

2. Use relevel function

When there is one specific level that needs to be the first we can use relevel. This happens, for example, in the context of statistical analysis, when a base category is necessary for testing hypothesis.

g<-relevel(f, "n") # moves n to be the first level
levels(g)
# [1] "n" "c" "W"  

As can be verified f and g are the same

all.equal(f, g)
# [1] "Attributes: < Component “levels”: 2 string mismatches >"
all.equal(f, g, check.attributes = F)
# [1] TRUE

3. Reordering factors

There are cases when we need to reorder the levels based on a number, a partial result, a computed statistic, or previous calculations. Let's reorder based on the frequencies of the levels

table(g)
# g
#  n  c  W 
# 20 14 17 

The reorder function is generic (see help(reorder)), but in this context needs: x, in this case the factor; X, a numeric value of the same length as x; and FUN, a function to be applied to X and computed by level of the x, which determines the levels order, by default increasing. The result is the same factor with its levels reordered.

g.ord <- reorder(g,rep(1,length(g)), FUN=sum) #increasing
levels(g.ord)
# [1] "c" "W" "n"

To get de decreasing order we consider negative values (-1)

g.ord.d <- reorder(g,rep(-1,length(g)), FUN=sum)
levels(g.ord.d)
# [1] "n" "W" "c"

Again the factor is the same as the others.

data.frame(f,g,g.ord,g.ord.d)[seq(1,length(g),by=5),] #just same lines
#    f g g.ord g.ord.d
# 1  W W     W       W
# 6  W W     W       W
# 11 W W     W       W
# 16 W W     W       W
# 21 n n     n       n
# 26 n n     n       n
# 31 n n     n       n
# 36 n n     n       n
# 41 c c     c       c
# 46 c c     c       c
# 51 c c     c       c

When there is a quantitative variable related to the factor variable, we could use other functions to reorder the levels. Lets take the iris data (help("iris") for more information), for reordering the Species factor by using its mean Sepal.Width.

miris <- iris  #help("iris") # copy the data
with(miris, tapply(Sepal.Width,Species,mean))
#    setosa versicolor  virginica 
#     3.428      2.770      2.974 

miris$Species.o<-with(miris,reorder(Species,-Sepal.Width))
levels(miris$Species.o)
# [1] "setosa"     "virginica"  "versicolor"

The usual boxplot (say: with(miris, boxplot(Petal.Width~Species)) will show the especies in this order: setosa, versicolor, and virginica. But using the ordered factor we get the species ordered by its mean Sepal.Width:

boxplot(Petal.Width~Species.o, data = miris,
        xlab = "Species", ylab = "Petal Width",
        main = "Iris Data, ordered by mean sepal width", varwidth = TRUE,
        col = 2:4) 

enter image description here

Additionally, it is also possible to change the names of levels, combine them into groups, or add new levels. For that we use the function of the same name levels.

f1<-f
levels(f1)
# [1] "c" "n" "W"
levels(f1) <- c("upper","upper","CAP") #rename and grouping
levels(f1)
# [1] "upper" "CAP" 

f2<-f1
levels(f2) <- c("upper","CAP", "Number") #add Number level, which is empty
levels(f2)
# [1] "upper"  "CAP"    "Number"
f2[length(f2):(length(f2)+5)]<-"Number" # add cases for the new level
table(f2)
# f2
#  upper    CAP Number 
#     33     17      6 

f3<-f1
levels(f3) <- list(G1 = "upper", G2 = "CAP", G3 = "Number") # The same using list
levels(f3)
# [1] "G1" "G2" "G3"
f3[length(f3):(length(f3)+6)]<-"G3" ## add cases for the new level
table(f3)
# f3
# G1 G2 G3 
# 33 17  7 

- Ordered factors

Finally, we know that ordered factors are different from factors, the first one are used to represent ordinal data, and the second one to work with nominal data. At first, it does not make sense to change the order of levels for ordered factors, but we can change its labels.

ordvar<-rep(c("Low", "Medium", "High"), times=c(7,2,4))

of<-ordered(ordvar,levels=c("Low", "Medium", "High"))
levels(of)
# [1] "Low"    "Medium" "High" 

of1<-of
levels(of1)<- c("LOW", "MEDIUM", "HIGH")
levels(of1)
# [1] "LOW"    "MEDIUM" "HIGH"
is.ordered(of1)
# [1] TRUE
of1
# [1] LOW    LOW    LOW    LOW    LOW    LOW    LOW    MEDIUM MEDIUM HIGH   HIGH   HIGH   HIGH  
# Levels: LOW < MEDIUM < HIGH