# R Language Factors: Changing and reordering factors

## Example

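When factors are created with the defaults, the `levels` are obtained by applying `as.character` to the sorted unique values of the input, so for character input they come out in alphabetical order (as determined by the current locale).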

``````
charvar <- rep(c("W", "n", "c"), times = c(17, 20, 14))
f <- factor(charvar)
levels(f)
#  "c" "n" "W"
``````
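As a small illustration of the default behaviour (a sketch using a made-up vector `x`, which is not part of the original example): the levels are the sorted unique values of the input, and the sorting happens before the conversion to character, so numeric input is ordered numerically rather than as text.

``````r
# Hypothetical numeric input, chosen only for illustration
x <- c(10, 2, 33, 2)

fx <- factor(x)
levels(fx)   # sorted numerically first, then converted to character

# The default levels correspond to the sorted unique values of the input
identical(levels(fx), as.character(sort(unique(x))))
``````

For character input the sort follows the collation rules of the current locale, so the relative position of upper-case and lower-case letters can vary between systems.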

In some situations the default ordering of the `levels` (alphabetical/lexical order) is acceptable. For example, if one just wants to `plot` the frequencies, this will be the result:

``````
plot(f, col = seq_along(levels(f)))
``````

But if we want a different ordering of the `levels`, we need to specify it in the `levels` or `labels` parameter (note that "order" here refers to the order of the levels, which is different from an ordered factor; see below). There are several ways to accomplish this, depending on the situation.

1. Redefine the factor

When it is possible, we can recreate the factor using the `levels` parameter with the order we want.

``````
ff <- factor(charvar, levels = c("n", "W", "c"))
levels(ff)
#  "n" "W" "c"

gg <- factor(charvar, levels = c("W", "c", "n"))
levels(gg)
#  "W" "c" "n"
``````
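The same idea also works on an existing factor, and it helps to know what happens to values that are left out of `levels`. The following is a minimal sketch; the names `f_rev` and `f_sub` are introduced here only for illustration, and the levels of `f` are given explicitly so the result does not depend on the locale.

``````r
charvar <- rep(c("W", "n", "c"), times = c(17, 20, 14))
f <- factor(charvar, levels = c("c", "n", "W"))

# Reverse the level order of an existing factor; the values stay the same
f_rev <- factor(f, levels = rev(levels(f)))
levels(f_rev)

# Values that are missing from `levels` are turned into NA
f_sub <- factor(charvar, levels = c("n", "W"))
sum(is.na(f_sub))   # counts the "c" values that were dropped
``````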

When the input values differ from the desired output levels, we use the `labels` parameter. The `levels` parameter then acts as a "filter" for acceptable input values, while the final levels of the factor are taken from `labels`:

``````
fm <- factor(as.numeric(f), levels = c(2, 3, 1),
             labels = c("nn", "WW", "cc"))
levels(fm)
#  "nn" "WW" "cc"

fm <- factor(LETTERS[1:6], levels = LETTERS[1:4],  # only 'A'-'D' as input
             labels = letters[1:4])                # but assigned to 'a'-'d'
fm
# a    b    c    d    <NA> <NA>
#Levels: a b c d
``````
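A typical use of `levels` together with `labels` is decoding numeric codes. The sketch below uses hypothetical survey codes (the vector `codes` and the label names are assumptions made for illustration):

``````r
# Hypothetical survey codes, for illustration only
codes <- c(1, 3, 2, 9, 1)

# `levels` lists the accepted input codes, `labels` the names they receive
answer <- factor(codes, levels = c(1, 2, 3), labels = c("no", "maybe", "yes"))
answer           # code 9 is not listed in `levels`, so it becomes NA
levels(answer)
``````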

2. Use `relevel` function

When one specific `level` needs to come first, we can use `relevel`. This happens, for example, in statistical analysis, when a baseline (reference) category is needed for hypothesis testing.

``````
g <- relevel(f, "n") # moves n to be the first level
levels(g)
#  "n" "c" "W"
``````
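To see why the first level matters in modelling, here is a minimal sketch using the built-in `iris` data and `lm`. It assumes R's default treatment contrasts; the names `fit1`, `fit2`, and `iris2` are introduced here for illustration.

``````r
# Default: the first level ("setosa") is the baseline of the contrasts
fit1 <- lm(Sepal.Width ~ Species, data = iris)
names(coef(fit1))

# Make "virginica" the baseline instead
iris2 <- transform(iris, Species = relevel(Species, ref = "virginica"))
fit2 <- lm(Sepal.Width ~ Species, data = iris2)
names(coef(fit2))
``````

The coefficient names show which levels are compared against the baseline: the baseline itself is absorbed into the intercept.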

As can be verified, `f` and `g` hold the same values; they differ only in the order of their levels:

``````
all.equal(f, g)
#  "Attributes: < Component “levels”: 2 string mismatches >"
all.equal(f, g, check.attributes = FALSE)
#  TRUE
``````
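The comparison above can be unpacked a little further. As a sketch (with the levels of `f` given explicitly so the result does not depend on the locale), the labels of the two factors agree element by element, while the integer codes stored underneath do not:

``````r
charvar <- rep(c("W", "n", "c"), times = c(17, 20, 14))
f <- factor(charvar, levels = c("c", "n", "W"))
g <- relevel(f, "n")

identical(as.character(f), as.character(g))  # same values, element by element
identical(as.integer(f), as.integer(g))      # different underlying integer codes
``````

This is why code that relies on `as.integer(f)` can change its behaviour after releveling, even though the data themselves are unchanged.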

3. Reordering factors

There are cases when we need to `reorder` the `levels` based on a number, a partial result, a computed statistic, or previous calculations. Let's reorder based on the frequencies of the `levels`:

``````
table(g)
# g
#  n  c  W
# 20 14 17
``````

The `reorder` function is generic (see `help(reorder)`). In this context it needs `x`, the factor; `X`, a numeric vector of the same length as `x`; and `FUN`, a function that is applied to the values of `X` within each level of `x`. The resulting per-level values determine the order of the `levels`, increasing by default. The result is the same factor with its levels reordered.

``````
g.ord <- reorder(g, rep(1, length(g)), FUN = sum) # increasing
levels(g.ord)
#  "c" "W" "n"
``````
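The per-level values that drive the ordering can be reproduced with `tapply`. This is a sketch for checking the result, with `g` rebuilt explicitly so the block runs on its own; the name `scores` is chosen here for illustration.

``````r
charvar <- rep(c("W", "n", "c"), times = c(17, 20, 14))
g <- factor(charvar, levels = c("n", "c", "W"))

# FUN applied to X within each level: the sum of ones, i.e. the counts
scores <- tapply(rep(1, length(g)), g, sum)
scores

# The reordered levels follow the increasing order of those values
g.ord <- reorder(g, rep(1, length(g)), FUN = sum)
identical(levels(g.ord), names(sort(scores)))
``````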

To get the decreasing order, we use negative values (`-1`):

``````
g.ord.d <- reorder(g, rep(-1, length(g)), FUN = sum)
levels(g.ord.d)
#  "n" "W" "c"
``````
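An alternative that some readers may find easier to follow is to sort the frequency table and pass the resulting names to `factor`. This is a sketch of a different route to the same ordering, not a replacement for `reorder`; the names `by_freq` and `g.freq` are introduced here.

``````r
charvar <- rep(c("W", "n", "c"), times = c(17, 20, 14))
g <- factor(charvar, levels = c("n", "c", "W"))

# Level names sorted by decreasing frequency
by_freq <- names(sort(table(g), decreasing = TRUE))

# Rebuild the factor with the levels in that order
g.freq <- factor(g, levels = by_freq)
levels(g.freq)
``````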

Again, the values of the factor are unchanged; only the order of the levels differs.

``````
data.frame(f, g, g.ord, g.ord.d)[seq(1, length(g), by = 5), ] # show every fifth row only
#    f g g.ord g.ord.d
# 1  W W     W       W
# 6  W W     W       W
# 11 W W     W       W
# 16 W W     W       W
# 21 n n     n       n
# 26 n n     n       n
# 31 n n     n       n
# 36 n n     n       n
# 41 c c     c       c
# 46 c c     c       c
# 51 c c     c       c
``````

When there is a quantitative variable related to the factor variable, we can use other functions to reorder the `levels`. Let's take the `iris` data (see `help("iris")` for more information) and reorder the `Species` factor by its mean `Sepal.Width`.

``````
miris <- iris  # copy the data, see help("iris")
with(miris, tapply(Sepal.Width, Species, mean))
#    setosa versicolor  virginica
#     3.428      2.770      2.974

miris$Species.o <- with(miris, reorder(Species, -Sepal.Width))
levels(miris$Species.o)
#  "setosa"     "virginica"  "versicolor"
``````
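`reorder` uses `mean` when `FUN` is not given, but any summary function can be supplied. As a sketch (the name `sp_med` is introduced here for illustration), the following reorders by the median and then inspects the per-level values, which `reorder` stores in the `"scores"` attribute of its result:

``````r
# Reorder by the median instead of the default mean (increasing order)
sp_med <- reorder(iris$Species, iris$Sepal.Width, FUN = median)
levels(sp_med)

# The per-level values used for the ordering
attr(sp_med, "scores")
``````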

The usual `boxplot` (say, `with(miris, boxplot(Petal.Width ~ Species))`) shows the species in the default order: setosa, versicolor, and virginica. Using the reordered factor instead, the species appear in order of decreasing mean `Sepal.Width`:

``````
boxplot(Petal.Width ~ Species.o, data = miris,
        xlab = "Species", ylab = "Petal Width",
        main = "Iris Data, ordered by mean sepal width", varwidth = TRUE,
        col = 2:4)
``````

Additionally, it is possible to change the names of the `levels`, combine them into groups, or add new `levels`. For that we use the replacement form of the function of the same name, `levels<-`.

``````
f1 <- f
levels(f1)
#  "c" "n" "W"
levels(f1) <- c("upper","upper","CAP") #rename and grouping
levels(f1)
#  "upper" "CAP"

f2<-f1
levels(f2) <- c("upper","CAP", "Number") #add Number level, which is empty
levels(f2)
#  "upper"  "CAP"    "Number"
f2[length(f2):(length(f2)+5)] <- "Number" # overwrites the last element and appends 5 new cases
table(f2)
# f2
#  upper    CAP Number
#     33     17      6

f3<-f1
levels(f3) <- list(G1 = "upper", G2 = "CAP", G3 = "Number") # The same using list
levels(f3)
#  "G1" "G2" "G3"
f3[length(f3):(length(f3)+6)] <- "G3" # overwrites the last element and appends 6 new cases
table(f3)
# f3
# G1 G2 G3
# 33 17  7
``````
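The list form is also convenient for merging levels, because each list element can name several old levels at once. A minimal sketch, using a fresh factor `f4` (a name introduced here) with explicitly given levels:

``````r
charvar <- rep(c("W", "n", "c"), times = c(17, 20, 14))
f4 <- factor(charvar, levels = c("c", "n", "W"))

# Each list element names a new level and lists the old levels it absorbs
levels(f4) <- list(lower = c("c", "n"), upper = "W")
levels(f4)
table(f4)
``````

Compared with the character-vector form, the list form does not depend on the current position of each level, which makes the mapping easier to read.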

- Ordered factors

Finally, `ordered` factors are different from plain factors: the former represent ordinal data, the latter nominal data. In general it does not make sense to change the order of the `levels` of an ordered factor, but we can change its labels.

``````
ordvar <- rep(c("Low", "Medium", "High"), times = c(7, 2, 4))

of<-ordered(ordvar,levels=c("Low", "Medium", "High"))
levels(of)
#  "Low"    "Medium" "High"

of1<-of
levels(of1)<- c("LOW", "MEDIUM", "HIGH")
levels(of1)
#  "LOW"    "MEDIUM" "HIGH"
is.ordered(of1)
#  TRUE
of1
#  LOW    LOW    LOW    LOW    LOW    LOW    LOW    MEDIUM MEDIUM HIGH   HIGH   HIGH   HIGH
# Levels: LOW < MEDIUM < HIGH
``````
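What the ordering buys in practice is that comparison operators and range summaries become meaningful. The following sketch rebuilds `of` so that it runs on its own:

``````r
ordvar <- rep(c("Low", "Medium", "High"), times = c(7, 2, 4))
of <- ordered(ordvar, levels = c("Low", "Medium", "High"))

sum(of < "High")        # number of observations below "High"
max(of)                 # the highest level that occurs
table(of >= "Medium")   # observations at "Medium" or above
``````

The same comparisons on a plain (unordered) factor return `NA` with a warning, since nominal levels have no defined order.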