When a factor is created with the defaults, its `levels` are the sorted unique values of the input, converted with `as.character`; for character input this means alphabetical (lexical) order.

```
charvar <- rep(c("W", "n", "c"), times=c(17,20,14))
f <- factor(charvar)
levels(f)
# [1] "c" "n" "W"
```
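For non-character input, the levels follow the sort order of the original values rather than of their character form. A minimal sketch, where the vector `numvar` is a made-up example and not part of the data above:

```r
# Hypothetical numeric input: levels follow the numeric sort order,
# not the alphabetical order of the resulting strings.
numvar <- c(10, 2, 33, 2)
fn <- factor(numvar)
levels(fn)
# [1] "2"  "10" "33"

# The same levels built by hand: sort the unique values, then convert.
manual <- as.character(sort(unique(numvar)))
identical(levels(fn), manual)
# [1] TRUE
```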

In some situations the default ordering of `levels` (alphabetic/lexical order) will be acceptable. For example, if one just wants to `plot` the frequencies, this will be the result:

```
plot(f,col=1:length(levels(f)))
```

But if we want a different ordering of `levels`, we need to specify this in the `levels` or `labels` parameter (taking care that the meaning of "order" here is different from *ordered* factors, see below).
There are many alternatives to accomplish that task depending on the situation.

**1. Redefine the factor**

When it is possible, we can recreate the factor using the `levels` parameter with the order we want.

```
ff <- factor(charvar, levels = c("n", "W", "c"))
levels(ff)
# [1] "n" "W" "c"
gg <- factor(charvar, levels = c("W", "c", "n"))
levels(gg)
# [1] "W" "c" "n"
```

When the input values differ from the desired output levels, we use the `labels` parameter. The `levels` parameter then acts as a "filter" for acceptable input values, while the final levels of the factor are taken, position by position, from `labels`:

```
fm <- factor(as.numeric(f), levels = c(2, 3, 1),
             labels = c("nn", "WW", "cc"))
levels(fm)
# [1] "nn" "WW" "cc"
fm <- factor(LETTERS[1:6], levels = LETTERS[1:4], # only 'A'-'D' as input
             labels = letters[1:4])               # but assigned to 'a'-'d'
fm
#[1] a b c d <NA> <NA>
#Levels: a b c d
```

**2. Use relevel function**

When one specific `level` needs to be the first, we can use `relevel`. This happens, for example, in statistical analysis, when a `base` category is needed for hypothesis testing.

```
g<-relevel(f, "n") # moves n to be the first level
levels(g)
# [1] "n" "c" "W"
```
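One way to see the effect on the base category is to look at the default treatment contrasts, whose columns omit the reference level. This sketch assumes the default `options("contrasts")` setting for unordered factors (`contr.treatment`):

```r
charvar <- rep(c("W", "n", "c"), times = c(17, 20, 14))
f <- factor(charvar)
g <- relevel(f, ref = "n")  # same call as above, with the argument named

# The first level is the baseline and gets no column of its own.
colnames(contrasts(f))
# [1] "n" "W"
colnames(contrasts(g))
# [1] "c" "W"
```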

As can be verified, `f` and `g` contain the same values; only the order of their levels differs:

```
all.equal(f, g)
# [1] "Attributes: < Component “levels”: 2 string mismatches >"
all.equal(f, g, check.attributes = FALSE)
# [1] TRUE
```
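A more direct check, sketched here, compares the labels element by element and then the underlying integer codes, which do change because they index into `levels`:

```r
f <- factor(rep(c("W", "n", "c"), times = c(17, 20, 14)))
g <- relevel(f, "n")

# Same labels in the same positions
identical(as.character(f), as.character(g))
# [1] TRUE

# Different integer codes, since the level order differs
identical(as.integer(f), as.integer(g))
# [1] FALSE
```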

**3. Reordering factors**

There are cases when we need to `reorder` the `levels` based on a number, a partial result, a computed statistic, or previous calculations. Let's reorder based on the **frequencies** of the `levels`:

```
table(g)
# g
# n c W
# 20 14 17
```

The `reorder` function is generic (see `help(reorder)`), but in this context it needs: `x`, in this case the factor; `X`, a numeric vector of the same length as `x`; and `FUN`, a function applied to `X` within each level of `x`, whose results determine the order of the `levels` (increasing by default). The result is the same factor with its levels reordered.

```
g.ord <- reorder(g,rep(1,length(g)), FUN=sum) #increasing
levels(g.ord)
# [1] "c" "W" "n"
```

To get decreasing order, we use negative values (`-1`):

```
g.ord.d <- reorder(g,rep(-1,length(g)), FUN=sum)
levels(g.ord.d)
# [1] "n" "W" "c"
```
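When the ordering criterion is the frequency itself, an alternative sketch (not the `reorder` approach used above) is to sort the frequency table and pass its names to `levels`:

```r
f <- factor(rep(c("W", "n", "c"), times = c(17, 20, 14)))

# Sort the counts from most to least frequent: n = 20, W = 17, c = 14
freq <- sort(table(f), decreasing = TRUE)

# Rebuild the factor with the levels in that order
f.freq <- factor(f, levels = names(freq))
levels(f.freq)
# [1] "n" "W" "c"
```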

Again, the values of the factor are the same as in the others; only the order of the levels changed.

```
data.frame(f,g,g.ord,g.ord.d)[seq(1,length(g),by=5),] # just some lines
# f g g.ord g.ord.d
# 1 W W W W
# 6 W W W W
# 11 W W W W
# 16 W W W W
# 21 n n n n
# 26 n n n n
# 31 n n n n
# 36 n n n n
# 41 c c c c
# 46 c c c c
# 51 c c c c
```

When there is a **quantitative variable** related to the factor variable, we can reorder the `levels` by a summary statistic of that variable. Let's take the `iris` data (`help("iris")` for more information) and reorder the `Species` factor by its mean `Sepal.Width`.

```
miris <- iris # copy the data (see help("iris"))
with(miris, tapply(Sepal.Width,Species,mean))
# setosa versicolor virginica
# 3.428 2.770 2.974
miris$Species.o<-with(miris,reorder(Species,-Sepal.Width))
levels(miris$Species.o)
# [1] "setosa" "virginica" "versicolor"
```
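`reorder` applies `mean` by default, which is why no `FUN` was given above. As a sketch of using another statistic, the median can be passed explicitly; the per-level values computed by `reorder` are kept in the `scores` attribute of the result (the name `sp.med` is chosen here only for illustration):

```r
# Order Species by increasing median Sepal.Width
sp.med <- reorder(iris$Species, iris$Sepal.Width, FUN = median)
levels(sp.med)
# [1] "versicolor" "virginica"  "setosa"

# The statistic computed for each original level
attr(sp.med, "scores")
```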

The usual `boxplot` (say: `with(miris, boxplot(Petal.Width~Species))`) will show the species in this order: *setosa*, *versicolor*, and *virginica*. But using the reordered factor we get the species ordered by decreasing mean `Sepal.Width`:

```
boxplot(Petal.Width~Species.o, data = miris,
xlab = "Species", ylab = "Petal Width",
main = "Iris Data, ordered by mean sepal width", varwidth = TRUE,
col = 2:4)
```

Additionally, it is also possible to change the names of `levels`, combine them into groups, or add new `levels`. For that we use the function of the same name, `levels`.

```
f1<-f
levels(f1)
# [1] "c" "n" "W"
levels(f1) <- c("upper","upper","CAP") #rename and grouping
levels(f1)
# [1] "upper" "CAP"
f2<-f1
levels(f2) <- c("upper","CAP", "Number") #add Number level, which is empty
levels(f2)
# [1] "upper" "CAP" "Number"
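# add cases for the new level: positions 51..56, so the last existing
# element is overwritten and five new elements are appended
f2[length(f2):(length(f2)+5)]<-"Number"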
table(f2)
# f2
# upper CAP Number
# 33 17 6
f3<-f1
levels(f3) <- list(G1 = "upper", G2 = "CAP", G3 = "Number") # The same using list
levels(f3)
# [1] "G1" "G2" "G3"
# add cases for the new level: positions 51..57, so the last existing
# element is overwritten and six new elements are appended
f3[length(f3):(length(f3)+6)]<-"G3"
table(f3)
# f3
# G1 G2 G3
# 33 17 7
```
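Note that the index ranges above start at `length(f2)` and `length(f3)`, so the last existing element is replaced as well. If the intent is only to append, a sketch starting one position later (the variable `n` is introduced here for illustration) could look like this:

```r
f <- factor(rep(c("W", "n", "c"), times = c(17, 20, 14)))
f1 <- f
levels(f1) <- c("upper", "upper", "CAP")   # rename and group, as above

f2 <- f1
levels(f2) <- c(levels(f2), "Number")      # add an empty level at the end
n <- length(f2)                            # 51 existing elements
f2[(n + 1):(n + 6)] <- "Number"            # append six new cases, overwrite none
table(f2)
# f2
#  upper    CAP Number
#     34     17      6
```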

**- Ordered factors**

Finally, `ordered` factors are different from plain factors: the former represent *ordinal data*, the latter *nominal data*. In principle, it does not make sense to change the order of the `levels` of an ordered factor, but we can change its `labels`.

```
ordvar<-rep(c("Low", "Medium", "High"), times=c(7,2,4))
of<-ordered(ordvar,levels=c("Low", "Medium", "High"))
levels(of)
# [1] "Low" "Medium" "High"
of1<-of
levels(of1)<- c("LOW", "MEDIUM", "HIGH")
levels(of1)
# [1] "LOW" "MEDIUM" "HIGH"
is.ordered(of1)
# [1] TRUE
of1
# [1] LOW LOW LOW LOW LOW LOW LOW MEDIUM MEDIUM HIGH HIGH HIGH HIGH
# Levels: LOW < MEDIUM < HIGH
```
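What the order of an ordered factor provides can be sketched with comparisons, which are meaningful for `ordered` factors but not for plain ones:

```r
ordvar <- rep(c("Low", "Medium", "High"), times = c(7, 2, 4))
of <- ordered(ordvar, levels = c("Low", "Medium", "High"))

sum(of >= "Medium")    # how many observations are Medium or above
# [1] 6
as.character(max(of))  # the highest level that occurs
# [1] "High"
```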

This modified text is an extract of the original Stack Overflow Documentation created by following contributors and released under CC BY-SA 3.0
