**Factors** are one method to represent categorical variables in R. Given a vector `x`

whose values can be converted to characters using `as.character()`

, the default arguments for `factor()`

and `as.factor()`

assign an integer to each distinct element of the vector as well as a level attribute and a label attribute. Levels are the values `x`

can possibly take and labels can either be the given element or determined by the user.

To example how factors work we will create a factor with default attributes, then custom levels, and then custom levels and labels.

```
# standard
factor(c(1,1,2,2,3,3))
[1] 1 1 2 2 3 3
Levels: 1 2 3
```

Instances can arise where the user knows the number of possible values a factor can take on is greater than the current values in the vector. For this we assign the levels ourselves in `factor()`

.

```
factor(c(1,1,2,2,3,3),
levels = c(1,2,3,4,5))
[1] 1 1 2 2 3 3
Levels: 1 2 3 4 5
```

For style purposes the user may wish to assign labels to each level. By default, labels are the character representation of the levels. Here we assign labels for each of the possible levels in the factor.

```
factor(c(1,1,2,2,3,3),
levels = c(1,2,3,4,5),
labels = c("Fox","Dog","Cow","Brick","Dolphin"))
[1] Fox Fox Dog Dog Cow Cow
Levels: Fox Dog Cow Brick Dolphin
```

Normally, factors can only be compared using `==`

and `!=`

and if the factors have the same levels. The following comparison of factors fails even though they appear equal because the factors have different factor levels.

```
factor(c(1,1,2,2,3,3),levels = c(1,2,3)) == factor(c(1,1,2,2,3,3),levels = c(1,2,3,4,5))
Error in Ops.factor(factor(c(1, 1, 2, 2, 3, 3), levels = c(1, 2, 3)), :
level sets of factors are different
```

This makes sense as the extra levels in the RHS mean that R does not have enough information about each factor to compare them in a meaningful way.

The operators `<`

, `<=`

, `>`

and `>=`

are only usable for ordered factors. These can represent categorical values which still have a linear order. An ordered factor can be created by providing the `ordered = TRUE`

argument to the `factor`

function or just using the `ordered`

function.

```
x <- factor(1:3, labels = c('low', 'medium', 'high'), ordered = TRUE)
print(x)
[1] low medium high
Levels: low < medium < high
y <- ordered(3:1, labels = c('low', 'medium', 'high'))
print(y)
[1] high medium low
Levels: low < medium < high
x < y
[1] TRUE FALSE FALSE
```

For more information, see the Factor documentation.

This modified text is an extract of the original Stack Overflow Documentation created by following contributors and released under CC BY-SA 3.0

This website is not affiliated with Stack Overflow