Factors are one way to represent categorical variables in R. A factor is stored internally as a vector of integers. The unique elements of the supplied character vector are known as the levels of the factor. By default, if the levels are not supplied by the user, then R will generate the set of unique values in the vector, sort these values alphanumerically, and use them as the levels.
charvar <- rep(c("n", "c"), each = 3)
f <- factor(charvar)
f
levels(f)
> f
[1] n n n c c c
Levels: c n
> levels(f)
[1] "c" "n"
If you want to change the ordering of the levels, then one option to to specify the levels manually:
levels(factor(charvar, levels = c("n","c")))
> levels(factor(charvar, levels = c("n","c")))
[1] "n" "c"
Factors have a number of properties. For example, levels can be given labels:
> f <- factor(charvar, levels=c("n", "c"), labels=c("Newt", "Capybara"))
> f
[1] Newt Newt Newt Capybara Capybara Capybara
Levels: Newt Capybara
Another property that can be assigned is whether the factor is ordered:
> Weekdays <- factor(c("Monday", "Wednesday", "Thursday", "Tuesday", "Friday", "Sunday", "Saturday"))
> Weekdays
[1] Monday Wednesday Thursday Tuesday Friday Sunday Saturday
Levels: Friday Monday Saturday Sunday Thursday Tuesday Wednesday
> Weekdays <- factor(Weekdays, levels=c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"), ordered=TRUE)
> Weekdays
[1] Monday Wednesday Thursday Tuesday Friday Sunday Saturday
Levels: Monday < Tuesday < Wednesday < Thursday < Friday < Saturday < Sunday
When a level of the factor is no longer used, you can drop it using the droplevels()
function:
> Weekend <- subset(Weekdays, Weekdays == "Saturday" | Weekdays == "Sunday")
> Weekend
[1] Sunday Saturday
Levels: Monday < Tuesday < Wednesday < Thursday < Friday < Saturday < Sunday
> Weekend <- droplevels(Weekend)
> Weekend
[1] Sunday Saturday
Levels: Saturday < Sunday