R Language Rebuilding factors from zero


Example

Problem

Factors are used to represent variables that take values from a set of categories, known as Levels in R. For example, some experiment could be characterized by the energy level of a battery, with four levels: empty, low, normal, and full. Then, for 5 different sampling sites, those levels could be identified, in those terms, as follows:

full, full, normal, empty, low

Typically, in databases or other information sources, the handling of these data is by arbitrary integer indices associated with the categories or levels. If we assume that, for the given example, we would assign, the indices as follows: 1 = empty, 2 = low, 3 = normal, 4 = full, then the 5 samples could be coded as:

4, 4, 3, 1, 2

It could happen that, from your source of information, e.g. a database, you only have the encoded list of integers, and the catalog associating each integer with each level-keyword. How can a factor of R be reconstructed from that information?

Solution

We will simulate a vector of 20 integers that represents the samples, each of which may have one of four different values:

set.seed(18)
ii <- sample(1:4, 20, replace=T)
ii

[1] 4 3 4 1 1 3 2 3 2 1 3 4 1 2 4 1 3 1 4 1

The first step is to make a factor, from the previous sequence, in which the levels or categories are exactly the numbers from 1 to 4.

fii <- factor(ii, levels=1:4) # it is necessary to indicate the numeric levels
fii

[1] 4 3 4 1 1 3 2 3 2 1 3 4 1 2 4 1 3 1 4 1
Levels: 1 2 3 4

Now simply, you have to dress the factor already created with the index tags:

levels(fii) <- c("empty", "low", "normal", "full")
fii

[1] full normal full empty empty normal low normal low empty
[11] normal full empty low full empty normal empty full empty
Levels: empty low normal full