# R Language Run-length encoding to compress and decompress vectors

## Example

Long vectors with long runs of the same value can be significantly compressed by storing them in their run-length encoding (the value of each run and the number of times that value is repeated). As an example, consider a vector of length 10 million with a huge number of 1's and only a small number of 0's:

``````set.seed(144)
dat <- sample(rep(0:1, c(1, 1e5)), 1e7, replace=TRUE)
table(dat)
#       0       1
#     103 9999897
``````

Storing 10 million entries will require significant space, but we can instead create a data frame with the run-length encoding of this vector:

``````rle.df <- with(rle(dat), data.frame(values, lengths))
dim(rle.df)
#  207   2
#   values lengths
# 1      1   52818
# 2      0       1
# 3      1  219329
# 4      0       1
# 5      1  318306
# 6      0       1
``````

From the run-length encoding, we see that the first 52,818 values in the vector are 1's, followed by a single 0, followed by 219,329 consecutive 1's, followed by a 0, and so on. The run-length encoding only has 207 entries, requiring us to store only 414 values instead of 10 million values. As `rle.df` is a data frame, it can be stored using standard functions like `write.csv`.

Decompressing a vector in run-length encoding can be accomplished in two ways. The first method is to simply call `rep`, passing the `values` element of the run-length encoding as the first argument and the `lengths` element of the run-length encoding as the second argument:

``````decompressed <- rep(rle.df\$values, rle.df\$lengths)
``````

We can confirm that our decompressed data is identical to our original data:

``````identical(decompressed, dat)
#  TRUE
``````

The second method is to use R's built-in `inverse.rle` function on the `rle` object, for instance:

``````rle.obj <- rle(dat)                            # create a rle object here
class(rle.obj)
#  "rle"

dat.inv <- inverse.rle(rle.obj)               # apply the inverse.rle on the rle object
``````

We can confirm again that this produces exactly the original `dat`:

``````identical(dat.inv, dat)
#  TRUE
`````` PDF - Download R Language for free