Identifying and grouping by runs in base R

You may want to group data by the runs of a variable (maximal stretches of consecutive, equal values) and then analyze each group. Consider the following simple dataset:

``````r
(dat <- data.frame(x = c(1, 1, 2, 2, 2, 1), y = 1:6))
#   x y
# 1 1 1
# 2 1 2
# 3 2 3
# 4 2 4
# 5 2 5
# 6 1 6
``````

The variable `x` has three runs: a run of length 2 with value 1, a run of length 3 with value 2, and a run of length 1 with value 1. We might want to compute the mean value of variable `y` in each of the runs of variable `x` (these mean values are 1.5, 4, and 6).

In base R, we would first compute the run-length encoding of the `x` variable using `rle`:

``````r
(r <- rle(dat$x))
# Run Length Encoding
#   lengths: int [1:3] 2 3 1
#   values : num [1:3] 1 2 1
``````
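The next step relies on two properties of the `rle` result. First, the run lengths add up to the number of rows. Second, `inverse.rle` (also in base R) expands the encoding back into the original vector. The sketch below checks both on the example data:

``````r
dat <- data.frame(x = c(1, 1, 2, 2, 2, 1), y = 1:6)
r <- rle(dat$x)

# The run lengths add up to the number of rows, so every row falls in exactly one run
sum(r$lengths) == nrow(dat)
# [1] TRUE

# inverse.rle() repeats each value by its run length, recovering the original vector
identical(inverse.rle(r), dat$x)
# [1] TRUE
``````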

The next step is to compute the run number of each row of our dataset. The number of runs is `length(r$lengths)`, and the length of each run is `r$lengths`. We can therefore use `rep` to repeat each run number from `seq_along(r$lengths)` (here 1, 2, 3) as many times as that run is long:

``````r
(run.id <- rep(seq_along(r$lengths), r$lengths))
# [1] 1 1 2 2 2 3
``````
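You can also build the same run ids without `rle`. A new run starts at the first element and wherever a value differs from the one before it, so a cumulative sum of those start flags numbers the runs. This is a minimal sketch of that alternative. It assumes `x` is non-empty and contains no `NA` values, because a comparison with `NA` yields `NA`, which `cumsum` would propagate. The name `run.id2` is only illustrative.

``````r
x <- c(1, 1, 2, 2, 2, 1)

# TRUE marks the first element of each run: element 1, plus every element
# that differs from its predecessor (assumes non-empty x without NAs)
starts <- c(TRUE, x[-1] != x[-length(x)])

# Counting run starts cumulatively gives each element its run number
(run.id2 <- cumsum(starts))
# [1] 1 1 2 2 2 3
``````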

Now we can use `tapply` to compute the mean `y` value for each run by grouping on the run id:

``````r
data.frame(x = r$values, meanY = tapply(dat$y, run.id, mean))
#   x meanY
# 1 1   1.5
# 2 2   4.0
# 3 1   6.0
``````
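The three steps can be wrapped in a small reusable function. The `run_summary` function below is a hypothetical helper written for illustration and is not part of base R. It also reports the length of each run and accepts any summary function through an assumed `FUN` argument:

``````r
# Hypothetical helper (not part of base R): summarise y within each run of x
run_summary <- function(x, y, FUN = mean) {
  r <- rle(x)                                         # encode the runs of x
  run.id <- rep(seq_along(r$lengths), r$lengths)      # run number of each element
  data.frame(
    run    = seq_along(r$lengths),
    x      = r$values,
    length = r$lengths,
    value  = as.vector(tapply(y, run.id, FUN))        # FUN applied to y per run
  )
}

dat <- data.frame(x = c(1, 1, 2, 2, 2, 1), y = 1:6)
run_summary(dat$x, dat$y)
#   run x length value
# 1   1 1      2   1.5
# 2   2 2      3   4.0
# 3   3 1      1   6.0
``````

Passing `FUN = sum` instead of the default `mean` would give the per-run totals 3, 12, and 6.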