Sometimes we want to group a dataset by the runs of a variable (maximal stretches of consecutive rows that share the same value) and compute a summary for each run. Consider the following simple dataset:
(dat <- data.frame(x = c(1, 1, 2, 2, 2, 1), y = 1:6))
#   x y
# 1 1 1
# 2 1 2
# 3 2 3
# 4 2 4
# 5 2 5
# 6 1 6
The variable x has three runs: a run of length 2 with value 1, a run of length 3 with value 2, and a run of length 1 with value 1. We might want to compute the mean of y within each run of x (these means are 1.5, 4, and 6).
In base R, we would first compute the run-length encoding of x using rle:
(r <- rle(dat$x))
# Run Length Encoding
# lengths: int [1:3] 2 3 1
# values : num [1:3] 1 2 1
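The rle object also makes it easy to recover where each run starts and ends, which can be handy for checking the grouping by eye. A minimal base R sketch (the names run.start and run.end are just illustrative):

```r
# Rebuild the example data and its run-length encoding
dat <- data.frame(x = c(1, 1, 2, 2, 2, 1), y = 1:6)
r <- rle(dat$x)

# Each run ends at the cumulative sum of the run lengths...
run.end <- cumsum(r$lengths)
# ...and starts one row after the previous run ends
run.start <- run.end - r$lengths + 1

data.frame(value = r$values, start = run.start, end = run.end)
#   value start end
# 1     1     1   2
# 2     2     3   5
# 3     1     6   6
```
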
The next step is to compute the run number of each row of our dataset. The total number of runs is length(r$lengths), and the length of each run is given by r$lengths, so we can repeat each run number as many times as its run is long with rep:
(run.id <- rep(seq_along(r$lengths), r$lengths))
# [1] 1 1 2 2 2 3
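As a cross-check, the same run ids can be computed without rle by flagging each row whose value differs from the previous row and taking a cumulative sum of those flags. This sketch assumes x contains no NA values (a comparison involving NA yields NA, which cumsum would then propagate to every later row):

```r
dat <- data.frame(x = c(1, 1, 2, 2, 2, 1), y = 1:6)
x <- dat$x

# TRUE where a new run starts: the first row, plus every row
# whose value differs from the row before it
new.run <- c(TRUE, x[-1] != x[-length(x)])

# Counting the run starts seen so far gives each row's run number
run.id2 <- cumsum(new.run)
run.id2
# [1] 1 1 2 2 2 3
```
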
Now we can use tapply to compute the mean y value for each run by grouping on the run id:
data.frame(x = r$values, meanY = tapply(dat$y, run.id, mean))
#   x meanY
# 1 1   1.5
# 2 2   4.0
# 3 1   6.0
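If this pattern comes up often, the steps can be wrapped in a small function. summarize_runs below is a hypothetical helper, not part of base R; it assumes x and y have the same length and that FUN returns a single value per run:

```r
# Hypothetical helper (not part of base R): apply FUN to y within each run of x
summarize_runs <- function(x, y, FUN = mean) {
  r <- rle(x)
  # Run number of each row, as in the rep() step above
  run.id <- rep(seq_along(r$lengths), r$lengths)
  data.frame(x = r$values,
             length = r$lengths,
             # as.vector drops the 1-d array structure tapply returns
             value = as.vector(tapply(y, run.id, FUN)))
}

dat <- data.frame(x = c(1, 1, 2, 2, 2, 1), y = 1:6)
summarize_runs(dat$x, dat$y)
#   x length value
# 1 1      2   1.5
# 2 2      3   4.0
# 3 1      1   6.0

# Any summary that returns one value per group works, e.g. sum
summarize_runs(dat$x, dat$y, FUN = sum)
```

Because tapply converts the integer run.id to a factor whose levels are sorted numerically, the results come back in run order even when there are more than nine runs.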