The data.table package provides a convenient way to group by runs in data, that is, by stretches of consecutive identical values. Consider the following example data:
library(data.table)
(DT <- data.table(x = c(1, 1, 2, 2, 2, 1), y = 1:6))
# x y
# 1: 1 1
# 2: 1 2
# 3: 2 3
# 4: 2 4
# 5: 2 5
# 6: 1 6
The variable x has three runs: a run of length 2 with value 1, a run of length 3 with value 2, and a run of length 1 with value 1. We might want to compute the mean value of variable y in each of the runs of variable x (these mean values are 1.5, 4, and 6).
The data.table function rleid returns, for each element of a vector, the ID of the run that element belongs to:
rleid(DT$x)
# [1] 1 1 2 2 2 3
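To see what these run IDs encode, the same vector can be rebuilt from base R's rle function. This is only an illustrative sketch and makes no claim about how rleid is implemented internally; the names r and run_id are made up for this example.

```r
library(data.table)

x <- c(1, 1, 2, 2, 2, 1)

# rle() returns the length and the value of each run in x
r <- rle(x)

# Repeat each run's index as many times as the run is long
run_id <- rep(seq_along(r$lengths), times = r$lengths)

# The reconstructed IDs agree with rleid() on this vector
all(run_id == rleid(x))
# [1] TRUE
```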
One can then easily group on this run ID and summarize the y data:
DT[, mean(y), by = .(x, rleid(x))]
# x rleid V1
# 1: 1 1 1.5
# 2: 2 2 4.0
# 3: 1 3 6.0
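The default column names rleid and V1 are not very descriptive. As a minimal sketch, the grouping and summary columns can be named explicitly, and .N can report the length of each run alongside the mean. The column names run, run_length, and mean_y are chosen here purely for illustration.

```r
library(data.table)

DT <- data.table(x = c(1, 1, 2, 2, 2, 1), y = 1:6)

# Name the run ID column in `by` and the summary columns in `j`;
# .N is the number of rows in each group, i.e. the run length
res <- DT[, .(run_length = .N, mean_y = mean(y)), by = .(x, run = rleid(x))]
res
```

Grouping by both x and the run ID keeps the run's value in the result; grouping by the run ID alone would give the same groups but drop the x column.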