data.table > Computing summary statistics > Counting rows by group


Example

library(data.table)

# example data: iris plus a Bin column that splits Sepal.Length into (4,6] and (6,8]
DT = data.table(iris)
DT[, Bin := cut(Sepal.Length, c(4,6,8))]

Using .N

.N in j holds the number of rows in the current subset: the whole table, the rows selected by i, or each group when by is used. When exploring data, .N is handy to...

  1. count rows in a group,

    DT[Species == "setosa", .N]
    
    # 50
    
  2. or count rows in every group present in the data,

    DT[, .N, by=.(Species, Bin)]
    
    #       Species   Bin  N
    # 1:     setosa (4,6] 50
    # 2: versicolor (6,8] 20
    # 3: versicolor (4,6] 30
    # 4:  virginica (6,8] 41
    # 5:  virginica (4,6]  9
    
  3. or find groups that have a certain number of rows.

    DT[, .N, by=.(Species, Bin)][ N < 25 ]
    
    #       Species   Bin  N
    # 1: versicolor (6,8] 20
    # 2:  virginica (4,6]  9
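
The counts above can also be renamed or combined with other per-group summaries. The following is a minimal sketch, assuming the same DT as in the example; the names by_species, stats, count and mean_length are illustrative only and do not come from the original.

```r
library(data.table)

DT = data.table(iris)
DT[, Bin := cut(Sepal.Length, c(4,6,8))]

# .N on its own yields a column named N; wrap it in .() to choose another name
by_species = DT[, .(count = .N), by=Species]

# .N can sit next to other summaries computed for the same groups
stats = DT[, .(count = .N, mean_length = mean(Sepal.Length)), by=.(Species, Bin)]

# sanity check: the groups partition the table, so their sizes sum to nrow(DT)
stats[, sum(count)] == nrow(DT)
```

Naming the column explicitly is mostly useful when the result is joined or chained further, where a generic N could be ambiguous.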
    

Handling missing groups

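However, the results above omit groups with a count of zero, such as setosa in (6,8], because by only returns groups that actually occur in the data. If those groups matter, we can use table from base R, which counts every combination of factor levels: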

DT[, data.table(table(Species, Bin))][ N < 25 ]

#       Species   Bin  N
# 1:  virginica (4,6]  9
# 2:     setosa (6,8]  0
# 3: versicolor (6,8] 20
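
To see why table keeps the empty combination, it can help to look at the full cross-tabulation before filtering. This is a small sketch, assuming the same DT as in the example; tab and empty are illustrative names.

```r
library(data.table)

DT = data.table(iris)
DT[, Bin := cut(Sepal.Length, c(4,6,8))]

# table() cross-tabulates all levels of both factors,
# so a combination with no rows still gets a cell, with count 0
tab = DT[, data.table(table(Species, Bin))]

# one row per Species/Bin pairing: 3 species x 2 bins
nrow(tab)

# the combinations that never occur in the data
empty = tab[N == 0]
```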

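Alternatively, we can join on all combinations of the groups, built with CJ (cross join). With by=.EACHI, .N is computed once per row of i and is 0 where no rows match: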

DT[CJ(Species=Species, Bin=Bin, unique=TRUE), on=c("Species","Bin"), .N, by=.EACHI][N < 25]

#       Species   Bin  N
# 1:     setosa (6,8]  0
# 2: versicolor (6,8] 20
# 3:  virginica (4,6]  9
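
The one-liner above can be split into two steps, which may be easier to read. This is a sketch under the same assumptions as the example; grid and res are illustrative names, and the grid is built in j here rather than in i.

```r
library(data.table)

DT = data.table(iris)
DT[, Bin := cut(Sepal.Length, c(4,6,8))]

# step 1: every Species/Bin combination; unique=TRUE drops repeated values first
grid = DT[, CJ(Species=Species, Bin=Bin, unique=TRUE)]

# step 2: look up each grid row in DT; by=.EACHI evaluates .N once per grid row,
# so combinations without a match get a count of 0
res = DT[grid, on=c("Species","Bin"), .N, by=.EACHI]

# same filter as before
res[N < 25]
```

Keeping the grid as a separate object also makes it possible to reuse it for other per-group summaries.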

A note on .N:

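  • This example uses .N in j, where it refers to the size of the current subset: the rows selected by i or, with by, each group.
  • In i, it refers to the total number of rows in the table, so DT[.N] selects the last row.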
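
The two meanings can be compared side by side. This is a minimal sketch using the iris data from the example; last_row, last_two and total are illustrative names.

```r
library(data.table)

DT = data.table(iris)

# in i, .N is the total number of rows, so this picks the last row
last_row = DT[.N]

# it can be used in arithmetic, e.g. to take the last two rows
last_two = DT[(.N - 1):.N]

# in j without i or by, the subset is the whole table
total = DT[, .N]

# in j with by, .N is evaluated once per group
DT[, .N, by=Species]
```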