# example data
library(data.table)
DT = data.table(iris)
DT[, Bin := cut(Sepal.Length, c(4,6,8))]
.N
.N in j stores the number of rows in a subset. When exploring data, .N is handy to...
count rows in a group,
DT[Species == "setosa", .N]
# 50
or count rows in all groups,
DT[, .N, by=.(Species, Bin)]
# Species Bin N
# 1: setosa (4,6] 50
# 2: versicolor (6,8] 20
# 3: versicolor (4,6] 30
# 4: virginica (6,8] 41
# 5: virginica (4,6] 9
or find groups that have a certain number of rows.
DT[, .N, by=.(Species, Bin)][ N < 25 ]
# Species Bin N
# 1: versicolor (6,8] 20
# 2: virginica (4,6] 9
However, the results above omit groups with a count of zero. If they matter, we can use table from base:
DT[, data.table(table(Species, Bin))][ N < 25 ]
# Species Bin N
# 1: virginica (4,6] 9
# 2: setosa (6,8] 0
# 3: versicolor (6,8] 20
Alternatively, we can join on all groups:
DT[CJ(Species=Species, Bin=Bin, unique=TRUE), on=c("Species","Bin"), .N, by=.EACHI][N < 25]
# Species Bin N
# 1: setosa (6,8] 0
# 2: versicolor (6,8] 20
# 3: virginica (4,6] 9
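To see why the join keeps empty groups, it may help to split it into two steps. The sketch below rebuilds the example data and uses the illustrative names all_groups and counts, which are not part of the original example. CJ with unique=TRUE builds every Species/Bin combination, and by=.EACHI evaluates j once per row of that table, so a combination with no matching rows in DT gets N = 0.

```r
library(data.table)

# same example data as above
DT = data.table(iris)
DT[, Bin := cut(Sepal.Length, c(4,6,8))]

# every Species x Bin combination, whether or not it occurs in DT
# (all_groups is an illustrative name)
all_groups = CJ(Species = DT$Species, Bin = DT$Bin, unique = TRUE)
nrow(all_groups)
# 6

# j (.N) is evaluated once per row of all_groups;
# rows of all_groups with no match in DT get N = 0
counts = DT[all_groups, on = c("Species", "Bin"), .N, by = .EACHI]
counts[N == 0]
```

The last line should return the single empty group, setosa in (6,8], which the plain by= version never reports.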
A note on .N: the examples above use .N in j, where it refers to the size of a subset. In i, it refers to the total number of rows.
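A minimal sketch of .N in i, assuming the same iris-based table as the examples above. The names last_row and last_three are illustrative only.

```r
library(data.table)

DT = data.table(iris)

# .N in i is the total number of rows, so this selects the last row
last_row = DT[.N]

# .N can be used in arithmetic within i: the last three rows
last_three = DT[(.N - 2):.N]

# with no by and no filter in i, .N in j is also the total row count
DT[, .N]
# 150
```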