data.table Computing summary statistics The summary function


Example

# example data
DT = data.table(iris)
DT[, Bin := cut(Sepal.Length, c(4,6,8))]

summary is handy for browsing summary statistics. Besides direct usage like summary(DT), it can also be applied per-group conveniently with split:

lapply(split(DT, by=c("Species", "Bin"), drop=TRUE, keep.by=FALSE), summary)

# $`setosa.(4,6]`
#   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
#  Min.   :4.300   Min.   :2.300   Min.   :1.000   Min.   :0.100  
#  1st Qu.:4.800   1st Qu.:3.200   1st Qu.:1.400   1st Qu.:0.200  
#  Median :5.000   Median :3.400   Median :1.500   Median :0.200  
#  Mean   :5.006   Mean   :3.428   Mean   :1.462   Mean   :0.246  
#  3rd Qu.:5.200   3rd Qu.:3.675   3rd Qu.:1.575   3rd Qu.:0.300  
#  Max.   :5.800   Max.   :4.400   Max.   :1.900   Max.   :0.600  
# 
# $`versicolor.(6,8]`
#   Sepal.Length   Sepal.Width    Petal.Length    Petal.Width  
#  Min.   :6.10   Min.   :2.20   Min.   :4.000   Min.   :1.20  
#  1st Qu.:6.20   1st Qu.:2.80   1st Qu.:4.400   1st Qu.:1.30  
#  Median :6.40   Median :2.90   Median :4.600   Median :1.40  
#  Mean   :6.45   Mean   :2.89   Mean   :4.585   Mean   :1.42  
#  3rd Qu.:6.70   3rd Qu.:3.10   3rd Qu.:4.700   3rd Qu.:1.50  
#  Max.   :7.00   Max.   :3.30   Max.   :5.000   Max.   :1.70  
# 
# [...results truncated...]

To include zero-count groups, set drop=FALSE in split.