data.table Tutorial => Syntax and features

Download data.table (PDF)

Fastest Entity Framework Extensions

Example

Basic syntax

DT[where, select|update|do, by] syntax is used to work with columns of a data.table.

The "where" part is the i argument
The "select|update|do" part is the j argument

These two arguments are usually passed by position instead of by name.

A sequence of steps can be chained like DT[...][...].

Shortcuts, special functions and special symbols inside `DT[...]`

Function or symbol	Notes
`.()`	in several arguments, replaces `list()`
`J()`	in `i`, replaces `list()`
`:=`	in `j`, a function used to add or modify columns
`.N`	in `i`, the total number of rows in `j`, the number of rows in a group
`.I`	in `j`, the vector of row numbers in the table (filtered by `i`)
`.SD`	in `j`, the current subset of the data selected by the `.SDcols` argument
`.GRP`	in `j`, the current index of the subset of the data
`.BY`	in `j`, the list of by values for the current subset of data
`V1, V2, ...`	default names for unnamed columns created in `j`

Joins inside `DT[...]`

Notation	Notes
`DT1[DT2, on, j]`	join two tables
`i.*`	special prefix on DT2's columns after the join
`by=.EACHI`	special option available only with a join
`DT1[!DT2, on, j]`	anti-join two tables
`DT1[DT2, on, roll, j]`	join two tables, rolling on the last column in `on=`

Reshaping, stacking and splitting

Notation	Notes
`melt(DT, id.vars, measure.vars)`	transform to long format for multiple columns, use `measure.vars = patterns(...)`
`dcast(DT, formula)`	transform to wide format
`rbind(DT1, DT2, ...)`	stack enumerated data.tables
`rbindlist(DT_list, idcol)`	stack a list of data.tables
`split(DT, by)`	split a data.table into a list

Some other functions specialized for data.tables

Function(s)	Notes
`foverlaps`	overlap joins
`merge`	another way of joining two tables
`set`	another way of adding or modifying columns
`fintersect`, `fsetdiff`, `funion`, `fsetequal`, `unique`, `duplicated`, `anyDuplicated`	set-theory operations with rows as elements
`CJ`	the Cartesian product of vectors
`uniqueN`	the number of distinct rows
`rowidv(DT, cols)`	row ID (1 to .N) within each group determined by cols
`rleidv(DT, cols)`	group ID (1 to .GRP) within each group determined by runs of cols
`shift(DT, n)`	apply a shift operator to every column
`setorder`, `setcolorder`, `setnames`, `setkey`, `setindex`, `setattr`	modify attributes and order by reference

Other features of the package

Features	Notes
`IDate` and `ITime`	integer dates and times

PDF - Download data.table for free

Previous Next

Get monthly updates about new articles, cheatsheets, and tricks.