data.table Getting started with data.table Syntax and features


Basic syntax

DT[where, select|update|do, by] syntax is used to work with columns of a data.table.

  • The "where" part is the i argument
  • The "select|update|do" part is the j argument

These two arguments are usually passed by position instead of by name.

A sequence of steps can be chained like DT[...][...].

Shortcuts, special functions and special symbols inside DT[...]

Function or symbolNotes
.()in several arguments, replaces list()
J()in i, replaces list()
:=in j, a function used to add or modify columns
.Nin i, the total number of rows
in j, the number of rows in a group
.Iin j, the vector of row numbers in the table (filtered by i)
.SDin j, the current subset of the data
selected by the .SDcols argument
.GRPin j, the current index of the subset of the data
.BYin j, the list of by values for the current subset of data
V1, V2, ...default names for unnamed columns created in j

Joins inside DT[...]

DT1[DT2, on, j]join two tables
i.*special prefix on DT2's columns after the join
by=.EACHIspecial option available only with a join
DT1[!DT2, on, j]anti-join two tables
DT1[DT2, on, roll, j]join two tables, rolling on the last column in on=

Reshaping, stacking and splitting

melt(DT, id.vars, measure.vars)transform to long format
for multiple columns, use measure.vars = patterns(...)
dcast(DT, formula)transform to wide format
rbind(DT1, DT2, ...)stack enumerated data.tables
rbindlist(DT_list, idcol)stack a list of data.tables
split(DT, by)split a data.table into a list

Some other functions specialized for data.tables

foverlapsoverlap joins
mergeanother way of joining two tables
setanother way of adding or modifying columns
fintersect, fsetdiff,
funion, fsetequal,
unique, duplicated, anyDuplicated
set-theory operations with rows as elements
CJthe Cartesian product of vectors
uniqueNthe number of distinct rows
rowidv(DT, cols)row ID (1 to .N) within each group determined by cols
rleidv(DT, cols)group ID (1 to .GRP) within each group determined by runs of cols
shift(DT, n)apply a shift operator to every column
setorder, setcolorder,
setnames, setkey, setindex,
modify attributes and order by reference

Other features of the package

IDate and ITimeinteger dates and times