Introduction
data.table is a package that extends the functionality of base R data frames, improving mainly on their performance and syntax. For details, see the "Getting started with data.table" pages in the package's documentation.
Syntax
DT[i, j, by]
# DT[where, select|update|do, by]
DT[...][...]
# chaining
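A minimal sketch of the DT[i, j, by] form and of chaining. It uses a small hypothetical table, and the column names id, value and total are made up for illustration:

```r
library(data.table)

# Hypothetical example data
DT <- data.table(
  id    = c("a", "a", "b", "b", "c"),
  value = c(1, 2, 3, 4, 5)
)

# DT[i, j, by]: keep rows where value > 1 (i),
# sum value (j), separately for each id (by)
res <- DT[value > 1, .(total = sum(value)), by = id]
# totals: a = 2, b = 7, c = 5 (groups keep their order of first appearance)

# Chaining: the second [...] operates on the result of the first
top <- DT[, .(total = sum(value)), by = id][order(-total)]
# ids sorted by decreasing total: b, c, a
```

Chaining avoids creating a named intermediate table; each `[...]` receives the data.table returned by the previous one.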
################# Shortcuts, special functions and special symbols inside DT[...]
- .()
# an alias for list(), usable in i, j, by and on
- J()
# in i, replaces list()
- :=
# in j, adds, modifies or deletes columns by reference (without copying the table)
- .N
# in i, the total number of rows
# in j, the number of rows in a group
- .I
# in j, the vector of row numbers in the table (filtered by i)
- .SD
# in j, the subset of the data for the current group
# restricted to the columns selected by the .SDcols argument (by default, all non-grouping columns)
- .GRP
# in j, the counter of the current group (1 for the first group, 2 for the second, ...)
- .BY
# in j, the list of by values for the current subset of data
- V1, V2, ...
# default names for unnamed columns created in j
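A short sketch exercising the special symbols listed above on one hypothetical table (the columns grp, a and b, and the result names, are invented for illustration):

```r
library(data.table)

# Hypothetical example data
DT <- data.table(
  grp = c("x", "x", "y", "y", "y"),
  a   = 1:5,
  b   = c(10, 20, 30, 40, 50)
)

# .N in i: the total number of rows, so DT[.N] is the last row
last_row <- DT[.N]

# .N, .GRP and .BY in j, evaluated once per group
counts <- DT[, .(n = .N, group_index = .GRP, label = paste("group", .BY$grp)), by = grp]

# .I in j: row numbers (in DT) of the first row of each group;
# the unnamed result column gets the default name V1
first_rows <- DT[, .I[1L], by = grp]

# .SD restricted by .SDcols: apply mean() to columns a and b within each group
means <- DT[, lapply(.SD, mean), by = grp, .SDcols = c("a", "b")]

# := adds a new column to DT by reference (no copy of DT is made)
DT[, b_scaled := b / 10]
```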
################# Joins inside DT[...]
- DT1[DT2, on, j]
# join two tables
- i.*
# in j, prefix for referring to DT2's columns (e.g. i.x)
- by=.EACHI
# evaluates j once for each row of DT2; available only with a join
- DT1[!DT2, on, j]
# anti-join two tables
- DT1[DT2, on, roll, j]
# join two tables, rolling on the last column in on=
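A minimal sketch of the join forms above. The tables sales, prices, quotes and events are hypothetical; in each call the table inside the brackets plays the role of DT2:

```r
library(data.table)

# Hypothetical tables
sales  <- data.table(item = c("apple", "pear", "plum"), qty = c(3L, 1L, 2L))
prices <- data.table(item = c("apple", "pear"), price = c(0.5, 0.8))

# Join: one result row per row of sales; plum has no price, so price is NA
joined <- prices[sales, on = "item"]

# i. prefix: i.qty refers to the qty column of the i table (sales) inside j
cost <- prices[sales, on = "item", .(cost = price * i.qty)]

# by = .EACHI: j (here .N) is evaluated once per row of sales
n_match <- prices[sales, on = "item", .N, by = .EACHI]

# Anti-join: rows of sales that have no match in prices
unmatched <- sales[!prices, on = "item"]

# Rolling join on time: each event takes the last quote at or before its time
quotes <- data.table(time = c(1, 5, 10), quote = c(100, 105, 110))
events <- data.table(time = c(3, 7, 12))
rolled <- quotes[events, on = "time", roll = TRUE]
```

With roll = TRUE the last observation is carried forward, so the event at time 12 still picks up the quote from time 10.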
################# Reshaping, stacking and splitting
- melt(DT, id.vars, measure.vars)
# transform to long format
# for multiple columns, use measure.vars = patterns(...)
- dcast(DT, formula)
# transform to wide format
- rbind(DT1, DT2, ...)
# stack data.tables passed as separate arguments
- rbindlist(DT_list, idcol)
# stack a list of data.tables
- split(DT, by)
# split a data.table into a list
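A small round trip through the reshaping and stacking functions, using hypothetical data (the id/year columns and the list names first and second are illustrative only):

```r
library(data.table)

# Hypothetical wide data: one row per id, one column per year
wide <- data.table(id = c("a", "b"), y2020 = c(1, 2), y2021 = c(3, 4))

# melt: wide -> long, one row per (id, year) pair
long <- melt(wide, id.vars = "id", measure.vars = c("y2020", "y2021"),
             variable.name = "year", value.name = "value")

# dcast: long -> wide again; the formula reads rows ~ columns
back <- dcast(long, id ~ year, value.var = "value")

# rbindlist: stack a named list of tables; idcol records each row's source element
parts   <- list(first = data.table(x = 1:2), second = data.table(x = 3L))
stacked <- rbindlist(parts, idcol = "source")

# split: roughly the inverse, one data.table per value of source
pieces <- split(stacked, by = "source")
```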
################# Some other functions specialized for data.tables
- foverlaps
# overlap joins
- merge
# another way of joining two tables
- set
# another way of adding or modifying columns
- fintersect, fsetdiff, funion, fsetequal, unique, duplicated, anyDuplicated
# set-theory operations with rows as elements
- uniqueN
# the number of distinct rows
- rowidv(DT, cols)
# row ID (1 to .N) within each group determined by cols
- rleidv(DT, cols)
# group ID (1 to .GRP), where each run of consecutive identical values of cols forms its own group
- shift(DT, n, type=c("lag", "lead"))
# apply a shift operator to every column
- setorder, setcolorder, setnames, setkey, setindex, setattr
# modify attributes and order by reference
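A sketch of several of these helpers on one hypothetical table (the columns g and v, and the new columns rid, run and v_prev, are made up for illustration). Note how rowidv counts within each value of g, while rleidv starts a new ID whenever g changes:

```r
library(data.table)

# Hypothetical example data; note that "a" reappears after the "b" run
DT <- data.table(g = c("a", "a", "b", "b", "a"), v = c(1, 2, 3, 4, 5))

# rowidv: running count within each value of g -> 1 2 1 2 3
DT[, rid := rowidv(DT, cols = "g")]

# rleidv: new ID each time g changes (runs, not groups) -> 1 1 2 2 3
DT[, run := rleidv(DT, cols = "g")]

# shift: previous value of v (lag by one row) -> NA 1 2 3 4
DT[, v_prev := shift(v, n = 1L, type = "lag")]

# uniqueN: number of distinct values of g
n_groups <- uniqueN(DT$g)

# Set-theory operation with rows as elements: rows of A not in B
A <- data.table(k = 1:3)
B <- data.table(k = 2:4)
only_A <- fsetdiff(A, B)

# setorder / setnames modify DT by reference: sort by g, then v descending
setorder(DT, g, -v)
setnames(DT, "v", "value")
```

Because setorder and setnames work by reference, no assignment (DT <- ...) is needed.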
Installation and support
To install the data.table package:
# install from CRAN
install.packages("data.table")
# or install development version
install.packages("data.table", type = "source", repos = "http://Rdatatable.github.io/data.table")
# and to revert from devel to CRAN, the current version must first be removed
remove.packages("data.table")
install.packages("data.table")
The package's official site has wiki pages that help with getting started, as well as lists of presentations and articles from around the web. Before asking a question, whether here on Stack Overflow or anywhere else, please read the support page.
Loading the package
Many of the functions listed above live in the data.table namespace. To use them, either add a line like library(data.table) first, or call them by their full path, such as data.table::fread instead of simply fread. For help on an individual function, use help("fread") or ?fread. Again, if the package is not loaded, use the full name, as in ?data.table::fread.
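A minimal sketch contrasting the two ways of calling a package function. The inline CSV string is hypothetical, and the text argument of fread assumes a reasonably recent data.table version:

```r
# Without attaching the package: call the function by its full path
csv <- "id,value\na,1\nb,2"
dt1 <- data.table::fread(text = csv)

# After attaching the package, the short name works
library(data.table)
dt2 <- fread(text = csv)
```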