data.table Tutorial => Getting started with data.table

Remarks

data.table is a package for the R statistical computing environment. It extends the functionality of data frames from base R, particularly improving on their performance and syntax. A number of related tasks, including rolling and non-equi joins, are handled in a consistent, concise syntax like DT[where, select|update|do, by].

A number of complementary functions are also included in the package:

I/O: fread/fwrite
Reshaping: melt/dcast/rbindlist/split
Runs of values: rleid

Versions

Version	Notes	Release Date on CRAN
1.9.4		2014-10-02
1.9.6		2015-09-19
1.9.8		2016-11-24
1.10.0	"With hindsight, the last release v1.9.8 should have been named v1.10.0"	2016-12-03
1.10.1	In development	2016-12-03

Getting started and finding help

The package's official wiki has some essential materials:

As a new user, you will want to check out the vignettes, FAQ and cheat sheet.
Before asking a question -- here on StackOverflow or anywhere else -- please read the support page.

For help on individual functions, the syntax is help("fread") or ?fread. If the package has not been loaded, use the full name like ?data.table::fread.

Installation and setup

Install the stable release from CRAN:

install.packages("data.table")

Or the development version from GitHub:

install.packages("data.table", type = "source", 
  repos = "http://Rdatatable.github.io/data.table")

To revert from devel to CRAN, the current version must first be removed:

remove.packages("data.table")
install.packages("data.table")

Visit the website for full installation instructions and the latest version numbers.

Using the package

Usually you will want to load the package and all of its functions with a line like

library(data.table)

If you only need one or two functions, you can refer to them like data.table::fread instead.
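As a rough illustration of the namespaced form, the sketch below writes a small table to a temporary file and reads it back without calling library(). The file and the column names (id, label) are made up for the example.

```r
# Hypothetical round trip through a temporary CSV file,
# using data.table:: prefixes instead of library(data.table)
tmp <- tempfile(fileext = ".csv")

# fwrite accepts a plain data.frame as well as a data.table
data.table::fwrite(data.frame(id = 1:3, label = c("a", "b", "c")), tmp)

# fread returns a data.table by default
DT <- data.table::fread(tmp)

unlink(tmp)  # clean up the temporary file
```

This style keeps the search path clean, at the cost of more typing once many functions are used.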

Syntax and features

Basic syntax

DT[where, select|update|do, by] syntax is used to work with columns of a data.table.

The "where" part is the i argument
The "select|update|do" part is the j argument

These two arguments are usually passed by position instead of by name.

A sequence of steps can be chained like DT[...][...].
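A minimal sketch of the three parts and of chaining, assuming a small made-up table (the column names id, grp and value are illustrative only):

```r
library(data.table)

# Hypothetical example data
DT <- data.table(
  id    = 1:6,
  grp   = c("a", "a", "b", "b", "c", "c"),
  value = c(10, 20, 30, 40, 50, 60)
)

# i ("where"): keep rows with value > 15
# j ("select|update|do"): compute a summary
# by: one result row per group
res <- DT[value > 15, .(total = sum(value)), by = grp]

# Chaining: each DT[...] operates on the result of the previous one
top <- DT[value > 15, .(total = sum(value)), by = grp][order(-total)][1]
```

Here i and j are passed by position, while by is passed by name.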

Shortcuts, special functions and special symbols inside `DT[...]`

Function or symbol	Notes
`.()`	in several arguments, replaces `list()`
`J()`	in `i`, replaces `list()`
`:=`	in `j`, a function used to add or modify columns
`.N`	in `i`, the total number of rows; in `j`, the number of rows in a group
`.I`	in `j`, the vector of row numbers in the table (filtered by `i`)
`.SD`	in `j`, the current subset of the data, with columns selected by the `.SDcols` argument
`.GRP`	in `j`, the current index of the subset of the data
`.BY`	in `j`, the list of by values for the current subset of data
`V1, V2, ...`	default names for unnamed columns created in `j`
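The symbols in the table above can be tried out on a toy table. This is only a sketch; the table and its columns (grp, x, y) are invented for the example.

```r
library(data.table)

# Hypothetical example data
DT <- data.table(
  grp = c("a", "a", "b", "b", "b"),
  x   = c(1, 2, 3, 4, 5),
  y   = c(10, 20, 30, 40, 50)
)

# .N in i: the total number of rows, so this selects the last row
last_row <- DT[.N]

# .N in j with by: the number of rows in each group
counts <- DT[, .N, by = grp]

# := adds a column by reference (DT itself is modified)
DT[, xy := x * y]

# .SD restricted by .SDcols: apply a function to selected columns per group
means <- DT[, lapply(.SD, mean), by = grp, .SDcols = c("x", "y")]

# .I in j: row number of the first row of each group;
# the unnamed result column gets the default name V1
first_idx <- DT[, .I[1], by = grp]

# .GRP: a counter identifying the current group
DT[, g := .GRP, by = grp]
```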

Joins inside `DT[...]`

Notation	Notes
`DT1[DT2, on, j]`	join two tables
`i.*`	special prefix on DT2's columns after the join
`by=.EACHI`	special option available only with a join
`DT1[!DT2, on, j]`	anti-join two tables
`DT1[DT2, on, roll, j]`	join two tables, rolling on the last column in `on=`
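The join notations above can be sketched on small made-up tables. The table names (DT1, DT2, quotes, trades) and their columns are assumptions for illustration.

```r
library(data.table)

# Hypothetical example data
DT1 <- data.table(id = c(1L, 2L, 3L), x = c("a", "b", "c"))
DT2 <- data.table(id = c(2L, 3L, 4L), y = c(20, 30, 40))

# DT1[DT2, on]: for each row of DT2, look up matching rows of DT1;
# rows of DT2 without a match get NA in DT1's columns
joined <- DT1[DT2, on = "id"]

# i.* prefix: refer explicitly to a column of DT2 inside j
picked <- DT1[DT2, on = "id", .(x, y_from_dt2 = i.y)]

# Anti-join: rows of DT1 that have no match in DT2
anti <- DT1[!DT2, on = "id"]

# by = .EACHI: evaluate j once for each row of DT2
per_row <- DT1[DT2, on = "id", .N, by = .EACHI]

# Rolling join: each trade picks up the most recent earlier quote
quotes <- data.table(time = c(1, 5, 10), price = c(100, 101, 102))
trades <- data.table(time = c(3, 7, 12))
rolled <- quotes[trades, on = "time", roll = TRUE]
```

With several columns in on=, the roll applies to the last one and the others are matched exactly.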

Reshaping, stacking and splitting

Notation	Notes
`melt(DT, id.vars, measure.vars)`	transform to long format; for multiple columns, use `measure.vars = patterns(...)`
`dcast(DT, formula)`	transform to wide format
`rbind(DT1, DT2, ...)`	stack enumerated data.tables
`rbindlist(DT_list, idcol)`	stack a list of data.tables
`split(DT, by)`	split a data.table into a list
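A short sketch of these functions on an invented two-row table (the names wide, long and src are illustrative):

```r
library(data.table)

# Hypothetical wide-format data
wide <- data.table(id = 1:2, a = c(1, 2), b = c(3, 4))

# Wide to long: one row per (id, variable) pair
long <- melt(wide, id.vars = "id", measure.vars = c("a", "b"))

# Long back to wide: one column per level of `variable`
back <- dcast(long, id ~ variable, value.var = "value")

# Stack a named list of tables, recording each table's name in `src`
stacked <- rbindlist(list(first = wide, second = wide), idcol = "src")

# Split into a list with one data.table per value of `id`
parts <- split(wide, by = "id")
```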

Some other functions specialized for data.tables

Function(s)	Notes
`foverlaps`	overlap joins
`merge`	another way of joining two tables
`set`	another way of adding or modifying columns
`fintersect`, `fsetdiff`, `funion`, `fsetequal`, `unique`, `duplicated`, `anyDuplicated`	set-theory operations with rows as elements
`CJ`	the Cartesian product of vectors
`uniqueN`	the number of distinct rows
`rowidv(DT, cols)`	row ID (1 to .N) within each group determined by cols
`rleidv(DT, cols)`	group ID (1 to .GRP) determined by runs of cols
`shift(DT, n)`	apply a shift operator to every column
`setorder`, `setcolorder`, `setnames`, `setkey`, `setindex`, `setattr`	modify attributes and order by reference
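A few of the functions above, tried on a made-up table. This is a sketch only; the column names g, v, v2 and double_v are assumptions.

```r
library(data.table)

# Hypothetical example data: "a" appears in two separate runs
DT <- data.table(g = c("a", "a", "b", "a", "a"), v = c(1, 2, 3, 4, 5))

# Run ID: increases whenever the value of g changes
run <- rleidv(DT, cols = "g")

# Row ID within each group of g, regardless of runs
row_in_g <- rowidv(DT, cols = "g")

# Number of distinct rows when only g is considered
n_groups <- uniqueN(DT, by = "g")

# Lag a vector by one position (NA is filled in at the start)
prev_v <- shift(DT$v, n = 1)

# Cartesian product of two vectors
grid <- CJ(x = 1:2, y = c("a", "b"))

# Add a column, rename it and reorder rows, all by reference
set(DT, j = "v2", value = DT$v * 2)
setnames(DT, "v2", "double_v")
setorder(DT, -v)
```

The by-reference functions return their input invisibly, so no reassignment is needed.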

Other features of the package

Features	Notes
`IDate` and `ITime`	integer dates and times
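As a brief sketch, the integer date and time classes can be created with as.IDate and as.ITime. The specific date and time values below are arbitrary.

```r
library(data.table)

# Hypothetical values
d  <- as.IDate("2016-12-03")
tm <- as.ITime("13:45:00")

# Both are stored as integers: days since 1970-01-01 and seconds since midnight
storage_d  <- typeof(d)
seconds_tm <- as.integer(tm)

# Date arithmetic works in whole days
next_day <- d + 1L
```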
