Wednesday, July 12, 2017

# data.table

## Introduction

data.table is an R package that extends the functionality of base R data frames, particularly improving on their performance and syntax. See the package's Docs area at Getting started with data.table for details.

## Syntax

• `DT[i, j, by]`: read as `DT[where, select|update|do, by]`
• `DT[...][...]`: chaining
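
As a rough illustration of the `DT[i, j, by]` form and of chaining, here is a minimal sketch. The table `DT`, its columns `id` and `value`, and the result names are made up for this example and do not come from the package documentation.

```r
library(data.table)

# Hypothetical toy data: names and values are invented for illustration
DT <- data.table(
  id    = c("a", "a", "b", "b", "c"),
  value = c(1, 2, 3, 4, 5)
)

# i: keep rows where value > 1
# j: compute one summary column
# by: one result row per id
res <- DT[value > 1, .(total = sum(value)), by = id]

# Chaining: each [...] operates on the result of the previous one.
# Here: aggregate, sort by descending total, keep the first row.
top <- DT[, .(total = sum(value)), by = id][order(-total)][1]
```

Reading the first call aloud as "where `value > 1`, compute `sum(value)`, grouped by `id`" matches the `where, select|update|do, by` mnemonic.
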

### Shortcuts, special functions and special symbols inside `DT[...]`

• `.()`: in several arguments, replaces `list()`
• `J()`: in `i`, replaces `list()`
• `:=`: in `j`, a function used to add or modify columns
• `.N`: in `i`, the total number of rows; in `j`, the number of rows in a group
• `.I`: in `j`, the vector of row numbers in the table (filtered by `i`)
• `.SD`: in `j`, the current subset of the data, with columns selected by the `.SDcols` argument
• `.GRP`: in `j`, the current index of the subset of the data
• `.BY`: in `j`, the list of `by` values for the current subset of data
• `V1, V2, ...`: default names for unnamed columns created in `j`
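
The special symbols are easier to follow on a small table. The sketch below uses a hypothetical table `DT` with columns `g`, `a` and `b`; all names and values are assumptions chosen for illustration.

```r
library(data.table)

# Hypothetical toy data
DT <- data.table(
  g = c("x", "x", "y", "y", "y"),
  a = 1:5,
  b = c(10, 20, 30, 40, 50)
)

# .N in j: number of rows in each group (result column is named N)
counts <- DT[, .N, by = g]

# .N in i: total number of rows, so this picks the last row
last_row <- DT[.N]

# := adds a column by reference; .GRP is the index of the current group
DT[, grp_id := .GRP, by = g]

# .SD restricted by .SDcols: apply a function to selected columns per group
means <- DT[, lapply(.SD, mean), by = g, .SDcols = c("a", "b")]

# .I: row numbers in the table; take the first one in each group.
# The unnamed result column gets the default name V1.
first_idx <- DT[, .I[1], by = g]
```

Note that `:=` modifies `DT` in place, so no assignment with `<-` is needed for that line.
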

### Joins inside `DT[...]`

• `DT1[DT2, on, j]`: join two tables
• `i.*`: special prefix on DT2's columns after the join
• `by=.EACHI`: special option available only with a join
• `DT1[!DT2, on, j]`: anti-join two tables
• `DT1[DT2, on, roll, j]`: join two tables, rolling on the last column in `on=`
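
The join forms above can be sketched on two small tables. Everything here (`DT1`, `DT2`, `quotes`, `trades` and their columns) is hypothetical data invented for the example.

```r
library(data.table)

# Hypothetical tables sharing an "id" column
DT1 <- data.table(id = c(1L, 2L, 3L), x = c("a", "b", "c"))
DT2 <- data.table(id = c(2L, 3L, 4L), y = c(20, 30, 40))

# DT1[DT2, on]: for each row of DT2, look up matching rows of DT1.
# Rows of DT2 without a match get NA in DT1's columns.
joined <- DT1[DT2, on = "id"]

# Inside j, the i. prefix refers to DT2's columns
picked <- DT1[DT2, on = "id", .(x, y_from_i = i.y)]

# Anti-join: rows of DT1 that have no match in DT2
anti <- DT1[!DT2, on = "id"]

# by = .EACHI: evaluate j once for each row of DT2
n_match <- DT1[DT2, on = "id", .N, by = .EACHI]

# Rolling join: carry the last observation forward on the "time" column
quotes <- data.table(time = c(1, 5, 10), price = c(100, 101, 102))
trades <- data.table(time = c(3, 7, 12))
rolled <- quotes[trades, on = "time", roll = TRUE]
```

In each case the table inside the brackets (`DT2`, `trades`) drives the rows of the result, which is why unmatched rows of `DT1` do not appear in `joined`.
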

### Reshaping, stacking and splitting

• `melt(DT, id.vars, measure.vars)`: transform to long format; for multiple columns, use `measure.vars = patterns(...)`
• `dcast(DT, formula)`: transform to wide format
• `rbind(DT1, DT2, ...)`: stack enumerated data.tables
• `rbindlist(DT_list, idcol)`: stack a list of data.tables
• `split(DT, by)`: split a data.table into a list
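
A short round trip shows how the reshaping and stacking functions fit together. The table `wide` and the column names `quarter`, `sales` and `source` are assumptions made up for this sketch.

```r
library(data.table)

# Hypothetical wide table: one column per quarter
wide <- data.table(id = c(1L, 2L), q1 = c(10, 30), q2 = c(20, 40))

# melt: wide -> long, one row per (id, quarter) pair
long <- melt(wide, id.vars = "id", measure.vars = c("q1", "q2"),
             variable.name = "quarter", value.name = "sales")

# dcast: long -> wide again, using a formula of rows ~ columns
wide_again <- dcast(long, id ~ quarter, value.var = "sales")

# rbindlist: stack a named list; idcol records which element each row came from
stacked <- rbindlist(list(first = wide, second = wide), idcol = "source")

# split: break a data.table into a list of data.tables by a column
pieces <- split(stacked, by = "source")
```

`rbind(wide, wide)` would stack the same two tables without the identifying column.
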

### Some other functions specialized for data.tables

• `foverlaps`: overlap joins
• `merge`: another way of joining two tables
• `set`: another way of adding or modifying columns
• `fintersect`, `fsetdiff`, `funion`, `fsetequal`, `unique`, `duplicated`, `anyDuplicated`: set-theory operations with rows as elements
• `uniqueN`: the number of distinct rows
• `rowidv(DT, cols)`: row ID (1 to .N) within each group determined by `cols`
• `rleidv(DT, cols)`: group ID (1 to .GRP) determined by runs of `cols`
• `shift(DT, n, type)`: apply a shift operator to every column
• `setorder`, `setcolorder`, `setnames`, `setkey`, `setindex`, `setattr`: modify attributes and order by reference
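
A few of these helpers are sketched below on hypothetical data. The tables `DT`, `A` and `B` and the column names are invented for illustration.

```r
library(data.table)

# Hypothetical toy data
DT <- data.table(g = c("a", "a", "b", "a"), v = c(3, 1, 2, 4))

# Number of distinct values of g
n_groups <- uniqueN(DT, by = "g")

# Row ID within each group of g, in order of appearance
row_id <- rowidv(DT, cols = "g")

# Run ID: increases each time g changes from one row to the next
run_id <- rleidv(DT, cols = "g")

# Shift a vector by one position; the first element becomes NA
lagged <- shift(DT$v, n = 1, type = "lag")

# set(), setnames() and setorder() all modify DT by reference
set(DT, j = "w", value = DT$v * 2)
setnames(DT, "w", "double_v")
setorder(DT, g, -v)   # ascending g, then descending v

# Set operations treat whole rows as elements
A <- data.table(k = 1:3)
B <- data.table(k = 2:4)
common <- fintersect(A, B)
only_a <- fsetdiff(A, B)
```

Because the `set*` functions work by reference, `DT` itself is changed and nothing needs to be reassigned.
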

## Installation and support

To install the data.table package:

```r
# install from CRAN
install.packages("data.table")

# or install development version
install.packages("data.table", type = "source", repos = "http://Rdatatable.github.io/data.table")

# and to revert from devel to CRAN, the current version must first be removed
remove.packages("data.table")
install.packages("data.table")
```

The package's official site has wiki pages that help with getting started, along with lists of presentations and articles from around the web. Before asking a question, whether on Stack Overflow or anywhere else, please read the support page.

Many of the functions in the examples above live in the data.table namespace. To use them, either add a line like `library(data.table)` first or use their fully qualified names, like `data.table::fread` instead of simply `fread`. For help on individual functions, the syntax is `help("fread")` or `?fread`. Again, if the package is not loaded, use the full name, as in `?data.table::fread`.
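
The two ways of calling a function can be compared side by side. This is a small sketch that assumes the package is already installed; the inline CSV text is made-up data.

```r
# Without attaching the package, qualify the call with the namespace
tab1 <- data.table::fread("a,b\n1,2\n3,4\n")

# After attaching the package, the short name is enough
library(data.table)
tab2 <- fread("a,b\n1,2\n3,4\n")
```

Both calls read the same two-row table; only the way the function is looked up differs.
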