data.table Tutorial => Equi-join

Example

# example data
a = data.table(id = c(1L, 1L, 2L, 3L, NA_integer_), x = 11:15)
#    id  x
# 1:  1 11
# 2:  1 12
# 3:  2 13
# 4:  3 14
# 5: NA 15

b = data.table(id = 1:2, y = -(1:2))
#    id  y
# 1:  1 -1
# 2:  2 -2

Intuition

Think of x[i] as selecting a subset of x for each row of i. This syntax mirrors matrix subsetting in base R and is consistent with the first argument meaning "where", in DT[where, select|update|do, by].

One might wonder why this new syntax is worth learning, since merge(x,i) still works with data.tables. The short answer is that it we usually wants to merge and then do something further. The x[i] syntax concisely captures this pattern of use and also allows for more efficient computation. For a more detailed explanation, read FAQs 1.12 and 2.14.

Handling multiply-matched rows

By default, every row of a matching each row of b is returned:

a[b, on="id"]
#    id  x  y
# 1:  1 11 -1
# 2:  1 12 -1
# 3:  2 13 -2

This can be tweaked with mult:

a[b, on="id", mult="first"]
#    id  x  y
# 1:  1 11 -1
# 2:  2 13 -2

Handling unmatched rows

By default, unmatched rows of a still show up in the result:

b[a, on="id"]
#    id  y  x
# 1:  1 -1 11
# 2:  1 -1 12
# 3:  2 -2 13
# 4:  3 NA 14
# 5: NA NA 15

To hide these, use nomatch:

b[a, on="id", nomatch=0]
#    id  y  x
# 1:  1 -1 11
# 2:  1 -1 12
# 3:  2 -2 13

Note that x[i] will attempt to match NAs in i.

Counting matches returned

To count the number of matches for each row of i, use .N and by=.EACHI.

b[a, on="id", .N, by=.EACHI]
#    id N
# 1:  1 1
# 2:  1 1
# 3:  2 1
# 4:  3 0
# 5: NA 0

PDF - Download data.table for free

Previous Next

data.table

Fastest Entity Framework Extensions

Example

Intuition

Handling multiply-matched rows

Handling unmatched rows

Counting matches returned

Got any data.table Question?

data.table

data.table Joins and merges Equi-join

Fastest Entity Framework Extensions

Example

Intuition

Handling multiply-matched rows

Handling unmatched rows

Counting matches returned

Got any data.table Question?