# example data
a = data.table(id = c(1L, 1L, 2L, 3L, NA_integer_), x = 11:15)
# id x
# 1: 1 11
# 2: 1 12
# 3: 2 13
# 4: 3 14
# 5: NA 15
b = data.table(id = 1:2, y = -(1:2))
# id y
# 1: 1 -1
# 2: 2 -2
Think of x[i]
as selecting a subset of x
for each row of i
. This syntax mirrors matrix subsetting in base R and is consistent with the first argument meaning "where", in DT[where, select|update|do, by]
.
One might wonder why this new syntax is worth learning, since merge(x,i)
still works with data.tables. The short answer is that it we usually wants to merge and then do something further. The x[i]
syntax concisely captures this pattern of use and also allows for more efficient computation. For a more detailed explanation, read FAQs 1.12 and 2.14.
By default, every row of a
matching each row of b
is returned:
a[b, on="id"]
# id x y
# 1: 1 11 -1
# 2: 1 12 -1
# 3: 2 13 -2
This can be tweaked with mult
:
a[b, on="id", mult="first"]
# id x y
# 1: 1 11 -1
# 2: 2 13 -2
By default, unmatched rows of a
still show up in the result:
b[a, on="id"]
# id y x
# 1: 1 -1 11
# 2: 1 -1 12
# 3: 2 -2 13
# 4: 3 NA 14
# 5: NA NA 15
To hide these, use nomatch
:
b[a, on="id", nomatch=0]
# id y x
# 1: 1 -1 11
# 2: 1 -1 12
# 3: 2 -2 13
Note that x[i]
will attempt to match NAs in i
.
To count the number of matches for each row of i
, use .N
and by=.EACHI
.
b[a, on="id", .N, by=.EACHI]
# id N
# 1: 1 1
# 2: 1 1
# 3: 2 1
# 4: 3 0
# 5: NA 0