data.table Tutorial => Selecting rows within each group

Example

# example data
DT <- data.table(Titanic)

Suppose that, for each sex, we want the rows with the highest survival numbers:

DT[Survived == "Yes", .SD[ N == max(N) ], by=Sex]

#    Class    Sex   Age Survived   N
# 1:  Crew   Male Adult      Yes 192
# 2:   1st Female Adult      Yes 140

.SD is the subset of data associated with each Sex; and we are subsetting this further, to the rows that meet our condition. If speed is important, instead use an approach suggested by eddi on SO:

DT[ DT[Survived == "Yes", .I[ N == max(N) ], by=Sex]$V1 ]

#    Class    Sex   Age Survived   N
# 1:  Crew   Male Adult      Yes 192
# 2:   1st Female Adult      Yes 140

Pitfalls

In the last line of code, .I refers to the row numbers of the full data.table. However, this is not true when there is no by:

DT[ Survived == "Yes", .I]

# 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16

DT[ Survived == "Yes", .I, by=Sex]$I

# 17 18 19 20 25 26 27 28 21 22 23 24 29 30 31 32

PDF - Download data.table for free

Previous Next

data.table

Fastest Entity Framework Extensions

Example

Pitfalls

Got any data.table Question?

data.table

data.table Subsetting rows by group Selecting rows within each group

Fastest Entity Framework Extensions

Example

Pitfalls

Got any data.table Question?