# example data
DT <- data.table(Titanic)
Suppose that, for each sex, we want the rows with the highest survival numbers:
DT[Survived == "Yes", .SD[ N == max(N) ], by=Sex]
# Class Sex Age Survived N
# 1: Crew Male Adult Yes 192
# 2: 1st Female Adult Yes 140
.SD
is the subset of data associated with each Sex
; and we are subsetting this further, to the rows that meet our condition. If speed is important, instead use an approach suggested by eddi on SO:
DT[ DT[Survived == "Yes", .I[ N == max(N) ], by=Sex]$V1 ]
# Class Sex Age Survived N
# 1: Crew Male Adult Yes 192
# 2: 1st Female Adult Yes 140
In the last line of code, .I
refers to the row numbers of the full data.table. However, this is not true when there is no by
:
DT[ Survived == "Yes", .I]
# 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
DT[ Survived == "Yes", .I, by=Sex]$I
# 17 18 19 20 25 26 27 28 21 22 23 24 29 30 31 32