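How do I select the x highest values for each group in a data.table?
For example, I want to take the two highest values (Val) for each group (Date). So for this dataset: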
Date Name Val
01/01/2010 A 3
01/01/2010 B 2
01/01/2010 C 1
02/01/2010 A 4
02/01/2010 B 2
02/01/2010 C 3
02/01/2010 D 1
The code should return:
Date Name Val
01/01/2010 A 3
01/01/2010 B 2
02/01/2010 A 4
02/01/2010 C 3
library(data.table)

df <- read.table(text = "Date Name Val
01/01/2010 A 3
01/01/2010 B 2
01/01/2010 C 1
02/01/2010 A 4
02/01/2010 B 2
02/01/2010 C 3
02/01/2010 D 1",
header = TRUE, stringsAsFactors = FALSE)
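setDT(df)  # convert the data.frame to a data.table by reference
# Highest and second-highest Val within each Date
df[, max_val := max(Val), by = Date]
df[, max_sec := sort(Val, decreasing = TRUE)[2], by = Date]
# Keep rows matching either value, then drop the helper columns
df <- df[Val == max_val | Val == max_sec, ]
df[, c("max_val", "max_sec") := NULL]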
Date Name Val
1: 01/01/2010 A 3
2: 01/01/2010 B 2
3: 02/01/2010 A 4
4: 02/01/2010 C 3
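
The two-helper-column approach above only handles the top two values. A more general pattern is sketched below, as one possible approach rather than a definitive answer: sort the rows by Date and then by descending Val, and keep the first n rows of each group with head(.SD, n). The names n and top_rows are hypothetical and introduced only for this illustration, and the sample data is rebuilt inline so the snippet runs on its own.

```r
library(data.table)

# Same sample data as in the question, built directly as a data.table
df <- data.table(
  Date = c(rep("01/01/2010", 3), rep("02/01/2010", 4)),
  Name = c("A", "B", "C", "A", "B", "C", "D"),
  Val  = c(3L, 2L, 1L, 4L, 2L, 3L, 1L)
)

n <- 2  # hypothetical parameter: number of rows to keep per group

# Sort by Date, then by Val descending, and keep the first n rows of each group
top_rows <- df[order(Date, -Val), head(.SD, n), by = Date]
print(top_rows)
```

Because head() returns at most n rows, this keeps at most n rows per Date even when values are tied, whereas filtering on Val == max_val | Val == max_sec keeps every tied row.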