Given a data frame like this:
dtest <- data.frame(label=c("yahoo","google","yahoo","yahoo","google","google","yahoo","yahoo"), year=c(2000,2001,2000,2001,2003,2003,2003,2003))
how can I derive a new data frame like this, with a count for every label and year:
doutput <- data.frame(label=c("yahoo","yahoo","yahoo","yahoo","google","google","google","google"), year=c(2000,2001,2002,2003,2000,2001,2002,2003), volume=c(2,1,0,2,0,1,0,2))
> doutput
   label year volume
1  yahoo 2000      2
2  yahoo 2001      1
3  yahoo 2002      0
4  yahoo 2003      2
5 google 2000      0
6 google 2001      1
7 google 2002      0
8 google 2003      2
One way is to use dplyr:
library(dplyr)
dtest %>%
  group_by(label, year) %>%
  tally(name = "volume")
# A tibble: 5 x 3
# Groups:   label [2]
  label   year volume
  <fct>  <dbl>  <int>
1 google  2001      1
2 google  2003      2
3 yahoo   2000      2
4 yahoo   2001      1
5 yahoo   2003      2
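Note that this returns only the label/year combinations that actually occur in the data, so the zero-count rows from the desired output are missing. A minimal sketch of one way to fill them in, assuming the tidyr package is available (it is not part of the answer above) and using its `complete()` and `full_seq()` functions; the name `filled` is only illustrative:

```r
library(dplyr)
library(tidyr)  # assumption: tidyr is installed; not used in the original answer

dtest <- data.frame(
  label = c("yahoo", "google", "yahoo", "yahoo", "google", "google", "yahoo", "yahoo"),
  year  = c(2000, 2001, 2000, 2001, 2003, 2003, 2003, 2003)
)

filled <- dtest %>%
  # count rows per observed label/year pair
  count(label, year, name = "volume") %>%
  # add every label x year pair over the full year range; absent pairs get volume 0
  complete(label, year = full_seq(year, 1), fill = list(volume = 0L))

print(filled)
```

`full_seq(year, 1)` expands the observed years into the full sequence in steps of one, which is what adds 2002 even though no row has that year.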
Here is a base R solution:
as.data.frame(table(transform(dtest,
  year = factor(year, levels = seq(min(year), max(year))))))
Result:
   label year Freq
1 google 2000    0
2  yahoo 2000    2
3 google 2001    1
4  yahoo 2001    1
5 google 2002    0
6  yahoo 2002    0
7 google 2003    2
8  yahoo 2003    2
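The count column here is named `Freq`, `year` comes back as a factor, and the rows are ordered by year rather than by label. A small sketch of how the same base R approach could be adjusted to give a `volume` column sorted by label and then year, using the `responseName` argument of `as.data.frame()` for tables; the name `counts` is only illustrative:

```r
dtest <- data.frame(
  label = c("yahoo", "google", "yahoo", "yahoo", "google", "google", "yahoo", "yahoo"),
  year  = c(2000, 2001, 2000, 2001, 2003, 2003, 2003, 2003)
)

# Make year a factor covering the full range so empty years are tabulated too
dtest$year <- factor(dtest$year, levels = seq(min(dtest$year), max(dtest$year)))

# responseName renames the count column from the default "Freq"
counts <- as.data.frame(table(dtest), responseName = "volume",
                        stringsAsFactors = FALSE)

# Convert year back to numeric, then sort by label and year
counts$year <- as.numeric(counts$year)
counts <- counts[order(counts$label, counts$year), ]
rownames(counts) <- NULL

print(counts)
```

The rows come out with google before yahoo (alphabetical order); reorder afterwards if the yahoo-first layout of `doutput` matters.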