I want to work with combinations of Beer, Wine, and Spirits together with the row total SUM, for example:
df <- data.frame(
Beer = c(1L, 1L, 1L, 3L, 4L, 0L, 0L, 3L),
Wine = c(1L, 1L, 0L, 0L, 0L, 1L, 1L, 2L),
Spirits = c(0L, 1L, 0L, 0L, 0L, 1L, 2L, 1L),
SUM = c(2L, 3L, 1L, 3L, 4L, 2L, 3L, 6L)
)
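How can I create a new column COMBINE that pastes together SUM and the names of the columns whose value is greater than 0? Something like this, except that any SUM greater than 5 should be treated as 5+.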
Beer Wine Spirits SUM COMBINE
   1    1       0   2 2Beer&Wine
   1    1       1   3 3Beer&Wine&Spirits
   1    0       0   1 1Beer
   1    0       0   1 1Beer
   3    0       0   3 3Beer
   4    0       0   4 4Beer
   0    1       1   2 2Wine&Spirits
   0    1       1   2 2Wine&Spirits
   3    2       1   6 5+Beer&Wine&Spirits
For some additional context, the end goal of all this is to count the levels of COMBINE, although that is not the part I am struggling with.
COMBINE        Count
1Beer          2
2Beer          0
3Beer          1
4Beer          1
5+Beer         0
1Wine          0
2Wine          0
...
2Wine&Spirits  2
...
A straightforward solution using ifelse:
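# d is the question's data frame (called df there)
d <- df
# Build the label: SUM (or "5+" when SUM > 5), then the name of each column
# with a value > 0; gsub() drops the trailing "&" left when Spirits is 0
d$COMBINE <- with(d, gsub("&$", "",
  paste0(ifelse(SUM > 5, "5+", SUM),
         ifelse(Beer > 0, "Beer&", ""),
         ifelse(Wine > 0, "Wine&", ""),
         ifelse(Spirits > 0, "Spirits", ""))))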
  Beer Wine Spirits SUM COMBINE
1    1    1       0   2 2Beer&Wine
2    1    1       1   3 3Beer&Wine&Spirits
3    1    0       0   1 1Beer
4    3    0       0   3 3Beer
5    4    0       0   4 4Beer
6    0    1       1   2 2Wine&Spirits
7    0    1       2   3 3Wine&Spirits
8    3    2       1   6 5+Beer&Wine&Spirits
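The same labels can also be built without writing one ifelse per column, which may be easier to extend if more drink columns are added. This is only a sketch: the names `cols`, `present`, and `COMBINE2` are illustrative and do not come from the answer above.

```r
# Data from the question
d <- data.frame(
  Beer    = c(1L, 1L, 1L, 3L, 4L, 0L, 0L, 3L),
  Wine    = c(1L, 1L, 0L, 0L, 0L, 1L, 1L, 2L),
  Spirits = c(0L, 1L, 0L, 0L, 0L, 1L, 2L, 1L),
  SUM     = c(2L, 3L, 1L, 3L, 4L, 2L, 3L, 6L)
)

# Columns to combine (illustrative name, not from the original answer)
cols <- c("Beer", "Wine", "Spirits")

# For each row, join the names of the columns whose value is > 0 with "&"
present <- apply(d[cols] > 0, 1, function(x) paste(cols[x], collapse = "&"))

# Prefix with SUM, using "5+" for any SUM greater than 5
d$COMBINE2 <- paste0(ifelse(d$SUM > 5, "5+", d$SUM), present)
```

Because the names are joined with `collapse`, there is no trailing "&" to strip afterwards.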
To count the levels you can use: table(d$COMBINE)
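On a character column, table() only reports the labels that actually occur, while the count table in the question also lists labels with a count of 0. One possible way to get those rows is to convert the labels to a factor with every level spelled out. The sketch below assumes the level set is 1 to 4 plus 5+, as read off the question's table; `drinks`, `subsets`, and `all_levels` are illustrative names.

```r
# Labels as produced by the ifelse solution for the question's data
combine <- c("2Beer&Wine", "3Beer&Wine&Spirits", "1Beer", "3Beer", "4Beer",
             "2Wine&Spirits", "3Wine&Spirits", "5+Beer&Wine&Spirits")

drinks <- c("Beer", "Wine", "Spirits")

# Every non-empty subset of the drinks, joined with "&" in column order
subsets <- unlist(lapply(seq_along(drinks), function(k)
  combn(drinks, k, paste, collapse = "&")))

# Assumed level set: sums 1 to 4 and "5+", crossed with every subset.
# A SUM of exactly 5 labelled "5" would need its own level.
all_levels <- as.vector(outer(c(1:4, "5+"), subsets, paste0))

# Unused levels are kept by table(), so they show up with a count of 0
counts <- table(factor(combine, levels = all_levels))
```

Calling as.data.frame(counts) afterwards gives a two-column layout closer to the one shown in the question.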
You can also use aggregate:
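# Cap the row total at 5, so that 5 stands for "5 or more"
d$sum5 <- pmin(5, d$Beer + d$Wine + d$Spirits)
# Helper column: summing it counts the rows in each group
d$count <- 1
aggregate(count ~ (Beer>0) + (Wine>0) + (Spirits>0) + sum5, data=d, FUN=sum)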
  Beer > 0 Wine > 0 Spirits > 0 sum5 count
1     TRUE    FALSE       FALSE    1     1
2     TRUE     TRUE       FALSE    2     1
3    FALSE     TRUE        TRUE    2     1
4     TRUE    FALSE       FALSE    3     1
5    FALSE     TRUE        TRUE    3     1
6     TRUE     TRUE        TRUE    3     1
7     TRUE    FALSE       FALSE    4     1
8     TRUE     TRUE        TRUE    5     1
Ignoring the sum (to give an example where not every count is 1):
aggregate(count ~ (Beer>0) + (Wine>0) + (Spirits>0), data=d, FUN=sum)
  Beer > 0 Wine > 0 Spirits > 0 count
1     TRUE    FALSE       FALSE     3
2     TRUE     TRUE       FALSE     1
3    FALSE     TRUE        TRUE     2
4     TRUE     TRUE        TRUE     2