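I am trying to figure out how to aggregate a column, a factor with a binary outcome (success/fail), by two grouping columns. One challenge is producing a summary row for group combinations that have zero occurrences of one of the two outcomes. For example, suppose this is the data frame: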
dat <- data.frame(
Group1 = c("A", "A", "A", "B", "B", "C", "C", "C"),
Group2 = c("D", "D", "D", "E", "E", "F", "F", "F"),
Result = c("Success", "Success", "Fail", "Success", "Success", "Success", "Fail", "Fail")
)
Ideally, I would like a summary that looks like this:
# Group1 Group2 Success Fail
# A D 2 1
# B E 2 0
# C F 1 2
But I am struggling to produce the separate Success and Fail columns.
I tried using aggregate() as follows:
aggregate(Result ~ Group1 + Group2, data = dat, FUN=length)
but this only counts the rows in each group; it does not split the binary factor into separate columns.
You can use dplyr::summarise() and sum a logical comparison for each outcome:
library(dplyr)  # the .by argument requires dplyr >= 1.1.0
dat %>%
  summarise(Success = sum(Result == "Success"),
            Fail = sum(Result == "Fail"),
            .by = c(Group1, Group2))
Output:
# Group1 Group2 Success Fail
#1 A D 2 1
#2 B E 2 0
#3 C F 1 2
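Or, in base R with aggregate() (though I am sure there must be a more elegant way to do this), you can compute each count separately, merge() the two results, and then rename the columns with setNames(). Note that the \(x) lambda shorthand requires R 4.1 or later: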
setNames(
merge(
aggregate(Result ~ Group1 + Group2, data = dat, FUN = \(x) sum(x == "Success")),
aggregate(Result ~ Group1 + Group2, data = dat, FUN = \(x) sum(x == "Fail")),
by = c("Group1", "Group2")),
c("Group1", "Group2", "Success", "Fail")
)
# Group1 Group2 Success Fail
#1 A D 2 1
#2 B E 2 0
#3 C F 1 2
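As a possible shortcut for the base R route, the two counts can be computed in a single aggregate() call by putting both indicator columns on the left-hand side of the formula with cbind(). This is only a sketch, assuming the same dat as in the question; the name counts and the helper columns Success and Fail added via transform() are introduced here for illustration and are not part of the original answer.

```r
# Same example data as in the question
dat <- data.frame(
  Group1 = c("A", "A", "A", "B", "B", "C", "C", "C"),
  Group2 = c("D", "D", "D", "E", "E", "F", "F", "F"),
  Result = c("Success", "Success", "Fail", "Success", "Success", "Success", "Fail", "Fail")
)

# Add one logical indicator column per outcome (illustrative helper columns),
# then sum both indicators within each Group1/Group2 combination.
# Summing a logical vector counts its TRUE values, so a group with no
# failures naturally gets 0 rather than being dropped.
counts <- aggregate(
  cbind(Success, Fail) ~ Group1 + Group2,
  data = transform(dat,
                   Success = Result == "Success",
                   Fail = Result == "Fail"),
  FUN = sum
)

print(counts)
```

This avoids the merge() and setNames() steps because the column names come directly from the cbind() terms.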
Here is a similar approach using pivot_wider(), which lives in tidyr rather than dplyr:
tidyr::pivot_wider(dat, names_from = Result, values_from = Result, values_fn = length, values_fill = 0)
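Notes:
values_fn = length counts the rows for each combination of Group1, Group2, and Result in the dataset.
values_fill = 0 fills in 0 for combinations that do not occur in the data (here, group B/E has no Fail rows).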