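I'm trying to collapse a data frame into a summary data frame that contains sums and weighted averages based on multiple columns.
Here is an example of what I'm doing: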
library(dplyr)

df <- data.frame(
  'title'   = c("X", "Y", "Z", "X", "Y", "Z"),
  'date'    = c("2020-01-01", "2020-01-01", "2020-01-01", "2020-01-02", "2020-01-03", "2020-01-02"),
  'weight1' = c(84024, 54241, 106601, 65382, 337007, 687682),
  'weight2' = c(30, 30, 30, 15, 15, 15),
  'metric1' = c(3.08, 0.964, 0.839, 1.60, 0.839, 0.648),
  'metric2' = c(588.03, 298.26, 13.95, 104.29, 10.51, 72.53)
)

agg = df %>%
  group_by(df$date, df$title) %>%
  summarise(total_weight1 = sum(weight1, na.rm=TRUE), # total sum
            total_weight1 = sum(weight2, na.rm=TRUE), # total sum
            mean_metric1 = weighted.mean(x = metric1, w = c(weight1, weight2), na.rm=T), # weighted avg
            mean_metric2 = weighted.mean(x = metric2, w = c(weight1, weight2), na.rm=T)) # weighted avg
But, of course, I get an error message:
Error in `summarise()`:
! Problem while computing `mean_metric1 = weighted.mean(x = metric1, w = c(weight1, weight2), na.rm = T)`.
ℹ The error occurred in group 1: df$date = "2020-01-01", df$title = "X".
Caused by error in `weighted.mean.default()`:
! 'x' and 'w' must have the same length
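Is it possible to compute a weighted average whose weights come from two columns? I looked for other sources so that I could verify this calculation by hand, but had no luck.
I'm also not sure that computing all of this in a single data frame is the right approach. Any guidance would be appreciated.
I have tried other sources and functions, including rowwise() and mutate(), but I keep running into the same problem: my 'x' and 'w' need to have the same length.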
Perhaps multiplying the two weight vectors together does what you expect:
agg = df %>%
  group_by(df$date, df$title) %>%
  summarise(
    total_weight1 = sum(weight1, na.rm=TRUE), # total sum
    total_weight1 = sum(weight2, na.rm=TRUE), # total sum
    mean_metric1 = weighted.mean(x = metric1, w = weight1 * weight2, na.rm=T), # weighted avg
    mean_metric2 = weighted.mean(x = metric2, w = weight1 * weight2, na.rm=T)) # weighted avg
agg
# A tibble: 6 × 5
# Groups:   df$date [3]
#   `df$date`  `df$title` total_weight1 mean_metric1 mean_metric2
#   <chr>      <chr>              <dbl>        <dbl>        <dbl>
# 1 2020-01-01 X                     30        3.08         588.
# 2 2020-01-01 Y                     30        0.964        298.
# 3 2020-01-01 Z                     30        0.839         14.0
# 4 2020-01-02 X                     15        1.6          104.
# 5 2020-01-02 Z                     15        0.648         72.5
# 6 2020-01-03 Y                     15        0.839         10.5
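
Some background on why the original call fails, plus a slightly tidied sketch. c(weight1, weight2) concatenates the two columns into one vector that is twice as long as metric1, which is what weighted.mean() complains about. The snippets above also assign total_weight1 twice, so the second value overwrites the first, and that is why only one total column shows up in the output. The version below assumes the second sum was meant to be called total_weight2 and that the product weight1 * weight2 is the combined weight you want. Both are guesses about the intent, not something the question confirms. It also groups by the bare column names so the result columns are called date and title, and it includes a manual check of weighted.mean() against sum(x * w) / sum(w).

```r
library(dplyr)

df <- data.frame(
  title   = c("X", "Y", "Z", "X", "Y", "Z"),
  date    = c("2020-01-01", "2020-01-01", "2020-01-01",
              "2020-01-02", "2020-01-03", "2020-01-02"),
  weight1 = c(84024, 54241, 106601, 65382, 337007, 687682),
  weight2 = c(30, 30, 30, 15, 15, 15),
  metric1 = c(3.08, 0.964, 0.839, 1.60, 0.839, 0.648),
  metric2 = c(588.03, 298.26, 13.95, 104.29, 10.51, 72.53)
)

# c() concatenates, so the weight vector ends up twice as long as x
n_x <- length(df$metric1)
n_w <- length(c(df$weight1, df$weight2))

agg <- df %>%
  group_by(date, title) %>%                      # bare names, not df$date
  summarise(
    total_weight1 = sum(weight1, na.rm = TRUE),
    total_weight2 = sum(weight2, na.rm = TRUE),  # assumed intended name
    # assumption: the combined weight is the product of the two columns
    mean_metric1 = weighted.mean(metric1, w = weight1 * weight2, na.rm = TRUE),
    mean_metric2 = weighted.mean(metric2, w = weight1 * weight2, na.rm = TRUE),
    .groups = "drop"
  )

# Manual check on the whole data frame: weighted mean = sum(x * w) / sum(w)
w <- df$weight1 * df$weight2
manual  <- sum(df$metric1 * w) / sum(w)
builtin <- weighted.mean(df$metric1, w)
```

With this sample data every date/title pair has exactly one row, so each group's weighted mean is just the metric itself, which matches the output above. The weights only start to matter once a group holds several rows, for example if you group by title alone.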