I need some advice/tips on how to aggregate rows that share an ID to get percentage data.
My data frame has the following format:
id  country_1  country_2  country_3  ....  country_x  sum_by_id
1   10         0          0                0          100
1   0          20         0                0          100
1   0          0          70               0          100
2   10         0          0                20         80
2   0          20         0                0          80
2   0          10         20               0          80
3   ...
3   ...
# Note: sum_by_id is the sum of country_1 through country_x over all rows that share an ID.
What I would like to get is:
id  country_1  country_2  country_3  ...  country_x
1   0.1        0.2        0.7             0
2   0.125      0.375      0.25            0.25
3   ...
4   ...
# FYI: for ID 2, 0.375 = (20 + 10) / 80
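To spell out the arithmetic for ID 2, here is a quick check in R. The numbers are copied by hand from the example table above, and the names `totals_id2` and `shares_id2` are only illustrative:

```r
# Per-country totals for id 2, copied by hand from the example table
totals_id2 <- c(
  country_1 = 10,
  country_2 = 20 + 10,
  country_3 = 20,
  country_x = 20
)

# Each country's share of the id total (the total is 80 here)
shares_id2 <- totals_id2 / sum(totals_id2)
shares_id2
```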
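I have been looking for examples of this kind of aggregation. I would appreciate it if anyone could point me to the right place to look or offer some advice on this problem.
Thanks!
To reproduce the data: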
# Required packages
library(dplyr)
library(tidyr)
# Data
df1 <- data.frame(id = c(1, 1, 1, 2, 2, 2),
                  country_1 = c(10, 0, 0, 10, 0, 0),
                  country_2 = c(0, 20, 0, 0, 20, 10),
                  country_3 = c(0, 0, 70, 0, 0, 20),
                  country_x = c(0, 0, 0, 20, 0, 0))
# sum_by_id: total of all country_* values over the rows sharing an id
df1 <- df1 %>%
  mutate(sum_by_id = sum(across(starts_with("country_"))), .by = id)
I don't know whether this is the most elegant solution, but it works.
# Required packages
library(dplyr)
library(tidyr)
# Data
df1 <- data.frame(id = c(1, 1, 1, 2, 2, 2),
country_1 = c(10,0, 0, 10,0,0),
country_2 = c(0,20, 0, 0, 20, 10),
country_3 = c(0,0, 70, 0, 0, 20),
country_x = c(0,0,0,20,0,0))
Note: I create my own sum_by_id.
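df1 %>%
  # Long format: one row per original row and country
  pivot_longer(-id, names_to = "country", values_to = "value") %>%
  # Total per country within each id
  summarise(value = sum(value), .by = c(id, country)) %>%
  # Total per id, then each country's share of that total
  mutate(
    sum_by_id = sum(value),
    value = value / sum_by_id,
    .by = id
  ) %>%
  # Back to wide format: one column per country
  pivot_wider(names_from = country, values_from = value)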
Output:
# A tibble: 2 × 6
     id sum_by_id country_1 country_2 country_3 country_x
  <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>
1     1       100     0.1       0.2        0.7       0
2     2        80     0.125     0.375      0.25      0.25
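If the round trip through the long format feels heavy, the same shares can probably be computed while staying wide. This is only a sketch, assuming dplyr 1.1.0 or later (for `.by` and `pick()`); the helper column `total` and the result name `shares` are my own names, not part of the question:

```r
library(dplyr)

df1 <- data.frame(id = c(1, 1, 1, 2, 2, 2),
                  country_1 = c(10, 0, 0, 10, 0, 0),
                  country_2 = c(0, 20, 0, 0, 20, 10),
                  country_3 = c(0, 0, 70, 0, 0, 20),
                  country_x = c(0, 0, 0, 20, 0, 0))

shares <- df1 %>%
  # Collapse to one row per id by summing every country column
  summarise(across(starts_with("country_"), sum), .by = id) %>%
  # Row total over the country columns (hypothetical helper column)
  mutate(total = rowSums(pick(starts_with("country_")))) %>%
  # Divide each country column by the row total
  mutate(across(starts_with("country_"), ~ .x / total)) %>%
  # Drop the helper so only id and the shares remain
  select(-total)

shares
```

The total is computed in its own `mutate()` step before the division so that every country column is divided by the same unmodified row total. Dropping `total` at the end leaves only `id` and the country shares, which matches the layout asked for in the question.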