Collapse/aggregate multiple rows into a single row and compute percentage shares based on a shared ID


I'm looking for advice on how to aggregate rows that share an ID so that each ID ends up with one row of percentage shares.

My data frame looks like this:

id  country_1  country_2     country_3  .... country_x  sum_by_id
1    10              0           0               0       100
1    0               20          0               0       100
1    0               0          70               0       100
2    10              0           0              20        80
2    0               20          0               0        80       
2    0               10         20               0        80      
3   ...
3   ...

# Note: sum_by_id is the sum of country_1 through country_x over all rows that share the same id.

What I'd like to get is:

id  country_1  country_2     country_3   ... country_x
1    0.1          0.2           0.7            0
2    0.125        0.375         0.25           0.25         
3    ...
4    ...

# FYI: for id 2, the country_2 share is 0.375 = (20 + 10) / 80.
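
The arithmetic for id 2 can be checked by hand with a few lines of base R. This is only a minimal sketch: the `id2` data frame below is a hypothetical helper that copies the three id 2 rows from the example table, and it is not part of the original data-preparation code.

```r
# Rows for id 2, copied from the example table (hypothetical helper object)
id2 <- data.frame(country_1 = c(10, 0, 0),
                  country_2 = c(0, 20, 10),
                  country_3 = c(0, 0, 20),
                  country_x = c(20, 0, 0))

# Column totals within the id, then each total as a share of the grand total
col_totals <- colSums(id2)
shares <- col_totals / sum(col_totals)
shares
```

Here `sum(col_totals)` plays the role of sum_by_id (80 for id 2), so the country_2 entry is (20 + 10) / 80.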

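I have been looking for examples of this kind of aggregation. If anyone could point me to the right place to look, or offer some advice on this problem, I would greatly appreciate it.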

Thanks!

To reproduce the data:

# Required packages
library(dplyr)
library(tidyr)

# Data
df1 <- data.frame(id = c(1, 1, 1, 2, 2, 2),
                  country_1 = c(10,0, 0, 10,0,0), 
                  country_2 = c(0,20, 0, 0, 20, 10),
                  country_3 = c(0,0, 70, 0, 0, 20),
                  country_x = c(0,0,0,20,0,0))

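# sum_by_id: total of all country_* values over the rows sharing an id
# (.by and pick() need dplyr >= 1.1.0)
df1 <- df1 %>% mutate(sum_by_id = sum(pick(starts_with("country_"))), .by = id)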
1 Answer

I don't know if this is the most elegant solution, but it works.

# Required packages
library(dplyr)
library(tidyr)

# Data
df1 <- data.frame(id = c(1, 1, 1, 2, 2, 2),
                  country_1 = c(10,0, 0, 10,0,0), 
                  country_2 = c(0,20, 0, 0, 20, 10),
                  country_3 = c(0,0, 70, 0, 0, 20),
                  country_x = c(0,0,0,20,0,0))

Note: the code below creates its own sum_by_id.

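df1 %>% 
        # Long format: one row per (id, country, value)
        pivot_longer(-id, names_to = "country", values_to = "value") %>%
        # Collapse the rows of each id into one total per country
        summarise(value = sum(value), .by = c(id, country)) %>%
        # Total per id, then each country's share of that total
        mutate(
                sum_by_id = sum(value),
                value = value / sum_by_id,
                .by = id
        ) %>%
        # Back to wide format: one row per id, one column per country
        pivot_wider(names_from = country, values_from = value)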

Output:

# A tibble: 2 × 6
     id sum_by_id country_1 country_2 country_3 country_x
  <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>
1     1       100     0.1       0.2        0.7       0   
2     2        80     0.125     0.375      0.25      0.25
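
As a possible alternative that avoids the pivot round trip, the same shares can be computed by summing each country column per id and then dividing by the row total. This is a sketch rather than the method used above; it assumes dplyr >= 1.1.0 (for `.by` and `pick()`), and the result name `shares` is made up for illustration.

```r
library(dplyr)

df1 <- data.frame(id = c(1, 1, 1, 2, 2, 2),
                  country_1 = c(10, 0, 0, 10, 0, 0),
                  country_2 = c(0, 20, 0, 0, 20, 10),
                  country_3 = c(0, 0, 70, 0, 0, 20),
                  country_x = c(0, 0, 0, 20, 0, 0))

shares <- df1 %>%
  # One row per id, each country column summed within the id
  summarise(across(starts_with("country_"), sum), .by = id) %>%
  # Total per id, computed before any country column is overwritten
  mutate(sum_by_id = rowSums(pick(starts_with("country_")))) %>%
  # Divide every country column by its id's total
  mutate(across(starts_with("country_"), ~ .x / sum_by_id)) %>%
  # Drop the helper column to match the requested layout
  select(-sum_by_id)

shares
```

The total is stored in its own column before the division because `across()` inside `mutate()` updates columns one at a time, so recomputing the row sum inside the same `across()` call could pick up already-divided values.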