From the given data frame, I want to aggregate the data by the date column.
date <- c("2020-01-10", "2020-01-10", "2020-01-10", "2020-01-10", "2020-01-10",
          "2020-01-11", "2020-01-11", "2020-01-11", "2020-01-11", "2020-01-11",
          "2020-01-12", "2020-01-12", "2020-01-12", "2020-01-12", "2020-01-12",
          "2020-01-13", "2020-01-13", "2020-01-13", "2020-01-13", "2020-01-13")
ID <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20)
assets <- c(1, 2, 3, 10, 9, 21, 23, 1, 1, 3, 11, 11, 6, 12, 13, 5, 2, 15, 12, 12)
category <- c("new", "new", "new", "new", "new",
              "new", "new", "old", "old", "old",
              "old", "old", "old", "expired", "expired",
              "expired", "expired", "expired", "expired", "expired")
df1 <- data.frame(ID, date, assets, category)
This code reshapes the data frame and also adds a new column, total, which is the sum new + old + expired:
library(dplyr)
library(tidyr)

df_transformed <- df1 %>%
  # group_by(date) %>%
  pivot_wider(names_from = category, values_from = assets) %>%
  mutate(total = new + old + expired)
print(df_transformed)
      ID date         new   old expired total
   <dbl> <chr>      <dbl> <dbl>   <dbl> <dbl>
 1     1 2020-01-10     1    NA      NA    NA
 2     2 2020-01-10     2    NA      NA    NA
 3     3 2020-01-10     3    NA      NA    NA
 4     4 2020-01-10    10    NA      NA    NA
 5     5 2020-01-10     9    NA      NA    NA
 6     6 2020-01-11    21    NA      NA    NA
 7     7 2020-01-11    23    NA      NA    NA
 8     8 2020-01-11    NA     1      NA    NA
 9     9 2020-01-11    NA     1      NA    NA
10    10 2020-01-11    NA     3      NA    NA
11    11 2020-01-12    NA    11      NA    NA
12    12 2020-01-12    NA    11      NA    NA
13    13 2020-01-12    NA     6      NA    NA
14    14 2020-01-12    NA    NA      12    NA
15    15 2020-01-12    NA    NA      13    NA
16    16 2020-01-13    NA    NA       5    NA
17    17 2020-01-13    NA    NA       2    NA
18    18 2020-01-13    NA    NA      15    NA
19    19 2020-01-13    NA    NA      12    NA
20    20 2020-01-13    NA    NA      12    NA
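Every total above is NA because each row has a value in only one of new, old and expired, and + propagates missing values. A small base-R check of that behaviour, and of how sum() with na.rm = TRUE treats the gaps (an input that is entirely NA sums to 0). The variable names here are illustrative only:

```r
# `+` propagates missing values: one NA operand makes the whole result NA
row_total <- NA + 1 + NA

# sum() with na.rm = TRUE drops the NAs before adding,
# so an input that is entirely NA sums to 0
all_missing <- sum(c(NA, NA, NA), na.rm = TRUE)
some_missing <- sum(c(21, 23, NA, NA, NA), na.rm = TRUE)

print(row_total)     # NA
print(all_missing)   # 0
print(some_missing)  # 44
```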
I'd like to get the following result. The data frame has many NAs, so I want them treated as zero when summing each date.
ID date new old expired total
1 1 2020-01-10 sum of (01-10) sum of (01-10) sum of (01-10) new + old + expired
2 2 2020-01-11 sum of (01-11) sum of (01-11) sum of (01-11) new + old + expired
3 3 2020-01-12 sum of (01-12) sum of (01-12) sum of (01-12) new + old + expired
4 4 2020-01-13 sum of (01-13) sum of (01-13) sum of (01-13) new + old + expired
You probably want to summarise across everything, handling the NAs with na.rm = TRUE in the sum() function:
library(dplyr)

df_transformed |>
  group_by(date) |>
  # na.rm = TRUE drops the NAs, so an all-NA column sums to 0 for that date
  summarise(across(everything(), \(x) sum(x, na.rm = TRUE))) |>
  mutate(total = new + old + expired)
Output:
# A tibble: 4 × 6
date          ID   new   old expired total
<chr>      <dbl> <dbl> <dbl>   <dbl> <dbl>
2020-01-10    15    25     0       0    25
2020-01-11    40    44     5       0    49
2020-01-12    65     0    28      25    53
2020-01-13    90     0     0      46    46
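The NAs can also be avoided during the reshape itself. They appear because ID differs on every row, so pivot_wider() keeps one output row per ID. A sketch, assuming a tidyr version whose pivot_wider() accepts a plain function for values_fn and a scalar for values_fill: keep only date as the row identifier, sum the assets that land in the same date/category cell, and fill empty cells with 0. The ID column is dropped, since it is not needed for the totals; df_wide is an illustrative name.

```r
library(dplyr)
library(tidyr)

# Same values as df1 in the question, built compactly
df1 <- data.frame(
  ID = 1:20,
  date = rep(c("2020-01-10", "2020-01-11", "2020-01-12", "2020-01-13"), each = 5),
  assets = c(1, 2, 3, 10, 9, 21, 23, 1, 1, 3, 11, 11, 6, 12, 13, 5, 2, 15, 12, 12),
  category = rep(c("new", "old", "expired"), times = c(7, 6, 7))
)

df_wide <- df1 %>%
  pivot_wider(
    id_cols = date,          # one output row per date, ignoring ID
    names_from = category,
    values_from = assets,
    values_fn = sum,         # several assets in one date/category cell are added
    values_fill = 0          # date/category cells with no rows become 0, not NA
  ) %>%
  mutate(total = new + old + expired)

print(df_wide)
```

This should give the same new, old, expired and total values as the summarise() output above, just without the ID column.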