I have a dataset that looks like this:
Day Population Red Yellow Orange Green
1 30 15 3 4 8
2 50 10 30 5 5
3 10 3 6 1 0
4 25 2 10 10 3
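For anyone who wants to run the answers below, the table above can be rebuilt as a data frame. This is a minimal sketch; storing the columns as integers is an assumption, chosen to match the `<int>` columns in the tibble output further down. The last line checks that Population equals the sum of the four color columns in every row, which is what the data.table answer relies on when it computes Population minus Green.

```r
# Rebuild the sample table (values copied from the table above;
# integer types are an assumption matching the <int> columns shown later)
df <- data.frame(
  Day        = 1:4,
  Population = c(30L, 50L, 10L, 25L),
  Red        = c(15L, 10L, 3L, 2L),
  Yellow     = c(3L, 30L, 6L, 10L),
  Orange     = c(4L, 5L, 1L, 10L),
  Green      = c(8L, 5L, 0L, 3L)
)

# In every row, Population is the total of the four color columns
all(df$Population == df$Red + df$Yellow + df$Orange + df$Green)
#> [1] TRUE
```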
I want to create something like this:
Day Color Population
1 Green 8
1 Red,Orange,Yellow 22
2 Green 5
2 Red,Orange,Yellow 45
3 Green 0
3 Red,Orange,Yellow 10
4 Green 3
4 Red,Orange,Yellow 22
I have something like this, but it doesn't work:
df <- rbind(
  summarise(df, Day, Population = df$Green, Color = "Green"),
  summarise(df, Day, Population = sum(df$Red, df$Yellow, df$Orange),
            Color = "Red,Orange,Yellow"))
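The main problem with the attempt above is that `sum(df$Red, df$Yellow, df$Orange)` adds up every value in all three columns and returns a single number (99 for this data), so every combined row gets the same total instead of a per-day count. Adding the columns with `+` works element-wise, one value per row. Since no rows are being collapsed, `transmute()` fits better than `summarise()`. A minimal sketch of the attempt with those two changes, assuming dplyr is installed (the data is rebuilt so the block runs on its own):

```r
library(dplyr)

df <- data.frame(
  Day = 1:4, Population = c(30L, 50L, 10L, 25L),
  Red = c(15L, 10L, 3L, 2L), Yellow = c(3L, 30L, 6L, 10L),
  Orange = c(4L, 5L, 1L, 10L), Green = c(8L, 5L, 0L, 3L)
)

# Same structure as the attempt, but with row-wise `+` instead of sum(),
# and transmute() so each input row yields exactly one output row
res <- bind_rows(
  transmute(df, Day, Color = "Green", Population = Green),
  transmute(df, Day, Color = "Red,Orange,Yellow", Population = Red + Orange + Yellow)
) %>%
  arrange(Day)

res
```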
Here is one way to do it with data.table:
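library(data.table)
# The color columns are everything after Day and Population
colors = names(df)[3:length(names(df))]
target = "Green"
non_targets = setdiff(colors, target)
setDT(df)  # convert df to a data.table in place
rbindlist(list(
  # one row per day for the target color
  df[, .(Day, Color = target, Population = get(target))],
  # one row per day for all other colors combined: total minus the target
  df[, .(Day, Color = paste0(non_targets, collapse = "|"), Population = Population - get(target))]
))[order(Day)]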
Output:
Day Color Population
1: 1 Green 8
2: 1 Red|Yellow|Orange 22
3: 2 Green 5
4: 2 Red|Yellow|Orange 45
5: 3 Green 0
6: 3 Red|Yellow|Orange 10
7: 4 Green 3
8: 4 Red|Yellow|Orange 22
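The answer above gets the combined count as `Population - Green`, which only holds because Population is the total of the four color columns. A variant sketch that adds the non-target columns directly with `rowSums(.SD)` and `.SDcols`, so it does not depend on that relationship, could look like this. Here `non_targets` is written out by hand (an assumption) so the label reads "Red,Orange,Yellow" as in the question, and the table is rebuilt as `dt` so the block runs on its own.

```r
library(data.table)

dt <- data.table(
  Day = 1:4, Population = c(30L, 50L, 10L, 25L),
  Red = c(15L, 10L, 3L, 2L), Yellow = c(3L, 30L, 6L, 10L),
  Orange = c(4L, 5L, 1L, 10L), Green = c(8L, 5L, 0L, 3L)
)

target <- "Green"
# Listed explicitly so the label matches the order used in the question
non_targets <- c("Red", "Orange", "Yellow")

res <- rbindlist(list(
  dt[, .(Day, Color = target, Population = get(target))],
  # .SD holds only the .SDcols columns; rowSums() adds them row by row.
  # as.integer() keeps the type consistent with the Green rows.
  dt[, .(Day, Color = paste(non_targets, collapse = ","),
         Population = as.integer(rowSums(.SD))),
     .SDcols = non_targets]
))[order(Day)]

res
```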
Another approach is to pivot to long format and work from there. This one is illustrated with dplyr and tidyr:
library(dplyr)
library(tidyr)

colors = names(df)[3:length(names(df))]
target = "Green"
non_targets = setdiff(colors, target)
# one row per Day and color, with the count in `value`
df_long = pivot_longer(df, -c(Day:Population), names_to = "Color")
bind_rows(
  # rows for the target color
  df_long %>%
    filter(Color == target) %>%
    select(Day, Color, Population = value),
  # per day: sum of all colors minus the target
  df_long %>%
    group_by(Day) %>%
    summarize(Population = sum(value) - value[Color == target]) %>%
    mutate(Color = paste0(non_targets, collapse = "|"))
) %>%
  arrange(Day)
Output:
# A tibble: 8 × 3
Day Color Population
<int> <chr> <int>
1 1 Green 8
2 1 Red|Yellow|Orange 22
3 2 Green 5
4 2 Red|Yellow|Orange 45
5 3 Green 0
6 3 Red|Yellow|Orange 10
7 4 Green 3
8 4 Red|Yellow|Orange 22
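The long-format answer builds the Green rows and the combined rows in two separate pipelines and then binds them. A sketch of a single-pipeline alternative, assuming dplyr 1.0 or later (for `summarise(.groups = )`): relabel every non-target color with the combined label, then group by Day and Color and sum. The data is rebuilt so the block runs on its own.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  Day = 1:4, Population = c(30L, 50L, 10L, 25L),
  Red = c(15L, 10L, 3L, 2L), Yellow = c(3L, 30L, 6L, 10L),
  Orange = c(4L, 5L, 1L, 10L), Green = c(8L, 5L, 0L, 3L)
)

target <- "Green"
non_targets <- setdiff(names(df)[3:ncol(df)], target)  # "Red" "Yellow" "Orange"

res <- df %>%
  pivot_longer(-c(Day, Population), names_to = "Color") %>%
  # Give every non-target color the same combined label
  mutate(Color = if_else(Color == target, target,
                         paste(non_targets, collapse = "|"))) %>%
  group_by(Day, Color) %>%
  summarise(Population = sum(value), .groups = "drop")

res
```

Because `group_by()` sorts the groups, the result already comes out ordered by Day, with the Green row first since "Green" sorts before "Red|Yellow|Orange".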