I have the following input table:
input <- structure(
  list(
    individual = c(1, 2, 3, 4),
    age = c(20, 34, 29, 30),
    earnings_2020 = c(0, 0, 1, 0),
    earnings_2021 = c(1, 0, 2, 0),
    earnings_2022 = c(2, 1, 3, 1),
    earnings1 = c(20000, 25000, 28000, 30000),
    earnings2 = c(30000, 36000, 39000, 40000),
    earnings3 = c(40000, 47000, 42000, 50000)
  ),
  class = "data.frame",
  row.names = c(NA, -4L)
)
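I want to fill each earnings_YEAR column based on the value it currently holds (that is, the current value in earnings_YEAR says which earningsNUMBER column to take the value from).
So in this example, since earnings_2021 == 1 for individual 1, earnings_2021 should be set to earnings1 (20,000). earnings_2022 would be set to earnings2 (30,000), and so on. A value of 0 stays 0. The index differs for each individual. The output would look like this: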
individual | age | earnings_2020 | earnings_2021 | earnings_2022 | earnings1 | earnings2 | earnings3 |
---|---|---|---|---|---|---|---|
1 | 20 | 0 | 20000 | 30000 | 20000 | 30000 | 40000 |
2 | 34 | 0 | 0 | 25000 | 25000 | 36000 | 40000 |
3 | 29 | 28000 | 39000 | 42000 | 28000 | 39000 | 42000 |
4 | 30 | 0 | 0 | 30000 | 30000 | 40000 | 50000 |
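Note that I don't need to keep the earnings1, earnings2 and earnings3 columns; I have left them in the output table for illustration only.
How can I do this easily in R? I would like to avoid a for loop because I am working with a large dataset.
I tried the following code, but it raises an error: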
earnings_columns <- c("earnings_2020", "earnings_2021", "earnings_2022")
earnings_input_columns <- c("earnings1", "earnings2", "earnings3")
df <- df %>%
  mutate(
    across(
      .cols = all_of(earnings_columns),
      .fns = ~ {
        case_when(
          . >= 1 & . <= 3 ~ {
            input_column <- earnings_input_columns[.]
            if (!is.null(input_column) && input_column %in% colnames(df)) {
              df[[input_column]]
            } else {
              .
            }
          },
          TRUE ~ .
        )
      },
      .names = "{.col}"
    )
  )
It produces this error:
Error in `mutate()`:
! Problem while computing `..1 = across(...)`.
Caused by error in `across()`:
! Problem while computing column `earnings_2021`.
Caused by error in `!is.null(input_column) && input_column %in% colnames(df)`:
! 'length = 2' in coercion to 'logical(1)'
Run `rlang::last_trace()` to see where the error occurred.
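The error message can be reproduced in isolation, which may help explain it. Inside across(), `.` is the whole column rather than a single value, so `earnings_input_columns[.]` is a vector lookup: zero indices are dropped and the result can have any length. The sketch below copies the earnings_2021 values from the input table into a standalone vector purely for illustration:

```r
earnings_input_columns <- c("earnings1", "earnings2", "earnings3")

# The earnings_2021 column from the input table, as a plain vector
earnings_2021 <- c(1, 0, 2, 0)

# Indexing with a vector: the zeros select nothing, the 1 and 2 select two names
input_column <- earnings_input_columns[earnings_2021]

print(input_column)
print(length(input_column))
```

This gives a character vector of length 2, and in recent versions of R `&&` refuses operands longer than one, hence the `'length = 2'` in the message. The earnings_2020 column contains only one non-zero index, which is consistent with the error first being reported for earnings_2021.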
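Converting the data to long format makes this much easier than trying to cross-reference columns. The following code gives the desired output, although part of its complexity comes from converting everything back to wide format, which may not be the best format to work in: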
library(tidyverse)
input %>%
  pivot_longer(contains("earnings_"), names_to = "Year") %>%
  filter(value != 0) %>%
  mutate(Year = sub("^earnings_(.*)$", "\\1", Year)) %>%
  mutate(result = grep("earnings", names(.), value = TRUE)[value]) %>%
  pivot_longer(starts_with("earnings"), values_to = "earned") %>%
  filter(result == name) %>%
  select(-result, -name) %>%
  pivot_wider(id_cols = c("individual", "age"), names_from = Year,
              values_from = "earned", names_prefix = "earnings_",
              values_fill = 0, names_sort = TRUE) %>%
  bind_cols(select(input, matches("earnings\\d")))
#> # A tibble: 4 x 8
#>   individual   age earnings_2020 earnings_2021 earnings_2022 earnings1 earnings2 earnings3
#>        <dbl> <dbl>         <dbl>         <dbl>         <dbl>     <dbl>     <dbl>     <dbl>
#> 1          1    20             0         20000         30000     20000     30000     40000
#> 2          2    34             0             0         25000     25000     36000     40000
#> 3          3    29         28000         39000         42000     28000     39000     42000
#> 4          4    30             0             0         30000     30000     40000     50000
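If staying in wide format is preferable, the same lookup can be sketched in base R with matrix indexing, which avoids both the reshaping and an explicit loop. This is a different technique from the pivot approach above, not a rewrite of it. It assumes every earnings_YEAR value is a whole number from 0 to 3 with no NAs; the names `year_cols`, `source_cols`, `lookup`, `idx`, `picked` and `result` are made up for this example.

```r
input <- data.frame(
  individual = c(1, 2, 3, 4),
  age = c(20, 34, 29, 30),
  earnings_2020 = c(0, 0, 1, 0),
  earnings_2021 = c(1, 0, 2, 0),
  earnings_2022 = c(2, 1, 3, 1),
  earnings1 = c(20000, 25000, 28000, 30000),
  earnings2 = c(30000, 36000, 39000, 40000),
  earnings3 = c(40000, 47000, 42000, 50000)
)

year_cols   <- c("earnings_2020", "earnings_2021", "earnings_2022")
source_cols <- c("earnings1", "earnings2", "earnings3")

# Lookup matrix with an extra leading column of zeros, so that
# index 0 maps to 0 and index k maps to earnings<k> (assumes indices are 0..3)
lookup <- cbind(0, as.matrix(input[source_cols]))

# Matrix of indices, one row per individual and one column per year
idx <- as.matrix(input[year_cols])

# Two-column (row, column) matrix indexing picks one value per cell;
# the + 1 accounts for the leading column of zeros
picked <- lookup[cbind(as.vector(row(idx)), as.vector(idx) + 1)]

# Reshape the picked values back to the shape of idx and overwrite the year columns
result <- input
result[year_cols] <- matrix(picked, nrow = nrow(idx))

print(result)
```

Because the lookup is a single vectorised indexing operation, this should scale reasonably to large tables, although I have not benchmarked it against the pivot version.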