Fill a column conditionally with values from another column in R

Question

I have the following input table:

  input <- structure(
    list(
      individual = c(1, 2, 3, 4),
      age = c(20, 34, 29, 30),
      earnings_2020 = c(0, 0, 1, 0),
      earnings_2021 = c(1, 0, 2, 0),
      earnings_2022 = c(2, 1, 3, 1),
      earnings1 = c(20000, 25000, 28000, 30000),
      earnings2 = c(30000, 36000, 39000, 40000),
      earnings3 = c(40000, 47000, 42000, 50000)
    ),
    class = "data.frame",
    row.names = c(NA, -4L)
  )

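I want to fill the earnings_YEAR columns from the earningsNUMBER columns, using the value currently stored in each earnings_YEAR column as the index: that value says which earningsNUMBER column to take the amount from.

So in this example, because individual 1 has earnings_2021 == 1, earnings_2021 should be set to earnings1 (20,000). earnings_2022 would be set to earnings2 (30,000), and so on. A value of 0 stays 0. The index differs for each individual. The output would look like this: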

individual	age	earnings_2020	earnings_2021	earnings_2022	earnings1	earnings2	earnings3
1	20	0	20000	30000	20000	30000	40000
2	34	0	0	25000	25000	36000	47000
3	29	28000	39000	42000	28000	39000	42000
4	30	0	0	30000	30000	40000	50000

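Note that I don't need to keep the earnings1, earnings2, and earnings3 columns; I have left them in the output table only for illustration.

How can I do this easily in R? I would like to avoid a for loop because I am working with a large dataset.

I tried the following code, but it throws an error: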

earnings_columns <- c("earnings_2020", "earnings_2021", "earnings_2022")
earnings_input_columns <- c("earnings1", "earnings2", "earnings3")

 df <- df %>%
    mutate(
      across(
        .cols = all_of(earnings_columns), 
        .fns = ~ {
          case_when(
            . >= 1 & . <= 3 ~ {
              input_column <- earnings_input_columns[.]
              if (!is.null(input_column) && input_column %in% colnames(df)) {
                df[[input_column]]
              } else {
                .
              }
            },
            TRUE ~ . 
          )
        },
        .names = "{.col}"
      )
    )

It produces this error:

Error in `mutate()`:
! Problem while computing `..1 = across(...)`.
Caused by error in `across()`:
! Problem while computing column `earnings_2021`.
Caused by error in `!is.null(input_column) && input_column %in% colnames(df)`:
! 'length = 2' in coercion to 'logical(1)'
Run `rlang::last_trace()` to see where the error occurred.
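
A likely reading of this message, sketched in base R: inside across(), `.` is the whole column, not a single value, so `earnings_input_columns[.]` indexes with a vector. R drops zero indices, which leaves a character vector of length 2 for earnings_2021, and `&&` only accepts length-one operands (an error as of R 4.3.0). The variable names `idx` and `picked` below are made up for illustration.

```r
earnings_input_columns <- c("earnings1", "earnings2", "earnings3")

# The values of earnings_2021 in the input table
idx <- c(1, 0, 2, 0)

# Indexing with 0 selects nothing, so only two names come back
picked <- earnings_input_columns[idx]
picked
#> [1] "earnings1" "earnings2"

length(picked)
#> [1] 2
```

This would also explain why the error names earnings_2021 rather than earnings_2020: the 2020 column contains a single non-zero index, so the condition there happens to have length one.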

Answer 1

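Converting the data to long format is much easier than trying to cross-reference columns. The following code gives the desired output, but part of its complexity comes from converting everything back to wide format, which may not be the best format to work in.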

library(tidyverse)

input %>%
  pivot_longer(contains("earnings_"), names_to = "Year") %>%
  filter(value != 0) %>%
  mutate(Year = sub("^earnings_(.*)$", "\\1", Year)) %>%
  mutate(result = grep("earnings", names(.), value = TRUE)[value]) %>%
  pivot_longer(starts_with("earnings"), values_to = "earned") %>%
  filter(result == name) %>%
  select(-result, -name) %>%
  pivot_wider(id_cols = c("individual", "age"), names_from = Year,
              values_from = "earned", names_prefix = "earnings_",
              values_fill = 0, names_sort = TRUE) %>%
  bind_cols(select(input, matches("earnings\\d")))

#> # A tibble: 4 x 8
#>   individual   age earnings_2020 earnings_2021 earnings_2022 earnings1 earnings2 earnings3
#>        <dbl> <dbl>         <dbl>         <dbl>         <dbl>     <dbl>     <dbl>     <dbl>
#> 1          1    20             0         20000         30000     20000     30000     40000
#> 2          2    34             0             0         25000     25000     36000     47000
#> 3          3    29         28000         39000         42000     28000     39000     42000
#> 4          4    30             0             0         30000     30000     40000     50000
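
For comparison, here is a minimal base R sketch that skips the reshaping and uses matrix indexing instead. It is not part of the answer above. The names `year_cols`, `value_cols`, `idx`, and `lookup` are made up for illustration, and the sketch assumes every index is a whole number between 0 and 3 with no missing values.

```r
input <- data.frame(
  individual = c(1, 2, 3, 4),
  age = c(20, 34, 29, 30),
  earnings_2020 = c(0, 0, 1, 0),
  earnings_2021 = c(1, 0, 2, 0),
  earnings_2022 = c(2, 1, 3, 1),
  earnings1 = c(20000, 25000, 28000, 30000),
  earnings2 = c(30000, 36000, 39000, 40000),
  earnings3 = c(40000, 47000, 42000, 50000)
)

year_cols  <- c("earnings_2020", "earnings_2021", "earnings_2022")
value_cols <- c("earnings1", "earnings2", "earnings3")

# Indices (0 to 3) for each individual and year
idx <- as.matrix(input[year_cols])

# Lookup table with a leading column of zeros, so index 0 maps to 0
lookup <- cbind(0, as.matrix(input[value_cols]))

# Pick lookup[row, index + 1] for every cell using a two-column index matrix
result <- idx
result[] <- lookup[cbind(as.vector(row(idx)), as.vector(idx) + 1)]

input[year_cols] <- as.data.frame(result)
input[c("individual", year_cols)]
```

Because the lookup is one vectorised indexing step, there is no explicit loop over rows, which may suit a large dataset; whether it is faster than the pivoting approach would need to be measured.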
