Fill a column conditionally with values from another column in R


I have the following input table:

  input <- structure(
    list(
      individual = c(1, 2, 3, 4),
      age = c(20, 34, 29, 30),
      earnings_2020 = c(0, 0, 1, 0),
      earnings_2021 = c(1, 0, 2, 0),
      earnings_2022 = c(2, 1, 3, 1),
      earnings1 = c(20000, 25000, 28000, 30000),
      earnings2 = c(30000, 36000, 39000, 40000),
      earnings3 = c(40000, 47000, 42000, 50000)
    ),
    class = "data.frame",
    row.names = c(NA, -4L)
  )

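I want to fill in the earnings_YEAR columns based on the values they currently contain: the current value in each earnings_YEAR cell is an index that says which earningsNUMBER column to take the value from.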

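So in this example, because individual 1 has earnings_2021 == 1, earnings_2021 should be set to earnings1 (20,000). earnings_2022 would be set to earnings2 (30,000), and so on. A value of 0 stays 0. The index differs for each individual. The output would look like this: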

  individual age earnings_2020 earnings_2021 earnings_2022 earnings1 earnings2 earnings3
           1  20             0         20000         30000     20000     30000     40000
           2  34             0             0         25000     25000     36000     47000
           3  29         28000         39000         42000     28000     39000     42000
           4  30             0             0         30000     30000     40000     50000

Note that I don't need to keep the earnings1, earnings2 and earnings3 columns; I have kept them in the output table only for illustration.
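To make the rule concrete, here is a minimal sketch of the lookup for a single row (individual 1), with the values typed in by hand from the input above. `candidates` and `idx` are illustrative names that do not appear in the original code.

```r
# Individual 1's values, copied by hand from the input table
candidates <- c(20000, 30000, 40000)  # earnings1, earnings2, earnings3
idx <- c(0, 1, 2)                     # earnings_2020, earnings_2021, earnings_2022

# An index of 0 keeps the value 0; an index k >= 1 picks earnings<k>.
# pmax(idx, 1) only avoids indexing with 0 (which would drop elements);
# those positions are replaced by 0 through ifelse() anyway.
result <- ifelse(idx == 0, 0, candidates[pmax(idx, 1)])
print(result)  # result is c(0, 20000, 30000), matching row 1 of the expected output
```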

How can I do this easily in R? I'd like to avoid for loops because I'm working with a large dataset.

I tried the following code, but I get an error:

earnings_columns <- c("earnings_2020", "earnings_2021", "earnings_2022")
earnings_input_columns <- c("earnings1", "earnings2", "earnings3")

 df <- df %>%
    mutate(
      across(
        .cols = all_of(earnings_columns), 
        .fns = ~ {
          case_when(
            . >= 1 & . <= 3 ~ {
              input_column <- earnings_input_columns[.]
              if (!is.null(input_column) && input_column %in% colnames(df)) {
                df[[input_column]]
              } else {
                .
              }
            },
            TRUE ~ . 
          )
        },
        .names = "{.col}"
      )
    )

It produces this error:

Error in `mutate()`:
! Problem while computing `..1 = across(...)`.
Caused by error in `across()`:
! Problem while computing column `earnings_2021`.
Caused by error in `!is.null(input_column) && input_column %in% colnames(df)`:
! 'length = 2' in coercion to 'logical(1)'
Run `rlang::last_trace()` to see where the error occurred.
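The error comes from what `.` means inside the `across()` lambda: it is the whole column (one value per row), not a single value. For earnings_2021, which is c(1, 0, 2, 0), `earnings_input_columns[.]` silently drops the 0 indices and returns two names, so the right-hand side of `&&` is a length-2 logical vector, which is what `'length = 2' in coercion to 'logical(1)'` reports. earnings_2020, which is c(0, 0, 1, 0), happens to yield a single name, which is why the failure is reported for earnings_2021. Even without the error, `df[[input_column]]` returns entire columns rather than one value per row. A small reproduction of the lookup step, where `dot` is an illustrative stand-in for `.`:

```r
earnings_input_columns <- c("earnings1", "earnings2", "earnings3")

# Inside the across() lambda, `.` is the entire earnings_2021 column
dot <- c(1, 0, 2, 0)

# Index 0 selects nothing, so the lookup returns two names instead of one
input_column <- earnings_input_columns[dot]
print(input_column)  # "earnings1" "earnings2"

# The right-hand side of `&&` therefore has length 2
# (earnings_input_columns stands in for colnames(df) here)
rhs <- input_column %in% earnings_input_columns
print(length(rhs))   # 2
```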
Tags: r dplyr purrr transformation data-wrangling

1 Answer

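Converting the data to long format is much easier than trying to cross-reference columns. The following code gives the desired output, but part of its complexity comes from converting everything back to wide format, which may not be the best format to work in anyway.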

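library(tidyverse)

input %>%
  # one row per individual and year; `value` holds the index (0 to 3)
  pivot_longer(contains("earnings_"), names_to = "Year") %>%
  # index 0 keeps 0; these cells are restored by values_fill = 0 below
  filter(value != 0) %>%
  mutate(Year = sub("^earnings_(.*)$", "\\1", Year)) %>%
  # name of the earnings<k> column that the index points to
  mutate(result = grep("earnings", names(.), value = TRUE)[value]) %>%
  # stack earnings1..earnings3 and keep only the selected one
  pivot_longer(starts_with("earnings"), values_to = "earned") %>%
  filter(result == name) %>%
  select(-result, -name) %>%
  # back to one row per individual, one column per year
  pivot_wider(id_cols = c("individual", "age"), names_from = Year,
              values_from = "earned", names_prefix = "earnings_",
              values_fill = 0, names_sort = TRUE) %>%
  # re-attach the original earnings1..earnings3 columns
  bind_cols(select(input, matches("earnings\\d")))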

#> # A tibble: 4 x 8
#>   individual   age earnings_2020 earnings_2021 earnings_2022 earnings1 earnings2 earnings3
#>        <dbl> <dbl>         <dbl>         <dbl>         <dbl>     <dbl>     <dbl>     <dbl>
#> 1          1    20             0         20000         30000     20000     30000     40000
#> 2          2    34             0             0         25000     25000     36000     47000
#> 3          3    29         28000         39000         42000     28000     39000     42000
#> 4          4    30             0             0         30000     30000     40000     50000
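If reshaping is not needed, a wide-format alternative is to index a matrix of the earnings1..earnings3 values with (row, column) pairs, which picks one cell per row in a single vectorised step per column. This is a base-R sketch, not the method of the answer above; `lookup_earnings` and `src` are hypothetical helper names, and it assumes the index values are whole numbers where 0 (or anything outside 1..3) means "leave the value as is".

```r
# Input data from the question
input <- structure(
  list(
    individual = c(1, 2, 3, 4),
    age = c(20, 34, 29, 30),
    earnings_2020 = c(0, 0, 1, 0),
    earnings_2021 = c(1, 0, 2, 0),
    earnings_2022 = c(2, 1, 3, 1),
    earnings1 = c(20000, 25000, 28000, 30000),
    earnings2 = c(30000, 36000, 39000, 40000),
    earnings3 = c(40000, 47000, 42000, 50000)
  ),
  class = "data.frame",
  row.names = c(NA, -4L)
)

earnings_columns <- c("earnings_2020", "earnings_2021", "earnings_2022")
earnings_input_columns <- c("earnings1", "earnings2", "earnings3")

# Row i, column k holds earnings<k> for individual i
src <- as.matrix(input[earnings_input_columns])

# Hypothetical helper: replace index values 1..ncol(src) with the matching
# earnings<k> value for that row; everything else (here: 0) is left unchanged
lookup_earnings <- function(idx, src) {
  hit <- !is.na(idx) & idx >= 1 & idx <= ncol(src)
  out <- idx
  # cbind(row, column) pairs select exactly one cell per matching row
  out[hit] <- src[cbind(which(hit), idx[hit])]
  out
}

output <- input
output[earnings_columns] <- lapply(input[earnings_columns], lookup_earnings, src = src)
print(output)
```

This keeps every row, including individuals whose earnings_YEAR values are all 0. In the long-format pipeline above, `filter(value != 0)` would remove all rows for such an individual, so `pivot_wider()` would return fewer rows than `input` and the final `bind_cols()` would then fail because the row counts differ.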