Using a custom case_when function inside dplyr mutate


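I have read many posts related to my problem, but I still cannot figure it out.

I have a basic table that will gain additional columns as data collection continues through the NFL season. I cannot run my function, which uses a week-based case_when, without getting the error "! object 'WK1' not found".

My data after week 2: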

NumCorrect <- data.frame(
  TEAM = LETTERS[1:5],
  WK1 = c(11,12,11,12,13),
  WK2 = c(7,7,9,10,7)
)

My code:

# Defined variable at the top of the code that I change as the weeks are played.
curWEEK = 2

# Simplified version of my dplyr code at the step that won't work. 
correct_d <- NumCorrect |> mutate(AVG_16 = average16(curWEEK))

My function:

average16 <- function(x) {case_when(x == 1 ~ WK1,
                                    x == 2 ~ round(mean((WK1:WK2), na.rm=TRUE),1),
                                    x == 3 ~ round(mean((WK1:WK3), na.rm=TRUE),1),
                                    x %in% 4:7 ~ round(mean((WK1:WK4), na.rm=TRUE),1),
                                    x %in% 8:11 ~ round(mean(c(WK1:WK4,WK8), na.rm=TRUE),1),
                                    x %in% 12:14 ~ round(mean(c(WK1:WK4,WK8,WK12), na.rm=TRUE),1),
                                    x == 15 ~ round(mean(c(WK1:WK4,WK8,WK12,WK15), na.rm=TRUE),1),
                                    x == 16 ~ round(mean(c(WK1:WK4,WK8,WK12,WK15:WK16), na.rm=TRUE),1),
                                    x == 17 ~ round(mean(c(WK1:WK4,WK8,WK12,WK15:WK17), na.rm=TRUE),1),
                                    x == 18 ~ round(mean(c(WK1:WK4,WK8,WK12,WK15:WK18), na.rm=TRUE),1)
                                    )}

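I tried letting R and ChatGPT fight it out, but the function only grew more complicated without producing a solution.

How can I use a function to average only certain columns of a table that is incomplete and will have columns added over time?

I tried many versions, and I was able to modify the code so that all unplayed week columns (WK1 through WK18) are kept as NA, but I still get the "object not found" error.

It fails every time with: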

Error in `mutate()`:
ℹ In argument: `AVG_16 = average16(curWEEK)`.
ℹ In row 1.
Caused by error in `case_when()`:
! Failed to evaluate the right-hand side of formula 1.
Caused by error:
! object 'WK1' not found

I also tried this:

average16 <- function(x) {if(x == 1) {WK1}
                          if(x == 2) {round(mean((WK1:WK2), na.rm=TRUE),1)}
                          if(x == 3) {round(mean((WK1:WK3), na.rm=TRUE),1)}
                          if(x %in% 4:7)   {round(mean((WK1:WK4), na.rm=TRUE),1)}
                          if(x %in% 8:11)  {round(mean(c(WK1:WK4,WK8), na.rm=TRUE),1)}
                          if(x %in% 12:14) {round(mean(c(WK1:WK4,WK8,WK12), na.rm=TRUE),1)}
                          if(x == 15) {round(mean(c(WK1:WK4,WK8,WK12,WK15), na.rm=TRUE),1)}
                          if(x == 16) {round(mean(c(WK1:WK4,WK8,WK12,WK15:WK16), na.rm=TRUE),1)}
                          if(x == 17) {round(mean(c(WK1:WK4,WK8,WK12,WK15:WK17), na.rm=TRUE),1)}
                          if(x == 18) {round(mean(c(WK1:WK4,WK8,WK12,WK15:WK18), na.rm=TRUE),1)}
                          }
1 Answer

If you are using dplyr, a quite different approach may get you what you want.

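You can use a character vector to name the columns you want to average, whether or not they exist yet. Any that do not exist are silently ignored by any_of.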
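To make that behavior concrete, here is a minimal sketch on a toy data frame. The names scores, wanted, present and strict are made up for illustration. It contrasts any_of with its strict counterpart all_of, which raises an error when a requested column is missing.

```r
library(dplyr)

# Toy data (hypothetical): only two of the week columns exist so far
scores <- data.frame(TEAM = c("A", "B"), WK1 = c(11, 12), WK2 = c(7, 7))

# Ask for three columns; WK8 has not been played yet
wanted <- c("WK1", "WK2", "WK8")

# any_of() keeps the columns that exist and drops the rest without an error
present <- scores |> select(any_of(wanted))
names(present)

# all_of() is strict: the missing WK8 triggers an error, caught here
strict <- tryCatch(
  scores |> select(all_of(wanted)),
  error = function(e) "error"
)
```

Here present holds only WK1 and WK2, while the all_of call fails and strict ends up as the string "error".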

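library(tidyverse)

NumCorrect <- data.frame(
    TEAM = LETTERS[1:5],
    WK1 = c(11, 12, 11, 12, 13),
    WK2 = c(7, 7, 9, 10, 7)
)

# Weeks that count toward the average; columns not yet present are skipped
cols_to_avg <- paste0("WK", c(1:4, 8, 12, 15:18))

NumCorrect |>
    rowwise() |>
    # c_across() gathers the selected columns of the current row into one vector
    mutate(avg16 = round(mean(c_across(any_of(cols_to_avg)), na.rm = TRUE), 1)) |>
    ungroup()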
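As for why the original function fails: the body of average16 is not evaluated inside mutate's data mask, so WK1 is looked up as an ordinary R object rather than as a column. Even inside mutate, WK1:WK2 would build a numeric sequence from the values, not a range of columns. If you still want a function driven by curWEEK, one possible sketch is below. It assumes the counted weeks are the ones listed in the question, and the names counted_weeks and add_average16 are hypothetical. The function takes the data frame itself, so the columns are selected by name.

```r
library(dplyr)

NumCorrect <- data.frame(
  TEAM = LETTERS[1:5],
  WK1 = c(11, 12, 11, 12, 13),
  WK2 = c(7, 7, 9, 10, 7)
)

# Assumption: the weeks that count toward the average, as in the question
counted_weeks <- c(1:4, 8, 12, 15:18)

# Hypothetical helper: add AVG_16 from the counted weeks played up to `week`
add_average16 <- function(data, week) {
  cols <- paste0("WK", counted_weeks[counted_weeks <= week])
  # any_of() skips week columns that are not in the table yet
  week_scores <- select(data, any_of(cols))
  mutate(data, AVG_16 = round(rowMeans(week_scores, na.rm = TRUE), 1))
}

curWEEK <- 2
correct_d <- add_average16(NumCorrect, curWEEK)
correct_d$AVG_16
# → 9.0 9.5 10.0 11.0 10.0
```

Because rowMeans works on all rows at once, this version does not need rowwise(). Filtering counted_weeks by week also means a column that already exists for a later week is not averaged in early.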