Adaptive function that applies a function to subsets of columns based on a shared naming pattern

Question

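I'm either over-caffeinated or under-caffeinated, because I can't work out how to do this. I need to write a function that evaluates an equation over several sets of variables, where each set is identified by a common string in the column names. For each set, it exponentiates the sum of the intercept and the effect, and then it sums all of these exponentials into a single value. I need to do this across columns within each row, so dplyr seems like the obvious choice. The tricky part is that the function has to handle a different number of variable pairs in each data set. It's easier to show than to describe.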

Here are two data sets:

set.seed(1)

names_df1 <- c("ball", "bell", "bat")
df1 <- data.frame(int_ball = sample(seq(-.99,-.01, .01),5,replace=T),
                  eff_ball = sample(seq(-.99,-.01, .01),5,replace=T),
                  int_bell = sample(seq(-.99,-.01, .01),5,replace=T),
                  eff_bell = sample(seq(-.99,-.01, .01),5,replace=T),
                  int_bat = sample(seq(-.99,-.01, .01),5,replace=T),
                  eff_bat = sample(seq(-.99,-.01, .01),5,replace=T))


names_df2 <- c("dog", "cat", "bird", "fish")
df2 <- data.frame(int_dog = sample(seq(-.99,-.01, .01),5,replace=T),
                  eff_dog = sample(seq(-.99,-.01, .01),5,replace=T),
                  int_cat = sample(seq(-.99,-.01, .01),5,replace=T),
                  eff_cat = sample(seq(-.99,-.01, .01),5,replace=T),
                  int_bird = sample(seq(-.99,-.01, .01),5,replace=T),
                  eff_bird = sample(seq(-.99,-.01, .01),5,replace=T),
                  int_fish = sample(seq(-.99,-.01, .01),5,replace=T),
                  eff_fish = sample(seq(-.99,-.01, .01),5,replace=T))

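Each data set has as many variable pairs as there are elements in the character vector defined just before it (names_df1 and names_df2). For each pair, I need to add the int_ and eff_ variables together and exponentiate the result, and then add all of these exponentials together. For the first data set, which has three pairs, the result would look like this: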

df1 %>%
  mutate(eq_df1 = exp(int_ball + eff_ball) + exp(int_bell + eff_bell) + exp(int_bat + eff_bat))

#   int_ball eff_ball int_bell eff_bell int_bat eff_bat   eq_df1
# 1    -0.32    -0.57    -0.03    -0.93   -0.11   -0.21 1.519698
# 2    -0.61    -0.86    -0.15    -0.27   -0.63   -0.67 1.159504
# 3    -0.99    -0.18    -0.79    -0.21   -0.66   -0.16 1.118678
# 4    -0.66    -0.41    -0.46    -0.15   -0.11   -0.65 1.354026
# 5    -0.13    -0.49    -0.26    -0.63   -0.56   -0.30 1.371762

For the data set with four pairs, it looks like this:

df2 %>%
  mutate(eq_df2 = exp(int_dog + eff_dog) + exp(int_cat + eff_cat) + exp(int_bird + eff_bird) + exp(int_fish + eff_fish))

#   int_dog eff_dog int_cat eff_cat int_bird eff_bird int_fish eff_fish   eq_df2
# 1   -0.26   -0.80   -0.56   -0.58    -0.98    -0.35    -0.19    -0.11 1.671570
# 2   -0.58   -0.56   -0.75   -0.94    -0.55    -0.30    -0.87    -0.77 1.125734
# 3   -0.62   -0.13   -0.30   -0.76    -0.82    -0.13    -0.60    -0.16 1.673230
# 4   -0.80   -0.30   -0.61   -0.68    -0.78    -0.30    -0.11    -0.71 1.388169
# 5   -0.72   -0.60   -0.49   -0.86    -0.22    -0.25    -0.52    -0.87 1.400453
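Written out, both examples compute the same per-row quantity. With K the set of stems (names_df1 for df1, names_df2 for df2), row i gets:

```latex
\mathrm{eq}_i = \sum_{k \in K} \exp\left(\mathrm{int}_{k,i} + \mathrm{eff}_{k,i}\right)
```

so |K| = 3 for df1 and |K| = 4 for df2; only the size of K changes between the two data sets.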

Any help is much appreciated. The solution doesn't have to be in dplyr.
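Since a non-dplyr solution is acceptable, here is a minimal base R sketch. It assumes every int_<stem> column has a matching eff_<stem> column; sum_exp_pairs() and its prefix arguments are hypothetical names, not from the question. Because the stems are read from the column names, the number of pairs does not have to be known in advance.

```r
# Hypothetical helper: for each row, sum exp(int_<stem> + eff_<stem>) over all stems.
# Assumes each int_<stem> column has a matching eff_<stem> column.
sum_exp_pairs <- function(df, int_prefix = "int_", eff_prefix = "eff_") {
  # Stems are whatever follows the int_ prefix, e.g. "ball", "bell", "bat"
  int_cols <- grep(paste0("^", int_prefix), names(df), value = TRUE)
  stems <- substring(int_cols, nchar(int_prefix) + 1)
  eff_cols <- paste0(eff_prefix, stems)
  missing <- setdiff(eff_cols, names(df))
  if (length(missing) > 0) stop("missing columns: ", paste(missing, collapse = ", "))
  # eff_cols was built from the same stems, so the two matrices line up column by column
  rowSums(exp(as.matrix(df[int_cols]) + as.matrix(df[eff_cols])))
}

# Small made-up data: every int + eff pair sums to 0, so each row gives exp(0) + exp(0)
demo <- data.frame(int_a = c(0, -1), eff_a = c(0, 1),
                   int_b = c(-0.5, 0), eff_b = c(0.5, 0))
sum_exp_pairs(demo)
```

On the question's data the call would be df1$eq_df1 <- sum_exp_pairs(df1), and the same line works unchanged for df2.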

Answer 1

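You can define a function that pivots the columns to long format, performs the required calculation, and then binds the result back onto the original data: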

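library(dplyr)   # select(), summarise(), pick(), bind_cols(); pick() and .by need dplyr >= 1.1.0
library(tidyr)   # pivot_longer()
library(tibble)  # rowid_to_column()

# Pivots the eff_/int_ pairs to long format, sums exp(eff + int) per row,
# and binds the result back onto the original data
f <- function(.data, vars = c(starts_with(c("eff_", "int_")))) {
  .data |> 
    select( {{ vars }} ) |> 
    rowid_to_column() |>   # remember which original row each value came from
    # "int_ball" -> value column "int" (via ".value"), with "ball" stored in "name"
    pivot_longer(-rowid, names_sep = "_", names_to = c(".value", "name")) |> 
    # pick(2) and pick(3) are the two value columns (eff, int); the .by column is excluded
    summarise(eq_df1 = sum(exp(pick(2) + pick(3))), .by = rowid) |> 
    select(-rowid) |> 
    bind_cols(.data, results = _)   # the `_` placeholder needs R >= 4.2.0
}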
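The pivot step is what makes f() independent of the number of pairs. As a rough illustration on a small made-up data frame (toy is hypothetical, not from the question), the long table has one row per original row and stem, with the int and eff values side by side:

```r
library(tidyr)   # pivot_longer()
library(tibble)  # rowid_to_column()

# Made-up two-row example with two stems, "a" and "b"
toy <- data.frame(int_a = c(0.1, 0.2), eff_a = c(0.3, 0.4),
                  int_b = c(0.5, 0.6), eff_b = c(0.7, 0.8))

long <- toy |>
  rowid_to_column() |>
  # ".value" turns the prefix before "_" into column names (int, eff);
  # the part after "_" goes into a new column called name
  pivot_longer(-rowid, names_sep = "_", names_to = c(".value", "name"))

long  # one row per (rowid, stem): columns rowid, name, int, eff
```

Summing exp(int + eff) within each rowid then gives one value per original row, however many stems there are.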

f(df1)
  int_ball eff_ball int_bell eff_bell int_bat eff_bat   eq_df1
1    -0.32    -0.57    -0.03    -0.93   -0.11   -0.21 1.519698
2    -0.61    -0.86    -0.15    -0.27   -0.63   -0.67 1.159504
3    -0.99    -0.18    -0.79    -0.21   -0.66   -0.16 1.118678
4    -0.66    -0.41    -0.46    -0.15   -0.11   -0.65 1.354026
5    -0.13    -0.49    -0.26    -0.63   -0.56   -0.30 1.371762

f(df2)
  int_dog eff_dog int_cat eff_cat int_bird eff_bird int_fish eff_fish   eq_df1
1   -0.26   -0.80   -0.56   -0.58    -0.98    -0.35    -0.19    -0.11 1.671570
2   -0.58   -0.56   -0.75   -0.94    -0.55    -0.30    -0.87    -0.77 1.125734
3   -0.62   -0.13   -0.30   -0.76    -0.82    -0.13    -0.60    -0.16 1.673230
4   -0.80   -0.30   -0.61   -0.68    -0.78    -0.30    -0.11    -0.71 1.388169
5   -0.72   -0.60   -0.49   -0.86    -0.22    -0.25    -0.52    -0.87 1.400453
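As the f(df2) output shows, f() always names the new column eq_df1, and it relies on column positions inside pick(). A possible variant, sketched here under the assumption that only int_ and eff_ prefixes occur, refers to the pivoted columns by name and takes the output name as an argument. f2() and its .out parameter are hypothetical, not part of the answer above.

```r
library(dplyr)   # select(), summarise(), bind_cols(); .by needs dplyr >= 1.1.0
library(tidyr)   # pivot_longer()
library(tibble)  # rowid_to_column()

# Hypothetical variant of f(): `.out` is an assumed parameter for the output column name
f2 <- function(.data, .out = "eq") {
  res <- .data |>
    select(starts_with(c("int_", "eff_"))) |>
    rowid_to_column() |>
    # names_pattern splits only after the int/eff prefix, so a stem like "b_c" stays whole
    pivot_longer(-rowid, names_to = c(".value", "name"),
                 names_pattern = "^(int|eff)_(.*)$") |>
    # .by keeps groups in order of first appearance, so rows stay aligned with .data
    summarise(!!.out := sum(exp(int + eff)), .by = rowid)
  bind_cols(.data, select(res, -rowid))
}

# Made-up data with stems "a" and "b_c"; every int + eff pair sums to 0,
# so eq_toy is exp(0) + exp(0) = 2 in both rows
toy <- data.frame(int_a = c(0, -1), eff_a = c(0, 1),
                  int_b_c = c(-0.5, 0), eff_b_c = c(0.5, 0))
f2(toy, .out = "eq_toy")
```

With this sketch, f2(df2, .out = "eq_df2") would label the df2 result to match the question's naming.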
