Commands to apply complex functions and calculations on datasets in R

Question

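I am a fairly experienced R user, but I often struggle with the apply family. I have very slow iterative code whose performance I would like to improve by using this family, and I am running into difficulties. I am simplifying the use case a great deal here, so please assume there is no obvious workaround.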

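I have a dataset with 4 observations, each assigned to one of 5 possible groups (the actual use case has 50,000 observations and 1110 possible groups), and two output variables. I want to group each observation by its assignment and then do some operation on the outputs (here, to simplify, the sum of the squared group totals of the two outputs; the actual output is much more complex). My iterative approach gives me what I want and looks like this: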

library(tidyverse)
set.seed(8675309)

#create toy data
dataset <- data.frame(obs_1 = round(runif(100, 1, 5)),
                      obs_2 = round(runif(100, 1, 5)),
                      obs_3 = round(runif(100, 1, 5)),
                      obs_4 = round(runif(100, 1, 5)),
                      val_1 = rnorm(100, 0, 5),
                      val_2 = rnorm(100, 0, 15))

#define a function to create the output for each group
cals <- function(df){
  var <- df %>%
    group_by(group) %>%
    summarise(x1 = sum(val_1),
              x2 = sum(val_2)) %>%
    mutate(x1 = x1^2,
           x2 = x2^2) %>%
    mutate(ans = x1  + x2) %>%
    pull(ans)
  return(var)
}

#initialize output matrix
answer <- matrix(rep(NA, 20), 5)

#loops -- ugh
for(i in 1:4){
#pull each group list and the two output variables
  df_used <- dataset %>%
    select(all_of(i), val_1, val_2)

#give the group list a common name so the function can identify it
  names(df_used)[1] <- 'group'

#calculate output using the function
  cal <- cals(df_used)

#write this into the output matrix
  answer[, i] <- cal
}

answer
# Result:
          [,1]        [,2]       [,3]       [,4]
[1,]  1159.463  197.090174   302.4915   320.8285
[2,] 15820.498 1975.668791   294.3433  7070.0387
[3,]  2423.859  537.334344 13256.3443  1331.7600
[4,]  4646.915 1900.430230  1836.5904 17242.5160
[5,]  9403.906    4.785014  1449.9531  1588.6278

Still, I assume there must be a faster, less ugly way to do this (?)

Answer 1

mapply may be what you are after. Here is a data.table version:

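library(data.table)
dt <- as.data.table(dataset)
# for each obs_* column: group by it, compute the squared sums, return them in group order
mapply(\(x) setorder(dt[,.(sum(val_1)^2 + sum(val_2)^2), x], x)[[2]], dt[,1:4])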
#>           obs_1     obs_2     obs_3     obs_4
#> [1,]   524.9378  1220.855 1780.1158  786.5803
#> [2,]  2890.6006 10847.766 6224.3217 7760.9268
#> [3,] 18436.0742  2667.610 3879.1027  466.2114
#> [4,]  6888.7064  1774.418 2644.9105 1149.2653
#> [5,]  3169.8326  3691.997  676.0297 2821.5822
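If you would rather avoid an extra package, roughly the same idea can be sketched in base R with sapply and rowsum. This is only a minimal sketch under a few assumptions: it rebuilds the toy data from the question, it assumes every group value occurs in every obs_* column (otherwise sapply returns a list rather than a matrix), and the names vals and answer are just placeholders.

```r
set.seed(8675309)

# toy data, same shape as in the question
dataset <- data.frame(obs_1 = round(runif(100, 1, 5)),
                      obs_2 = round(runif(100, 1, 5)),
                      obs_3 = round(runif(100, 1, 5)),
                      obs_4 = round(runif(100, 1, 5)),
                      val_1 = rnorm(100, 0, 5),
                      val_2 = rnorm(100, 0, 15))

# value columns as a numeric matrix so rowsum() can total both at once
vals <- as.matrix(dataset[, c("val_1", "val_2")])

# for each grouping column: per-group totals, squared, then added across val_1 and val_2
answer <- sapply(dataset[paste0("obs_", 1:4)], function(g) {
  sums <- rowsum(vals, g)  # one row per group, ordered by group value
  rowSums(sums^2)
})

answer
```

For a more complex per-group output, the body of the anonymous function is the only part that should need to change, as long as it returns one value per group in a consistent order.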
