Dynamic AND calculated group by


Say I have the following:

# dummy data
library(data.table)
df <- data.table(metric_1 = c(1,1,3)
                 , metric_2 = c(1,2,2)
                 ); df

   metric_1 metric_2
1:        1        1
2:        1        2
3:        3        2

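I want to loop over the 2 columns (the real data frame has many more columns), apply a calculation to each column (simplified below), and then count rows grouped by the calculated values: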

# metric columns
x <- c('metric_1', 'metric_2')

# list to capture results
y <- vector('list', length(x))

# summarise
for (i in seq_along(x))
{
  y[[i]] <- df[, .(rows = .N)
               , by = .(fifelse(get(x[[i]]) == 1, 0, get(x[[i]])))
               ]
}

The above gives a list of summary tables:

> y
[[1]]
   fifelse rows
1:       0    2
2:       3    1

[[2]]
   fifelse rows
1:       0    1
2:       2    2

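However, is it possible to name the grouping column after the source column inside the loop? I tried the following, using x[[i]] as the name: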

for (i in seq_along(x))
{
  y[[i]] <- df[, .(rows = .N)
               , by = .(x[[i]] = fifelse(get(x[[i]]) == 1, 0, get(x[[i]])))
               ]
}

But I get an error:

Error: unexpected '=' in:
"    df[, .(rows = .N)
       , by = .(x[[i]] ="

Given the data volume, a data.table solution would be appreciated.

r data.table
2 Answers
2 votes

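I would suggest a different approach:

First melt to long format, then compute the new values, and finally summarise (and/or split into a list, if needed).

Sample data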

library(data.table)
df <- data.table(metric_1 = c(1,1,3)
                 , metric_2 = c(1,2,2)
)

x <- c('metric_1', 'metric_2')

Code

# convert to long
df.long <- melt(df, measure.vars = x)
# perform action to get desired values
df.long[, valueNew := fifelse(value == 1, 0, value)]
# summarise
df.long[, .N, by = .(variable, valueNew)]
#    variable valueNew N
# 1: metric_1        0 2
# 2: metric_1        3 1
# 3: metric_2        0 1
# 4: metric_2        2 2

# if a list by x-column is needed
split(df.long[, .N, by = .(variable, valueNew)], by = "variable", keep.by = FALSE)
# $metric_1
#    valueNew N
# 1:        0 2
# 2:        3 1
# 
# $metric_2
#    valueNew N
# 1:        0 1
# 2:        2 2
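
Since the real table has many metric columns, the melt step above could also select them by name pattern instead of listing them in x. The following is only a sketch under the assumption that all metric columns share the prefix "metric_"; the name `res` and the output column `rows` are illustrative, and the calculation is done directly inside `by` rather than in a separate step.

```r
library(data.table)

df <- data.table(metric_1 = c(1, 1, 3),
                 metric_2 = c(1, 2, 2))

# Assumption: every column to summarise starts with "metric_".
# patterns() picks those columns without spelling out each name.
df.long <- melt(df, measure.vars = patterns("^metric_"))

# Compute the recoded value inside `by` and count rows per
# (source column, recoded value) pair in one step.
res <- df.long[, .(rows = .N),
               by = .(variable, valueNew = fifelse(value == 1, 0, value))]
res
```

If other non-metric columns exist, melt() treats them as id columns by default and repeats them in the long table, which may matter for memory on large data.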

2 votes

We can use setNames(list(..), x[[i]]) in place of your .(..):

lapply(seq_along(x), function(i) {
  df[, .(rows = .N) ,
     by = setNames(list(fifelse(get(x[[i]]) == 1, 0, get(x[[i]]))), x[[i]]) ]
})
# [[1]]
#    metric_1  rows
#       <num> <int>
# 1:        0     2
# 2:        3     1
# [[2]]
#    metric_2  rows
#       <num> <int>
# 1:        0     1
# 2:        2     2
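
A possible variation on the same idea, shown here only as a sketch: group under a fixed placeholder name and rename the column afterwards with setnames(). The placeholder `grp` and the loop variable `col` are illustrative names, not part of the question, and this assumes df has no column called `col` or `grp`. Naming the list by x also makes each summary retrievable by its source column.

```r
library(data.table)

df <- data.table(metric_1 = c(1, 1, 3),
                 metric_2 = c(1, 2, 2))
x <- c("metric_1", "metric_2")

y <- lapply(x, function(col) {
  # Group by the recoded values under a temporary name `grp`
  res <- df[, .(rows = .N),
            by = .(grp = fifelse(get(col) == 1, 0, get(col)))]
  # Rename the grouping column by reference to the source column name
  setnames(res, "grp", col)
  res
})
# Name the list elements after the columns they summarise
names(y) <- x
y
```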