Say I have the following:
library(data.table)
# dummy data
df <- data.table(metric_1 = c(1, 1, 3),
                 metric_2 = c(1, 2, 2))
df
   metric_1 metric_2
1:        1        1
2:        1        2
3:        3        2
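I want to loop over the 2 columns (the real data frame has many other columns), applying a calculation to each column (simplified below) and then counting rows grouped by the calculated column: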
# metric columns
x <- c('metric_1', 'metric_2')
# list to capture results
y <- vector('list', length(x))
# summarise
for (i in seq_along(x))
{
y[[i]] <- df[, .(rows = .N)
, by = .(fifelse(get(x[[i]]) == 1, 0, get(x[[i]])))
]
}
The above gives a list of summary tables:
> y
[[1]]
   fifelse rows
1:       0    2
2:       3    1

[[2]]
   fifelse rows
1:       0    1
2:       2    2
However, is it possible to name the grouping column after the source column inside the loop? I tried the following using x[[i]]:
for (i in seq_along(x))
{
y[[i]] <- df[, .(rows = .N)
, by = .(x[[i]] = fifelse(get(x[[i]]) == 1, 0, get(x[[i]])))
]
}
But I get an error:
Error: unexpected '=' in:
" df[, .(rows = .N)
, by = .(x[[i]] ="
Given the volume of data, a data.table solution would be much appreciated.
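I would suggest a different approach:
First melt to long format, then calculate the new values, and finally summarise (and/or split into a list as needed).
Sample data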
library(data.table)
df <- data.table(metric_1 = c(1,1,3)
, metric_2 = c(1,2,2)
)
x <- c('metric_1', 'metric_2')
Code
# convert to long
df.long <- melt(df, measure.vars = x)
# perform the calculation to get the desired values
df.long[, valueNew := fifelse(value == 1, 0, value)]
# summarise
df.long[, .N, by = .(variable, valueNew)]
# variable valueNew N
# 1: metric_1 0 2
# 2: metric_1 3 1
# 3: metric_2 0 1
# 4: metric_2 2 2
# if a list by x-column is needed
split(df.long[, .N, by = .(variable, valueNew)], by = "variable", keep.by = FALSE)
# $metric_1
# valueNew N
# 1: 0 2
# 2: 3 1
#
# $metric_2
# valueNew N
# 1: 0 1
# 2: 2 2
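To see why the long format makes the grouping straightforward, it may help to inspect the intermediate table. The following is only a minimal sketch that reuses the sample data above; the two checks at the end are illustrative and not part of the original answer.

```r
library(data.table)

df <- data.table(metric_1 = c(1, 1, 3),
                 metric_2 = c(1, 2, 2))
x <- c("metric_1", "metric_2")

# melt stacks the measure columns: one row per (original row, metric) pair
df.long <- melt(df, measure.vars = x)

# by default the output columns are 'variable' (source column name) and 'value'
names(df.long)

# every original row appears once per melted column
nrow(df.long) == nrow(df) * length(x)
```

Because the source column name ends up in the variable column, it can be used directly in by, so no per-column loop or dynamic naming is needed.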
We can use setNames(list(..), x[[i]]) in place of your .(..), which lets the group name be supplied as a string at run time.
lapply(seq_along(x), function(i) {
df[, .(rows = .N) ,
by = setNames(list(fifelse(get(x[[i]]) == 1, 0, get(x[[i]]))), x[[i]]) ]
})
# [[1]]
# metric_1 rows
# <num> <int>
# 1: 0 2
# 2: 3 1
# [[2]]
# metric_2 rows
# <num> <int>
# 1: 0 1
# 2: 2 2
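A possible variation, sketched here under the assumption that renaming after the aggregation is acceptable: group under a fixed placeholder name and then rename the result by reference with setnames(). The placeholder grp and the argument name col are hypothetical names chosen for this illustration, and the sketch assumes df has no column called col or grp.

```r
library(data.table)

df <- data.table(metric_1 = c(1, 1, 3),
                 metric_2 = c(1, 2, 2))
x <- c("metric_1", "metric_2")

y <- lapply(x, function(col) {
  # group under a fixed placeholder name ('grp' is hypothetical)
  res <- df[, .(rows = .N),
            by = .(grp = fifelse(get(col) == 1, 0, get(col)))]
  # rename the placeholder to the source column name, by reference
  setnames(res, "grp", col)
  res
})

# optionally name the list elements after the columns as well
names(y) <- x
```

With this, y$metric_1 and y$metric_2 each carry a grouping column named after their source column, alongside the rows count.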