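I am trying to create variables that count how often a value occurs in the preceding rows. For example, for count_a in row 3, I need to count the number of "a" values in rows 1 to 3. In this way I want to create count_a, count_b, count_c, count_d and count_e (assuming the unique values of var1 are c(a,b,c,d,e)).
Data: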
var1 count_a count_b count_c ...
a 0 0 0
a 1 0 0
b 2 0 0
b 2 1 0
c 2 2 0
a 2 2 1
d 3 2 1
e 3 2 1
This is the code for the data.
I would like to do this with data.table functions, after calling setDT(data).
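Since the OP explicitly asked for a data.table solution, here are two slightly different approaches (note that both are alternative implementations of PoGibas' sapply() solution):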
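library(data.table)
# OP's data (same values as foo in PoGibas' answer)
var1 <- c("a", "a", "b", "b", "c", "a", "d", "e")
# Columns are named explicitly because recent data.table versions auto-name CJ() inputs
CJ(V1 = var1, V2 = unique(var1), sorted = FALSE)[
  , cnt := cumsum(shift(V1, fill = "") == V2), by = V2][
    , dcast(.SD, rowid(V2) ~ V2)][, V2 := var1][]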
V2 a b c d e
1: a 0 0 0 0 0
2: a 1 0 0 0 0
3: b 2 0 0 0 0
4: b 2 1 0 0 0
5: c 2 2 0 0 0
6: a 2 2 1 0 0
7: d 3 2 1 0 0
8: e 3 2 1 1 0
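# Variant with the arguments swapped; explicit names keep V1/V2 stable across data.table versions
CJ(V1 = unique(var1), V2 = var1, sorted = FALSE)[
  , cnt := cumsum(V1 == shift(V2, fill = "")), by = rleid(V1)][
    , dcast(.SD, rowid(V1) ~ V1)][, V1 := var1][]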
V1 a b c d e
1: a 0 0 0 0 0
2: a 1 0 0 0 0
3: b 2 0 0 0 0
4: b 2 1 0 0 0
5: c 2 2 0 0 0
6: a 2 2 1 0 0
7: d 3 2 1 0 0
8: e 3 2 1 1 0
I also tried to apply the approach used in this answer to another question of the OP, but it needs a lot of polishing to get the desired result. Here it is:
DT <- data.table(var1)
DT[, rn := .I][DT, on = .(rn < rn), by = .EACHI, .SD[, .(N = .N), by = var1]][
, dcast(.SD, rn ~ var1, fill = 0)][DT, on = "rn"]
rn a b c d NA var1
1: 1 0 0 0 0 1 a
2: 2 1 0 0 0 0 a
3: 3 2 0 0 0 0 b
4: 4 2 1 0 0 0 b
5: 5 2 2 0 0 0 c
6: 6 2 2 1 0 0 a
7: 7 3 2 1 0 0 d
8: 8 3 2 1 1 0 e
A solution using cumsum:
# OP's data
foo <- c("a", "a", "b", "b", "c", "a", "d", "e")
# Use cumsum to get a running count of each unique value
# Prepending a dummy element makes the first count 0 (and adds one extra trailing row)
sapply(unique(foo), function(x) cumsum(c("dummy", foo) == x))
# a b c d e
# [1,] 0 0 0 0 0
# [2,] 1 0 0 0 0
# [3,] 2 0 0 0 0
# [4,] 2 1 0 0 0
# [5,] 2 2 0 0 0
# [6,] 2 2 1 0 0
# [7,] 3 2 1 0 0
# [8,] 3 2 1 1 0
# [9,] 3 2 1 1 1
# Use data.table to join everything (as wanted by OP)
library(data.table)
# Drop the last element of foo so the count matrix has as many rows as foo
result <- data.table(foo,
                     sapply(unique(foo), function(x) cumsum(c("dummy", head(foo, -1)) == x)))
setnames(result, c("var1", paste0("count_", unique(foo))))
count_a = cumsum(var1 == "a")
count_a
[1] 1 2 2 2 2 3 3 3
This satisfies "for count_a in row 3, I need to count the number of "a" values in rows 1 to 3", but it does not match the numbers in your example.
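If the counts should instead match the table in the question (occurrences in the rows strictly before the current one), one possible sketch is to subtract the current row's own match from the running total. The object name DT and the loop variable v are only illustrative, and the data is assumed to be the eight values shown in the question.

```r
library(data.table)

# Assumed input: the var1 values from the question's example table
var1 <- c("a", "a", "b", "b", "c", "a", "d", "e")
DT <- data.table(var1)

# For every unique value, add a count_<value> column by reference.
# cumsum(hit) counts matches in rows 1..i; subtracting hit removes
# the current row, leaving the matches in rows 1..(i-1).
for (v in unique(var1)) {
  hit <- DT$var1 == v
  set(DT, j = paste0("count_", v), value = cumsum(hit) - hit)
}

print(DT)
```

Dropping the "- hit" term would give the inclusive counts from the cumsum(var1 == "a") line above instead.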