Counting values that meet a specific condition

Question    Votes: -1    Answers: 3

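I am trying to create variables that count how often a given value occurs in the preceding rows. So for count_a in row 3, I need to count the number of "a" in rows 1 to 3. In the same way I want to create count_a, count_b, count_c, count_d and count_e (if the unique values of var1 are c(a, b, c, d, e)).

Data: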

var1     count_a     count_b     count_c ...
  a          0          0          0
  a          1          0          0
  b          2          0          0
  b          2          1          0
  c          2          2          0
  a          2          2          1
  d          3          2          1
  e          3          2          1

Here is the code for the data

I would like to do this with data.table functions, after calling setDT(data).

r data.table
3 answers
1
vote

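Since the OP explicitly asked for a data.table solution, here are two slightly different approaches. (Note that these are alternative implementations of PoGibas' sapply() solution.)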

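library(data.table)
# OP's data, as in the sample table of the question
var1 <- c("a", "a", "b", "b", "c", "a", "d", "e")
# Variant 1: cross join, then count matches in the previous rows per level.
# Columns are named explicitly because newer data.table versions auto-name CJ() columns.
CJ(V1 = var1, V2 = unique(var1), sorted = FALSE)[
  , cnt := cumsum(shift(V1, fill = "") == V2), by = V2][
    , dcast(.SD, rowid(V2) ~ V2)][, V2 := var1][]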
   V2 a b c d e
1:  a 0 0 0 0 0
2:  a 1 0 0 0 0
3:  b 2 0 0 0 0
4:  b 2 1 0 0 0
5:  c 2 2 0 0 0
6:  a 2 2 1 0 0
7:  d 3 2 1 0 0
8:  e 3 2 1 1 0
# Variant 2: swap the CJ() arguments and group by runs of V1
CJ(V1 = unique(var1), V2 = var1, sorted = FALSE)[
  , cnt := cumsum(V1 == shift(V2, fill = "")), by = rleid(V1)][
    , dcast(.SD, rowid(V1) ~ V1)][, V1 := var1][]
   V1 a b c d e
1:  a 0 0 0 0 0
2:  a 1 0 0 0 0
3:  b 2 0 0 0 0
4:  b 2 1 0 0 0
5:  c 2 2 0 0 0
6:  a 2 2 1 0 0
7:  d 3 2 1 0 0
8:  e 3 2 1 1 0

I also tried to apply the approach used in this answer to another question of the OP, but it needs quite a lot of polishing to give the desired result here:

DT <- data.table(var1)
DT[, rn := .I][DT, on = .(rn < rn), by = .EACHI, .SD[, .(N = .N), by = var1]][
  , dcast(.SD, rn ~ var1, fill = 0)][DT, on = "rn"]
   rn a b c d NA var1
1:  1 0 0 0 0  1    a
2:  2 1 0 0 0  0    a
3:  3 2 0 0 0  0    b
4:  4 2 1 0 0  0    b
5:  5 2 2 0 0  0    c
6:  6 2 2 1 0  0    a
7:  7 3 2 1 0  0    d
8:  8 3 2 1 1  0    e
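
For comparison, the same "count in the previous rows" logic can also be written without a cross join, by adding one column per level by reference. This is only a minimal sketch: the names DT, lvls and cols are assumptions made for illustration, and the data is the vector from the question's sample table.

```r
library(data.table)

# Assumed: the OP's data, stored in a column named var1
DT <- data.table(var1 = c("a", "a", "b", "b", "c", "a", "d", "e"))

lvls <- unique(DT$var1)          # levels to count, in order of appearance
cols <- paste0("count_", lvls)   # hypothetical names for the new columns

# shift() lags var1 by one row, so the current row is excluded from the
# running count; fill = "" makes the first row compare as "no match"
DT[, (cols) := lapply(lvls, function(x) cumsum(shift(var1, fill = "") == x))]
DT[]
```

Unlike the dcast() variants, this keeps the original var1 column and always creates a column for every level, including one that never occurs in a previous row.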

1
vote

A solution using cumsum:

# OPs data
foo <- c("a", "a", "b", "b", "c", "a", "d", "e")

# Use cumsum to get cumulative sum
# Using dummy variable to get first count as 0
sapply(unique(foo), function(x) cumsum(c("dummy", foo) == x))
#      a b c d e
# [1,] 0 0 0 0 0
# [2,] 1 0 0 0 0
# [3,] 2 0 0 0 0
# [4,] 2 1 0 0 0
# [5,] 2 2 0 0 0
# [6,] 2 2 1 0 0
# [7,] 3 2 1 0 0
# [8,] 3 2 1 1 0
# [9,] 3 2 1 1 1

# Use data.table to join everything (as wanted by OP)
# head(foo, -1) drops the last element, so the counts have one row per element of foo
library(data.table)
result <- data.table(foo, 
                     sapply(unique(foo), function(x) cumsum(c("dummy", head(foo, -1)) == x)))
setnames(result, c("var1", paste0("count_", unique(foo))))
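
Note that the dummy trick in the sapply() call printed above returns 9 rows for 8 input values. One possible way to get exactly one row per element, sketched here in base R, is to take the inclusive running count and subtract the current row's own match. The name prev_counts is an assumption made for illustration.

```r
# OP's data
foo <- c("a", "a", "b", "b", "c", "a", "d", "e")

# cumsum(foo == x) counts matches up to and including the current row;
# subtracting (foo == x) removes the current row, leaving the previous rows only
prev_counts <- sapply(unique(foo), function(x) cumsum(foo == x) - (foo == x))
colnames(prev_counts) <- paste0("count_", unique(foo))
prev_counts
```

The result is a matrix with length(foo) rows, so it can be bound to foo without any recycling.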

0
votes
count_a = cumsum(var1 == "a")
count_a
  [1] 1 2 2 2 2 3 3 3

This satisfies "for count_a in row 3, I need to count the number of 'a' in rows 1 to 3", but it gives different numbers from those in your example.
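
The difference is whether the current row is counted. As a small sketch (var1 is assumed to hold the values from the question's sample table, and the names count_incl and count_prev are made up for illustration), lagging the running count by one row reproduces the numbers in the example:

```r
# Assumed: the OP's data from the sample table
var1 <- c("a", "a", "b", "b", "c", "a", "d", "e")

# Inclusive count: rows 1..i, current row included
count_incl <- cumsum(var1 == "a")
count_incl
# [1] 1 2 2 2 2 3 3 3

# Lagged count: rows 1..(i - 1), which is what the OP's table shows
count_prev <- c(0L, head(count_incl, -1))
count_prev
# [1] 0 1 2 2 2 2 3 3
```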
