Creating a count variable (with reset)


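I have a dataset with a binary indicator of whether an event occurred. From it, I want to create a count of the number of consecutive time steps without an event. For example (TS = time step, EV = event indicator, C = count):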

TS1 -> TS2 -> TS3 -> TS4 -> TS5 ->... 

EV0 -> EV0 -> EV1 -> EV0 -> EV0 ->... 

C0  -> C1  -> C0  -> C0  -> C1  ->... 

As an example data frame, consider:

labs <- c("A", "A", "A", "A", "B", "B", "B", "B", "C", "C", "C", "C", "D", "D", "D", "D", "D")
time <- c(1,2,3,4 ,1,2,3,4 ,1,2,3,4 ,1,2,3,4,5)
event <- c(0,0,0,0, 0,1,0,0, 1,1,0,0, NA,0,0,1,0)
desiredOutcome <- c(0,1,2,3,0,0,0,1,0,0,0,1,NA,0,1,0,0) # goal

exDF <- data.frame(labs,time, event, desiredOutcome)

Given the end goal and the data frame, I ended up with the following code:

library(dplyr)

exDF <- exDF %>%
  group_by(labs) %>%
  mutate(pe1 = lag(event, order_by=time)) # create new variable for prior event


exDF$count2 <- ifelse(
  ((exDF$pe1 == 1) & (exDF$event == 0)), # condition checks for rows where previous timestep is included & had event WHERE event is not ongoing in this timestep
  0, # True val
  NA) # False val


exDF$count <- ifelse(
  (is.na(exDF$pe1) & (exDF$event == 0)), # condition checks for rows where previous timestep is not included & no current event
  0, # True val
  exDF$count2) # False val

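This appears to fill in all the zeros correctly. However, I don't know a good way to get from a column with the appropriate 0s filled in and NA everywhere else to the result I want.

Most of my experiments have combined mutate and lag, but they only fill in the next set of values (a lone 1 where the input column holds a zero, or a 2 where it holds a 1). The following example does not attempt to handle the count reset, but it produces the behavior described above: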

exDF <- exDF %>%
  group_by(labs) %>%
  mutate(countFinal = lag(count, order_by=time) + 1) 

So my challenge is the order in which things are evaluated. For the mutate command here, the order appears to be:

Pull all cell values by label -> Look at their lags -> Add 1 -> Done, but incorrectly

Whereas I need it to be:

Pull first cell value by label -> Look at lag -> Add 1 or reset -> Pull second cell (filled in prior step) value by label -> Look at their lags -> Add 1 or reset -> Pull third... -> Done

Is there a good way to do this using existing packages?

r dplyr time-series
1 Answer

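I can't think of a more direct way, but this works. Workflow:

  1. Create a copy of event (tmp) and replace NA with a unique value, e.g. 2
  2. Give each run of identical event values a unique ID
  3. Use replace() to set the first value in each run to zero, and change the remaining non-zero run IDs to 1
  4. Return the cumulative sum of the tmp column
  5. Correct the values in the desiredOutcome column that should be NA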
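
To see what steps 2 to 4 do in isolation, here is a small base R sketch on a single vector. It is only an illustration of the idea, not the exact code used in this answer: the names `x`, `changed`, `run_id` and `pos_in_run` are made up for the example, and it assumes any NA has already been recoded as in step 1.

```r
# Illustrative vector: one event-free step, a two-step event, two event-free steps
x <- c(0, 1, 1, 0, 0)

# A new run starts at the first element and wherever the value differs from the previous one
changed <- c(TRUE, x[-1] != x[-length(x)])

# The cumulative sum of the "changed" flags labels each run with its own ID
run_id <- cumsum(changed)                    # 1 2 2 3 3

# Position within each run, starting at 0 (what steps 3 and 4 build per run)
pos_in_run <- sequence(rle(x)$lengths) - 1   # 0 0 1 0 1
```

Note that the position also counts up inside a run of events (the third element above), so rows where an event occurs still need to be set back to 0 to match the goal in the question.
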
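
library(dplyr)

exDF |>
  group_by(labs) |>
  # Steps 1-2: recode NA as 2, then start a new run ID whenever the value changes
  mutate(tmp = if_else(is.na(event), 2, event),
         tmp = cumsum(tmp != lag(tmp, default = 1))) |>
  group_by(labs, tmp) |>
  # Steps 3-4: 0 for the first row of each run, 1 for the rest, then accumulate
  mutate(tmp = replace(tmp, 1, 0),
         tmp = if_else(tmp != 0, 1, 0),
         tmp = cumsum(tmp),
         # Step 5: take the result from tmp (not from the existing goal column),
         # restore NA, and force event rows to 0 so multi-step events never count up
         desiredOutcome = case_when(is.na(event) ~ NA_real_,
                                    event == 1 ~ 0,
                                    TRUE ~ tmp)) |>
  ungroup() |>
  select(-tmp)
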
# # A tibble: 17 × 4
#    labs   time event desiredOutcome
#    <chr> <dbl> <dbl>          <dbl>
#  1 A         1     0              0
#  2 A         2     0              1
#  3 A         3     0              2
#  4 A         4     0              3
#  5 B         1     0              0
#  6 B         2     1              0
#  7 B         3     0              0
#  8 B         4     0              1
#  9 C         1     1              0
# 10 C         2     1              0
# 11 C         3     0              0
# 12 C         4     0              1
# 13 D         1    NA             NA
# 14 D         2     0              0
# 15 D         3     0              1
# 16 D         4     1              0
# 17 D         5     0              0   
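
As an alternative, the step-by-step order described in the question (look at the previous row, then add 1 or reset) can be written as an explicit loop. The following is a minimal base R sketch, not part of the answer above: `count_since_event` is a hypothetical helper name, `count` is an assumed output column, and the reset rules (restart at 0 on an event, on the step right after an event, and after an NA) are inferred from the desiredOutcome column in the question.

```r
# Hypothetical helper: number of consecutive event-free steps, restarting
# at 0 on an event, right after an event, and right after an NA.
count_since_event <- function(event) {
  out <- rep(NA_real_, length(event))
  for (i in seq_along(event)) {
    if (is.na(event[i])) next  # unknown step stays NA
    restart <- i == 1 || is.na(event[i - 1]) ||
      event[i - 1] == 1 || event[i] == 1
    out[i] <- if (restart) 0 else out[i - 1] + 1
  }
  out
}

# Example data from the question (goal column omitted)
exDF <- data.frame(
  labs  = rep(c("A", "B", "C", "D"), times = c(4, 4, 4, 5)),
  time  = c(1:4, 1:4, 1:4, 1:5),
  event = c(0,0,0,0, 0,1,0,0, 1,1,0,0, NA,0,0,1,0)
)

# Rows must be in time order within each label before counting
exDF <- exDF[order(exDF$labs, exDF$time), ]

# ave() applies the helper separately to each label
exDF$count <- ave(exDF$event, exDF$labs, FUN = count_since_event)
exDF$count
# 0 1 2 3 0 0 0 1 0 0 0 1 NA 0 1 0 0
```

The same helper should also work inside a dplyr pipeline, e.g. group_by(labs) followed by mutate(count = count_since_event(event)), provided the rows are already sorted by time.
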