How can I split the rows of a dataset into groups based on the values in one column and calculate the total time for each group?


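I have telemetry data listing the detections of an individual within an area.

For each session, I want to calculate the total time the individual spent in the area. I assume the individual is in the area whenever a detection occurs.

I want to split the rows into separate sessions based on whether the value in the "id56wtimelag" column is greater than 86,400 seconds (one day).

I then want to count the number of sessions the individual was present for and compute the duration of each session. The leftmost column just contains the detection list number. The Channel..T, ag.ID, Antenna and Power columns can be ignored.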

      Channel..T ag.ID Antenna Power                dat2 id56wtimelag
9              7    56      A0   206 2022-12-17 16:03:18      NA secs
11             7    56      A0   184 2022-12-17 16:03:31      13 secs
12             7    56      A0   182 2022-12-17 16:03:35       4 secs
13             7    56      A0   180 2022-12-17 16:03:39       4 secs
15             7    56      A0   206 2022-12-17 16:03:55      16 secs
16             7    56      A0   206 2022-12-17 16:03:59       4 secs
19             7    56      A0   169 2022-12-17 16:05:37      98 secs
20             7    56      A0   173 2022-12-17 16:05:41       4 secs
21             7    56      A0   187 2022-12-17 16:05:45       4 secs
17729          7    56      A0   100 2023-01-04 12:42:53 1543028 secs
17730          7    56      A0   103 2023-01-04 12:42:57       4 secs
17731          7    56      A0   118 2023-01-04 12:43:01       4 secs
17732          7    56      A0   103 2023-01-04 12:43:13      12 secs
17733          7    56      A0   102 2023-01-04 12:43:17       4 secs
17734          7    56      A0    96 2023-01-04 12:43:21       4 secs
17738          7    56      A0   106 2023-01-04 12:43:36      15 secs
17739          7    56      A0   108 2023-01-04 12:43:40       4 secs
17742          7    56      A0   111 2023-01-04 12:43:57      17 secs
17743          7    56      A0    95 2023-01-04 12:44:01       4 secs
17744          7    56      A0   101 2023-01-04 12:44:05       4 secs
17748          7    56      A0   106 2023-01-04 12:44:17      12 secs
17749          7    56      A0   105 2023-01-04 12:44:21       4 secs
17750          7    56      A0   105 2023-01-04 12:44:25       4 secs
17753          7    56      A0   103 2023-01-04 12:44:37      12 secs
17754          7    56      A0   100 2023-01-04 12:44:41       4 secs
17755          7    56      A0   103 2023-01-04 12:44:45       4 secs
17759          7    56      A0    96 2023-01-04 12:44:58      13 secs
17760          7    56      A0    93 2023-01-04 12:45:08      10 secs
17763          7    56      A0    95 2023-01-04 12:45:28      20 secs
17765          7    56      A0    86 2023-01-04 12:45:48      20 secs
17767          7    56      A0   103 2023-01-04 12:46:08      20 secs
17769          7    56      A0    85 2023-01-04 12:46:28      20 secs
17772          7    56      A0    89 2023-01-04 12:46:48      20 secs
17774          7    56      A0   102 2023-01-04 12:47:08      20 secs
17776          7    56      A0   109 2023-01-04 12:47:28      20 secs
17777          7    56      A0   103 2023-01-04 12:47:48      20 secs
17778          7    56      A0   102 2023-01-04 12:48:08      20 secs
17779          7    56      A0   100 2023-01-04 12:48:28      20 secs
17780          7    56      A0   107 2023-01-04 12:48:38      10 secs
17781          7    56      A0   102 2023-01-04 12:48:58      20 secs
17782          7    56      A0   100 2023-01-04 12:49:18      20 secs
17783          7    56      A0    94 2023-01-04 12:49:38      20 secs

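I have been unable to split the rows into separate groups/sessions on the condition that the value in the "id56wtimelag" column is greater than 86,400. I then need to calculate the duration of each session, ideally in a list like this one for the dataset: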

`Session  Length of each session (seconds)
1  147
2  405`

Data

The dataset in dput format.

df1 <-
  structure(list(
    Channel..T = c(7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 
                   7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 
                   7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 
                   7L, 7L), 
    ag.ID = c(56L, 56L, 56L, 56L, 56L, 56L, 56L, 56L, 56L, 
              56L, 56L, 56L, 56L, 56L, 56L, 56L, 56L, 56L, 56L, 56L, 56L, 56L, 
              56L, 56L, 56L, 56L, 56L, 56L, 56L, 56L, 56L, 56L, 56L, 56L, 56L, 
              56L, 56L, 56L, 56L, 56L, 56L, 56L), 
    Antenna = c("A0", "A0", "A0", 
                "A0", "A0", "A0", "A0", "A0", "A0", "A0", "A0", "A0", "A0", "A0", 
                "A0", "A0", "A0", "A0", "A0", "A0", "A0", "A0", "A0", "A0", "A0", 
                "A0", "A0", "A0", "A0", "A0", "A0", "A0", "A0", "A0", "A0", "A0", 
                "A0", "A0", "A0", "A0", "A0", "A0"), 
    Power = c(206L, 184L, 182L, 
              180L, 206L, 206L, 169L, 173L, 187L, 100L, 103L, 118L, 103L, 102L, 
              96L, 106L, 108L, 111L, 95L, 101L, 106L, 105L, 105L, 103L, 100L, 
              103L, 96L, 93L, 95L, 86L, 103L, 85L, 89L, 102L, 109L, 103L, 102L, 
              100L, 107L, 102L, 100L, 94L), 
    dat2 = c("2022-12-17 16:03:18", 
             "2022-12-17 16:03:31", "2022-12-17 16:03:35", "2022-12-17 16:03:39", 
             "2022-12-17 16:03:55", "2022-12-17 16:03:59", "2022-12-17 16:05:37", 
             "2022-12-17 16:05:41", "2022-12-17 16:05:45", "2023-01-04 12:42:53", 
             "2023-01-04 12:42:57", "2023-01-04 12:43:01", "2023-01-04 12:43:13", 
             "2023-01-04 12:43:17", "2023-01-04 12:43:21", "2023-01-04 12:43:36", 
             "2023-01-04 12:43:40", "2023-01-04 12:43:57", "2023-01-04 12:44:01", 
             "2023-01-04 12:44:05", "2023-01-04 12:44:17", "2023-01-04 12:44:21", 
             "2023-01-04 12:44:25", "2023-01-04 12:44:37", "2023-01-04 12:44:41", 
             "2023-01-04 12:44:45", "2023-01-04 12:44:58", "2023-01-04 12:45:08", 
             "2023-01-04 12:45:28", "2023-01-04 12:45:48", "2023-01-04 12:46:08", 
             "2023-01-04 12:46:28", "2023-01-04 12:46:48", "2023-01-04 12:47:08", 
             "2023-01-04 12:47:28", "2023-01-04 12:47:48", "2023-01-04 12:48:08", 
             "2023-01-04 12:48:28", "2023-01-04 12:48:38", "2023-01-04 12:48:58", 
             "2023-01-04 12:49:18", "2023-01-04 12:49:38"), 
    id56wtimelag = c("NA secs", 
                     "13 secs", "4 secs", "4 secs", "16 secs", "4 secs", "98 secs", 
                     "4 secs", "4 secs", "1543028 secs", "4 secs", "4 secs", "12 secs", 
                     "4 secs", "4 secs", "15 secs", "4 secs", "17 secs", "4 secs", 
                     "4 secs", "12 secs", "4 secs", "4 secs", "12 secs", "4 secs", 
                     "4 secs", "13 secs", "10 secs", "20 secs", "20 secs", "20 secs", 
                     "20 secs", "20 secs", "20 secs", "20 secs", "20 secs", "20 secs", 
                     "20 secs", "10 secs", "20 secs", "20 secs", "20 secs")), 
    row.names = c("9", "11", "12", "13", "15", "16", "19", "20", "21", 
                  "17729", "17730", "17731", "17732", "17733", "17734", "17738", "17739", 
                  "17742", "17743", "17744", "17748", "17749", "17750", "17753", "17754", 
                  "17755", "17759", "17760", "17763", "17765", "17767", "17769", 
                  "17772", "17774", "17776", "17777", "17778", "17779", "17780", 
                  "17781", "17782", "17783"), class = "data.frame")
1 answer

Here is a base R solution.

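  1. Remove the string "secs" from the duration column id56wtimelag and coerce the column to integer;
  2. Get a logical vector, Session, that is TRUE where id56wtimelag is greater than or equal to 86,400;
  3. Create Length, a copy of that column. It is needed because the column contains NA, and working on the copy leaves the original untouched;
  4. Assign zero to Length wherever it is >= 86,400, so the gap between sessions is not counted as time in the area;
  5. Set all NA in Session to FALSE;
  6. Now the session numbers. The standard cumsum trick makes the vector increase by one each time Session is TRUE.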
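
As a rough illustration of the cumsum trick in the last step, here is a minimal sketch on a small made-up vector of lags. The values and the names lag, new_session and session are hypothetical, not taken from the question's data, and the NA handling is folded into the condition instead of being fixed up afterwards.

```r
# Made-up lags in seconds; the first detection has no predecessor, hence NA
lag <- c(NA, 5L, 7L, 90000L, 3L, 100000L, 2L)

# TRUE only where a known lag reaches one day (86,400 s), i.e. a session start
new_session <- !is.na(lag) & lag >= 86400L

# cumsum() counts the session starts seen so far; + 1L makes numbering start at 1
session <- cumsum(new_session) + 1L
session
#> [1] 1 1 1 2 2 3 3
```

Every row up to the first large gap gets session 1, and each large gap bumps the number for that row and all rows after it.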

Then use aggregate to compute the duration of each session.

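# Strip the "secs" suffix and whitespace, then coerce to integer ("NA secs" becomes NA)
df1$id56wtimelag <- gsub("[[:alpha:]]*|[[:space:]]*", "", df1$id56wtimelag) |>
  as.integer()

# TRUE where the lag marks the start of a new session (NA for the first row)
Session <- df1$id56wtimelag >= 86400L
# Work on a copy so the original column is left unchanged
Length <- df1$id56wtimelag
# The gap between sessions must not count towards any session's duration
Length[Session] <- 0L
Session[is.na(Session)] <- FALSE
# Running count of session starts gives the session number
Session <- cumsum(Session) + 1L
# The formula interface drops the remaining NA row by default before summing
aggregate(Length ~ Session, FUN = sum)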
#>   Session Length
#> 1       1    147
#> 2       2    405
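
As a cross-check, the same durations can probably be recovered from the timestamps alone, since the sum of the lags inside a session equals the last detection time minus the first. The sketch below is an alternative and not the method used in the answer. It assumes the timestamps parse with as.POSIXct in UTC, and it uses only five timestamps copied from the question's dat2 column (the first and last of each session plus one in between). The names det, lag_secs, len and sessions are hypothetical.

```r
# A few timestamps from the question's data; tz = "UTC" is an assumption
det <- data.frame(
  dat2 = c("2022-12-17 16:03:18", "2022-12-17 16:03:31", "2022-12-17 16:05:45",
           "2023-01-04 12:42:53", "2023-01-04 12:49:38")
)
det$time <- as.POSIXct(det$dat2, tz = "UTC")

# Gap to the previous detection in seconds; the first row has no predecessor
lag_secs <- c(NA, diff(as.numeric(det$time)))

# A new session starts wherever the gap reaches one day (86,400 s)
det$Session <- cumsum(!is.na(lag_secs) & lag_secs >= 86400) + 1L

# Duration of a session = last detection minus first detection, in seconds
len <- tapply(as.numeric(det$time), det$Session, function(x) max(x) - min(x))
sessions <- data.frame(Session = as.integer(names(len)), Length = as.vector(len))
sessions
#>   Session Length
#> 1       1    147
#> 2       2    405
```

With this layout, nrow(sessions) gives the number of sessions asked for in the question.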

Created on 2024-03-11 with reprex v2.1.0
