Aggregate, dcast, and create new columns in R

Question

I have a data frame with one row per second. From this 1-second data I managed to aggregate to 1-minute intervals with the following code:

agg_cont <- df %>% group_by(Date, Hour, Minute, Status, Mean) %>% count(name = 'Occurrence')

Now I have a data frame that looks like this:

Date	Minute	Status	Mean	Occurrence
12/01/2022	00	a	20	60
12/02/2022	01	b	32	60
12/01/2022	02	a	21	60
12/02/2022	03	a	12	60
12/01/2022	04	a	23	20
12/01/2022	04	b	43	40
12/01/2022	05	a	33	60

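Note that the Occurrence column is the number of seconds a status occurred within a given minute. For minute '04', where statuses 'a' and 'b' have Occurrence values of 20 and 40 respectively, status 'a' occurred for 20 seconds of that minute.

Using the data frame above, I want only one row per minute, with new columns for each Status and for the mean observed for that status in that minute.

Desired output: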

Date	Minute	a	b	Mean 'a'	Mean 'b'
12/01/2022	00	60	0	20	NA
12/02/2022	01	0	60	32	NA
12/01/2022	02	60	0	21	NA
12/01/2022	03	0	60	12	NA
12/01/2022	04	20	40	23	43
12/02/2022	05	60	0	33	NA

I am trying to use the dcast function to get the desired output.

Thanks

Answer 1

Here is a tidyverse way.

agg_count <- read.table(text = "
Date    Hour    Minute  Status  Mean    Occurrence
12/01/2022  00  00  a   20  60
12/02/2022  00  01  b   32  60
12/01/2022  00  02  a   21  60
12/02/2022  00  03  a   12  60
12/01/2022  00  04  a   23  20
12/02/2022  00  04  b   43  40
12/02/2022  00  05  a   33  60
", header = TRUE)

suppressPackageStartupMessages({
  library(dplyr)
  library(tidyr)
})

agg_count %>%
  pivot_wider(
    id_cols = c(Date, Hour, Minute),
    names_from = Status,
    values_from = c(Occurrence, Mean),
    values_fill = 0L
  )
#> # A tibble: 7 × 7
#>   Date        Hour Minute Occurrence_a Occurrence_b Mean_a Mean_b
#>   <chr>      <int>  <int>        <int>        <int>  <int>  <int>
#> 1 12/01/2022     0      0           60            0     20      0
#> 2 12/02/2022     0      1            0           60      0     32
#> 3 12/01/2022     0      2           60            0     21      0
#> 4 12/02/2022     0      3           60            0     12      0
#> 5 12/01/2022     0      4           20            0     23      0
#> 6 12/02/2022     0      4            0           40      0     43
#> 7 12/02/2022     0      5           60            0     33      0

Created on 2023-03-27 with reprex v2.0.2

As I asked in a comment on the question: if Date is not taken into account, the code below puts input rows with the same minute in the same output row.

agg_count %>%
  pivot_wider(
    id_cols = c(Hour, Minute),
    names_from = Status,
    values_from = c(Occurrence, Mean),
    values_fill = 0L
  )
#> # A tibble: 6 × 6
#>    Hour Minute Occurrence_a Occurrence_b Mean_a Mean_b
#>   <int>  <int>        <int>        <int>  <int>  <int>
#> 1     0      0           60            0     20      0
#> 2     0      1            0           60      0     32
#> 3     0      2           60            0     21      0
#> 4     0      3           60            0     12      0
#> 5     0      4           20           40     23     43
#> 6     0      5           60            0     33      0

Created on 2023-03-27 with reprex v2.0.2
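The desired output in the question shows 0 for a status that did not occur but NA for its mean, whereas `values_fill = 0L` fills both with 0. A minimal sketch of one way to get closer to that, assuming a tidyr version in which `values_fill` accepts a named list keyed by the `values_from` columns (the object name `wide` is only illustrative):

```r
library(tidyr)

# Same sample data as in the answer above.
agg_count <- read.table(text = "
Date    Hour    Minute  Status  Mean    Occurrence
12/01/2022  00  00  a   20  60
12/02/2022  00  01  b   32  60
12/01/2022  00  02  a   21  60
12/02/2022  00  03  a   12  60
12/01/2022  00  04  a   23  20
12/02/2022  00  04  b   43  40
12/02/2022  00  05  a   33  60
", header = TRUE)

wide <- pivot_wider(
  agg_count,
  id_cols = c(Hour, Minute),
  names_from = Status,
  values_from = c(Occurrence, Mean),
  # Fill only the Occurrence_* columns with 0;
  # Mean_* cells with no matching input row stay NA.
  values_fill = list(Occurrence = 0L)
)

wide
```

With this, a minute in which only status 'a' occurred should have `Occurrence_b` equal to 0 and `Mean_b` equal to NA.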

Answer 2

Updated answer:

agg_count %>%
  pivot_wider(
    id_cols = c(Date, Hour, Minute),
    names_from = Status,
    values_from = c(Occurrence, Mean),
    values_fn = mean,
    values_fill = 0L
  )
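Since the question mentions dcast, here is a rough equivalent with that function. This is only a sketch: it assumes the data.table package's `dcast` (the question does not say whether data.table or reshape2 was meant), and the names `dt`, `wide_dt`, and `col` are illustrative. It reshapes by Hour and Minute so that each minute gets a single row.

```r
library(data.table)

# Same sample data as in Answer 1.
agg_count <- read.table(text = "
Date    Hour    Minute  Status  Mean    Occurrence
12/01/2022  00  00  a   20  60
12/02/2022  00  01  b   32  60
12/01/2022  00  02  a   21  60
12/02/2022  00  03  a   12  60
12/01/2022  00  04  a   23  20
12/02/2022  00  04  b   43  40
12/02/2022  00  05  a   33  60
", header = TRUE)

dt <- as.data.table(agg_count)

# Passing several columns to value.var yields one output column per
# value/status pair: Occurrence_a, Occurrence_b, Mean_a, Mean_b.
# Cells with no matching input row are NA by default.
wide_dt <- dcast(
  dt,
  Hour + Minute ~ Status,
  value.var = c("Occurrence", "Mean")
)

# Replace NA with 0 in the Occurrence_* columns only, leaving Mean_* as NA.
for (col in c("Occurrence_a", "Occurrence_b")) {
  set(wide_dt, i = which(is.na(wide_dt[[col]])), j = col, value = 0L)
}

wide_dt
```

Each Hour, Minute, and Status combination appears only once in the sample data, so no aggregation function is needed here; with duplicates, `dcast` would need a `fun.aggregate` argument.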
