Why do I get an error when using `pivot_wider()` from the `tidyverse`?

Question (votes: 0, answers: 3)

I have a data frame with one numeric variable (`Acc`) and four categorical variables (`ID`, `Datetime`, `Period`, `State`).

# My data
df <- data.frame(ID=c(rep(c("A"),8),rep(c("B"),8)),
                 Datetime=c("2020-08-05 12:00:00","2020-08-05 17:00:00","2020-08-05 18:03:00","2020-08-05 22:54:00","2020-08-06 01:08:00","2020-08-06 13:26:00","2020-08-06 19:04:00","2020-08-08 11:00:00",
                            "2020-08-04 03:00:00","2020-08-04 15:00:00","2020-08-04 23:00:00","2020-08-06 14:00:00","2020-08-06 17:00:00","2020-08-06 20:00:00","2020-08-07 04:00:00","2020-08-07 16:00:00"),
                 Period=c("Day","Day","Day","Night","Night","Day","Night","Day","Night","Day","Night","Day","Day","Night","Night","Day"),
                 State=c(1,2,1,1,1,1,2,2,1,1,1,2,2,1,1,1),
                 Acc=c(1.1,2.3,1.7,1.4,0.1,1.9,2.9,2.3,1.1,0.1,1.4,0.2,2.6,1.3,1.7,1.0))

df$Datetime <- as.POSIXct(df$Datetime,format="%Y-%m-%d %H:%M:%S", tz="UTC")

df

   ID            Datetime Period State Acc
1   A 2020-08-05 12:00:00    Day     1 1.1
2   A 2020-08-05 17:00:00    Day     2 2.3
3   A 2020-08-05 18:03:00    Day     1 1.7
4   A 2020-08-05 22:54:00  Night     1 1.4
5   A 2020-08-06 01:08:00  Night     1 0.1
6   A 2020-08-06 13:26:00    Day     1 1.9
7   A 2020-08-06 19:04:00  Night     2 2.9
8   A 2020-08-08 11:00:00    Day     2 2.3
9   B 2020-08-04 03:00:00  Night     1 1.1
10  B 2020-08-04 15:00:00    Day     1 0.1
11  B 2020-08-04 23:00:00  Night     1 1.4
12  B 2020-08-06 14:00:00    Day     2 0.2
13  B 2020-08-06 17:00:00    Day     2 2.6
14  B 2020-08-06 20:00:00  Night     1 1.3
15  B 2020-08-07 04:00:00  Night     1 1.7
16  B 2020-08-07 16:00:00    Day     1 1.0

I am trying to estimate the mean `Acc` for each `ID`, `Day`, `Period` and `State`. To do that, I tried this code:

library(tidyverse)

  df %>% 
    group_by(ID, Day = as.factor(as.Date(Datetime)), Period, State, .drop = FALSE) %>% 
    summarise(Acc = mean(Acc, na.rm = TRUE)) %>% 
    pivot_wider(names_from = c(State, Period),
                values_from = Acc,
                names_prefix = "State.") %>% 
    select(!State.NA_NA)

I expect to get this:

#> `summarise()` regrouping output by 'ID', 'Day', 'Period' (override with `.groups` argument)
#> # A tibble: 10 x 6
#> # Groups:   ID, Day [10]
#>    ID    Day        State.1_Day State.2_Day State.1_Night State.2_Night
#>    <chr> <fct>            <dbl>       <dbl>         <dbl>         <dbl>
#>  1 A     2020-08-04        NA          NA           NA             NA  
#>  2 A     2020-08-05         1.4         2.3          1.4           NA  
#>  3 A     2020-08-06         1.9        NA            0.1            2.9
#>  4 A     2020-08-07        NA          NA           NA             NA  
#>  5 A     2020-08-08        NA           2.3         NA             NA  
#>  6 B     2020-08-04         0.1        NA            1.25          NA  
#>  7 B     2020-08-05        NA          NA           NA             NA  
#>  8 B     2020-08-06        NA           1.4          1.3           NA  
#>  9 B     2020-08-07         1          NA            1.7           NA  
#> 10 B     2020-08-08        NA          NA           NA             NA

However, I get this message:

Error in map_lgl(.x, .p, ...) : object 'State' not found

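I don't understand where the error is. I guess it is something silly, but I have been thinking about it for hours and have gotten nowhere.

Can anyone explain why I am getting this error message?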

r tidyverse
3 Answers
3 votes

Does this work for you?

df %>%
  group_by(ID, Day = as.factor(as.Date(Datetime)), Period, State, .drop = FALSE) %>%
  summarise(Acc = mean(Acc, na.rm = TRUE)) %>%
  na.omit() %>%
  pivot_wider(names_from = c(State, Period),
              values_from = Acc,
              names_prefix = "State.") %>%
  complete(ID, nesting(Day)) %>%
  arrange(ID, Day) %>%
  distinct()
`summarise()` regrouping output by 'ID', 'Day', 'Period' (override with `.groups` argument)
# A tibble: 10 x 6
# Groups:   ID, Day [10]
   ID    Day        State.1_Day State.2_Day State.1_Night State.2_Night
   <fct> <fct>            <dbl>       <dbl>         <dbl>         <dbl>
 1 A     2020-08-04        NA          NA           NA             NA  
 2 A     2020-08-05         1.4         2.3          1.4           NA  
 3 A     2020-08-06         1.9        NA            0.1            2.9
 4 A     2020-08-07        NA          NA           NA             NA  
 5 A     2020-08-08        NA           2.3         NA             NA  
 6 B     2020-08-04         0.1        NA            1.25          NA  
 7 B     2020-08-05        NA          NA           NA             NA  
 8 B     2020-08-06        NA           1.4          1.3           NA  
 9 B     2020-08-07         1          NA            1.7           NA  
10 B     2020-08-08        NA          NA           NA             NA  

1 vote

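I'm not sure where the error in your code comes from, but I think it comes from the final `select` line. Instead of selecting everything that is not `State.NA_NA`, I switched it to keep every column whose name does not contain "NA". That seems to give the desired output.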

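Also, looking at the comments, I suggest calling each tidyverse function with the `dplyr::select` syntax, because you may be inadvertently calling a different `select`.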

library(tidyverse)

df <- data.frame(ID=c(rep(c("A"),8),rep(c("B"),8)),
                 Datetime=c("2020-08-05 12:00:00","2020-08-05 17:00:00","2020-08-05 18:03:00","2020-08-05 22:54:00","2020-08-06 01:08:00","2020-08-06 13:26:00","2020-08-06 19:04:00","2020-08-08 11:00:00",
                            "2020-08-04 03:00:00","2020-08-04 15:00:00","2020-08-04 23:00:00","2020-08-06 14:00:00","2020-08-06 17:00:00","2020-08-06 20:00:00","2020-08-07 04:00:00","2020-08-07 16:00:00"),
                 Period=c("Day","Day","Day","Night","Night","Day","Night","Day","Night","Day","Night","Day","Day","Night","Night","Day"),
                 State=c(1,2,1,1,1,1,2,2,1,1,1,2,2,1,1,1),
                 Acc=c(1.1,2.3,1.7,1.4,0.1,1.9,2.9,2.3,1.1,0.1,1.4,0.2,2.6,1.3,1.7,1.0))

df$Datetime <- as.POSIXct(df$Datetime,format="%Y-%m-%d %H:%M:%S", tz="UTC")

df
#>    ID            Datetime Period State Acc
#> 1   A 2020-08-05 12:00:00    Day     1 1.1
#> 2   A 2020-08-05 17:00:00    Day     2 2.3
#> 3   A 2020-08-05 18:03:00    Day     1 1.7
#> 4   A 2020-08-05 22:54:00  Night     1 1.4
#> 5   A 2020-08-06 01:08:00  Night     1 0.1
#> 6   A 2020-08-06 13:26:00    Day     1 1.9
#> 7   A 2020-08-06 19:04:00  Night     2 2.9
#> 8   A 2020-08-08 11:00:00    Day     2 2.3
#> 9   B 2020-08-04 03:00:00  Night     1 1.1
#> 10  B 2020-08-04 15:00:00    Day     1 0.1
#> 11  B 2020-08-04 23:00:00  Night     1 1.4
#> 12  B 2020-08-06 14:00:00    Day     2 0.2
#> 13  B 2020-08-06 17:00:00    Day     2 2.6
#> 14  B 2020-08-06 20:00:00  Night     1 1.3
#> 15  B 2020-08-07 04:00:00  Night     1 1.7
#> 16  B 2020-08-07 16:00:00    Day     1 1.0

df %>% 
  dplyr::group_by(ID, Day = as.factor(as.Date(Datetime)), Period, State, .drop = FALSE) %>% 
  dplyr::summarise(Acc = mean(Acc, na.rm = TRUE)) %>% 
  tidyr::pivot_wider(names_from = c(State, Period),
              values_from = Acc,
              names_prefix = "State.") %>% 
  dplyr::select(!contains("NA"))
#> `summarise()` regrouping output by 'ID', 'Day', 'Period' (override with `.groups` argument)
#> # A tibble: 10 x 6
#> # Groups:   ID, Day [10]
#>    ID    Day        State.1_Day State.2_Day State.1_Night State.2_Night
#>    <fct> <fct>            <dbl>       <dbl>         <dbl>         <dbl>
#>  1 A     2020-08-04        NA          NA           NA             NA  
#>  2 A     2020-08-05         1.4         2.3          1.4           NA  
#>  3 A     2020-08-06         1.9        NA            0.1            2.9
#>  4 A     2020-08-07        NA          NA           NA             NA  
#>  5 A     2020-08-08        NA           2.3         NA             NA  
#>  6 B     2020-08-04         0.1        NA            1.25          NA  
#>  7 B     2020-08-05        NA          NA           NA             NA  
#>  8 B     2020-08-06        NA           1.4          1.3           NA  
#>  9 B     2020-08-07         1          NA            1.7           NA  
#> 10 B     2020-08-08        NA          NA           NA             NA

Created on 2020-11-11 by the reprex package (v0.3.0)
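The suggestion to write `dplyr::select` can be checked directly. The sketch below is only an illustration and not part of the original answer: the `small` data frame is a made-up stand-in for the pivoted result, and it assumes dplyr 1.0.0 or later, where `!` is accepted inside `select()`. It first asks R which package a bare `select` currently resolves to, then shows that the namespaced call does not depend on that.

```r
library(dplyr)

# Which namespace does a bare `select` come from in this session?
# If another attached package also exports `select`, the one attached
# last wins, and this will not print "dplyr".
environmentName(environment(select))

# Hypothetical stand-in for the pivoted table (invented values).
small <- data.frame(
  ID          = c("A", "B"),
  State.1_Day = c(1.4, 0.1),
  State.NA_NA = c(NA, NA)
)

# The namespaced call always uses dplyr's select(), whatever is attached.
# Note that contains() ignores case by default, so it would also drop
# columns with a lowercase "na" in their name.
kept <- dplyr::select(small, !contains("NA"))
names(kept)
```

If the first call reports a package other than dplyr, a masked `select` is a plausible cause of the error, although the thread does not confirm that this is what happened to the asker.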


0 votes

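For anyone still looking for an answer: I ran into a similar error with the `names_from` and `values_from` variables, and got it working by putting those variable names in quotes. Unfortunately I could not reproduce your error, so I am not sure whether this solves your problem, but I am leaving it here for posterity. This seems to be a recent change.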
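As a minimal sketch of the quoting workaround, the example below passes the column names to `pivot_wider()` once as bare names and once as strings. The `long` data frame is invented for illustration and is much smaller than the asker's data. Both spellings are accepted because `names_from` and `values_from` use tidyselect. Whether quoting avoids the specific error in the question is not confirmed here.

```r
library(tidyr)

# Invented toy data: one row per ID and State.
long <- data.frame(
  ID    = c("A", "A", "B", "B"),
  State = c(1, 2, 1, 2),
  Acc   = c(1.4, 2.3, 0.1, 1.4)
)

# Bare column names.
wide_bare <- pivot_wider(long,
                         names_from = State,
                         values_from = Acc,
                         names_prefix = "State.")

# The same call with the column names given as strings.
wide_quoted <- pivot_wider(long,
                           names_from = "State",
                           values_from = "Acc",
                           names_prefix = "State.")

# Compare the two results and inspect the generated column names.
all.equal(wide_bare, wide_quoted)
names(wide_quoted)
```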
