I have a data frame with one numeric variable (Acc) and four categorical variables (ID, Datetime, Period, and State).
# My data
df <- data.frame(ID=c(rep(c("A"),8),rep(c("B"),8)),
Datetime=c("2020-08-05 12:00:00","2020-08-05 17:00:00","2020-08-05 18:03:00","2020-08-05 22:54:00","2020-08-06 01:08:00","2020-08-06 13:26:00","2020-08-06 19:04:00","2020-08-08 11:00:00",
"2020-08-04 03:00:00","2020-08-04 15:00:00","2020-08-04 23:00:00","2020-08-06 14:00:00","2020-08-06 17:00:00","2020-08-06 20:00:00","2020-08-07 04:00:00","2020-08-07 16:00:00"),
Period=c("Day","Day","Day","Night","Night","Day","Night","Day","Night","Day","Night","Day","Day","Night","Night","Day"),
State=c(1,2,1,1,1,1,2,2,1,1,1,2,2,1,1,1),
Acc=c(1.1,2.3,1.7,1.4,0.1,1.9,2.9,2.3,1.1,0.1,1.4,0.2,2.6,1.3,1.7,1.0))
df$Datetime <- as.POSIXct(df$Datetime,format="%Y-%m-%d %H:%M:%S", tz="UTC")
df
ID Datetime Period State Acc
1 A 2020-08-05 12:00:00 Day 1 1.1
2 A 2020-08-05 17:00:00 Day 2 2.3
3 A 2020-08-05 18:03:00 Day 1 1.7
4 A 2020-08-05 22:54:00 Night 1 1.4
5 A 2020-08-06 01:08:00 Night 1 0.1
6 A 2020-08-06 13:26:00 Day 1 1.9
7 A 2020-08-06 19:04:00 Night 2 2.9
8 A 2020-08-08 11:00:00 Day 2 2.3
9 B 2020-08-04 03:00:00 Night 1 1.1
10 B 2020-08-04 15:00:00 Day 1 0.1
11 B 2020-08-04 23:00:00 Night 1 1.4
12 B 2020-08-06 14:00:00 Day 2 0.2
13 B 2020-08-06 17:00:00 Day 2 2.6
14 B 2020-08-06 20:00:00 Night 1 1.3
15 B 2020-08-07 04:00:00 Night 1 1.7
16 B 2020-08-07 16:00:00 Day 1 1.0
I am trying to estimate the mean of Acc for each combination of ID, Day, Period, and State. To do this, I tried the following code:
library(tidyverse)
df %>%
group_by(ID, Day = as.factor(as.Date(Datetime)), Period, State, .drop = FALSE) %>%
summarise(Acc = mean(Acc, na.rm = TRUE)) %>%
pivot_wider(names_from = c(State, Period),
values_from = Acc,
names_prefix = "State.") %>%
select(!State.NA_NA)
I should get this:
#> `summarise()` regrouping output by 'ID', 'Day', 'Period' (override with `.groups` argument)
#> # A tibble: 10 x 6
#> # Groups: ID, Day [10]
#> ID Day State.1_Day State.2_Day State.1_Night State.2_Night
#> <chr> <fct> <dbl> <dbl> <dbl> <dbl>
#> 1 A 2020-08-04 NA NA NA NA
#> 2 A 2020-08-05 1.4 2.3 1.4 NA
#> 3 A 2020-08-06 1.9 NA 0.1 2.9
#> 4 A 2020-08-07 NA NA NA NA
#> 5 A 2020-08-08 NA 2.3 NA NA
#> 6 B 2020-08-04 0.1 NA 1.25 NA
#> 7 B 2020-08-05 NA NA NA NA
#> 8 B 2020-08-06 NA 1.4 1.3 NA
#> 9 B 2020-08-07 1 NA 1.7 NA
#> 10 B 2020-08-08 NA NA NA NA
However, I get this error message:
Error in map_lgl(.x, .p, ...) : object 'State' not found
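I don't understand where the error is. It is probably something silly, but I have been thinking about it for hours without getting anywhere.
Can anyone explain why I am getting this error message?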
Does this work?
df %>%
  group_by(ID, Day = as.factor(as.Date(Datetime)), Period, State, .drop = FALSE) %>%
  summarise(Acc = mean(Acc, na.rm = TRUE)) %>%
  na.omit() %>%
  pivot_wider(names_from = c(State, Period),
              values_from = Acc,
              names_prefix = "State.") %>%
  complete(ID, nesting(Day)) %>%
  arrange(ID, Day) %>%
  distinct()
`summarise()` regrouping output by 'ID', 'Day', 'Period' (override with `.groups` argument)
# A tibble: 10 x 6
# Groups: ID, Day [10]
ID Day State.1_Day State.2_Day State.1_Night State.2_Night
<fct> <fct> <dbl> <dbl> <dbl> <dbl>
1 A 2020-08-04 NA NA NA NA
2 A 2020-08-05 1.4 2.3 1.4 NA
3 A 2020-08-06 1.9 NA 0.1 2.9
4 A 2020-08-07 NA NA NA NA
5 A 2020-08-08 NA 2.3 NA NA
6 B 2020-08-04 0.1 NA 1.25 NA
7 B 2020-08-05 NA NA NA NA
8 B 2020-08-06 NA 1.4 1.3 NA
9 B 2020-08-07 1 NA 1.7 NA
10 B 2020-08-08 NA NA NA NA
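I'm not sure where the error in your code comes from, but I think it comes from the final select line. Instead of selecting everything that is not State.NA_NA, I switched it to keep all columns whose names do not contain NA. This seems to give the desired output.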
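One thing to keep in mind with this approach: contains() ignores case by default, so a pattern of "NA" also matches column names that contain "na". The sketch below uses a small made-up table (the Name column is hypothetical and not part of the question's data) to show the difference, and assumes dplyr 1.0.0 or later for the ! syntax inside select().

```r
library(dplyr)

# Made-up wide table; `Name` is a hypothetical extra column for illustration.
wide <- data.frame(
  ID = c("A", "B"),
  Name = c("x", "y"),
  State.1_Day = c(1.4, 0.1),
  State.NA_NA = c(NA_real_, NA_real_)
)

# Default is ignore.case = TRUE, so "NA" also matches "Name".
loose <- names(select(wide, !contains("NA")))

# A case-sensitive match keeps `Name` and drops only `State.NA_NA`.
strict <- names(select(wide, !contains("NA", ignore.case = FALSE)))

# any_of() drops the column by exact name and does not fail if it is absent.
exact <- names(select(wide, !any_of("State.NA_NA")))

print(loose)
print(strict)
print(exact)
```

With the column names in the question (ID, Day, and the State.* columns) the default behaviour happens to be harmless, but the case-sensitive or exact-name forms are safer if other columns are added later.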
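Also, following up on the comments, I would suggest calling each tidyverse function with its package prefix, as in dplyr::select, because you may inadvertently be calling a different select.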
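To check whether masking is actually happening in a given session, something along these lines may help. It is only a diagnostic sketch: it does not reproduce the error from the question, and it uses the built-in mtcars data purely as a stand-in.

```r
library(dplyr)

# Which package does the bare name `select` currently resolve to?
# If another attached package (MASS is a common example) masks it,
# this will not say "dplyr".
current_owner <- environmentName(environment(select))
print(current_owner)

# Every attached package that defines `select`, in search-path order;
# the first entry is the one a bare call uses.
owners <- find("select")
print(owners)

# The namespaced call always reaches dplyr's version, whatever is attached.
picked <- dplyr::select(mtcars, mpg, cyl)
print(head(picked, 3))
```

If find("select") lists more than one package, either use the dplyr:: prefix or attach dplyr last.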
library(tidyverse)
df <- data.frame(ID=c(rep(c("A"),8),rep(c("B"),8)),
Datetime=c("2020-08-05 12:00:00","2020-08-05 17:00:00","2020-08-05 18:03:00","2020-08-05 22:54:00","2020-08-06 01:08:00","2020-08-06 13:26:00","2020-08-06 19:04:00","2020-08-08 11:00:00",
"2020-08-04 03:00:00","2020-08-04 15:00:00","2020-08-04 23:00:00","2020-08-06 14:00:00","2020-08-06 17:00:00","2020-08-06 20:00:00","2020-08-07 04:00:00","2020-08-07 16:00:00"),
Period=c("Day","Day","Day","Night","Night","Day","Night","Day","Night","Day","Night","Day","Day","Night","Night","Day"),
State=c(1,2,1,1,1,1,2,2,1,1,1,2,2,1,1,1),
Acc=c(1.1,2.3,1.7,1.4,0.1,1.9,2.9,2.3,1.1,0.1,1.4,0.2,2.6,1.3,1.7,1.0))
df$Datetime <- as.POSIXct(df$Datetime,format="%Y-%m-%d %H:%M:%S", tz="UTC")
df
#> ID Datetime Period State Acc
#> 1 A 2020-08-05 12:00:00 Day 1 1.1
#> 2 A 2020-08-05 17:00:00 Day 2 2.3
#> 3 A 2020-08-05 18:03:00 Day 1 1.7
#> 4 A 2020-08-05 22:54:00 Night 1 1.4
#> 5 A 2020-08-06 01:08:00 Night 1 0.1
#> 6 A 2020-08-06 13:26:00 Day 1 1.9
#> 7 A 2020-08-06 19:04:00 Night 2 2.9
#> 8 A 2020-08-08 11:00:00 Day 2 2.3
#> 9 B 2020-08-04 03:00:00 Night 1 1.1
#> 10 B 2020-08-04 15:00:00 Day 1 0.1
#> 11 B 2020-08-04 23:00:00 Night 1 1.4
#> 12 B 2020-08-06 14:00:00 Day 2 0.2
#> 13 B 2020-08-06 17:00:00 Day 2 2.6
#> 14 B 2020-08-06 20:00:00 Night 1 1.3
#> 15 B 2020-08-07 04:00:00 Night 1 1.7
#> 16 B 2020-08-07 16:00:00 Day 1 1.0
df %>%
dplyr::group_by(ID, Day = as.factor(as.Date(Datetime)), Period, State, .drop = FALSE) %>%
dplyr::summarise(Acc = mean(Acc, na.rm = TRUE)) %>%
tidyr::pivot_wider(names_from = c(State, Period),
values_from = Acc,
names_prefix = "State.") %>%
dplyr::select(!contains("NA"))
#> `summarise()` regrouping output by 'ID', 'Day', 'Period' (override with `.groups` argument)
#> # A tibble: 10 x 6
#> # Groups: ID, Day [10]
#> ID Day State.1_Day State.2_Day State.1_Night State.2_Night
#> <fct> <fct> <dbl> <dbl> <dbl> <dbl>
#> 1 A 2020-08-04 NA NA NA NA
#> 2 A 2020-08-05 1.4 2.3 1.4 NA
#> 3 A 2020-08-06 1.9 NA 0.1 2.9
#> 4 A 2020-08-07 NA NA NA NA
#> 5 A 2020-08-08 NA 2.3 NA NA
#> 6 B 2020-08-04 0.1 NA 1.25 NA
#> 7 B 2020-08-05 NA NA NA NA
#> 8 B 2020-08-06 NA 1.4 1.3 NA
#> 9 B 2020-08-07 1 NA 1.7 NA
#> 10 B 2020-08-08 NA NA NA NA
Created on 2020-11-11 by the reprex package (v0.3.0)
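For anyone still looking for an answer: I ran into a similar error with the names_from and values_from arguments, and got it to work by putting those variable names in quotes. Unfortunately, I could not reproduce your error, so I am not sure whether this solves your problem, but I am posting it for posterity. This seems to be a recent change.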
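A minimal sketch of the quoted form, in case it helps. The three rows are the group means for ID A on 2020-08-05 taken from the expected output above; the object names long and wide are just for illustration.

```r
library(tidyr)

# Small long-format table: group means for ID "A" on 2020-08-05.
long <- data.frame(
  ID = c("A", "A", "A"),
  State = c(1, 2, 1),
  Period = c("Day", "Day", "Night"),
  Acc = c(1.4, 2.3, 1.4)
)

# Column names passed as strings instead of bare symbols.
wide <- pivot_wider(
  long,
  names_from = c("State", "Period"),
  values_from = "Acc",
  names_prefix = "State."
)

print(wide)
```

Both the quoted and unquoted forms are accepted by tidyselect, so this is a workaround to try rather than a confirmed fix for the error in the question.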