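I used dplyr to count the number of observations for each month of each year. To make sure the months sort correctly from January to December, I converted Month to an ordered factor.
I want to use lubridate (ymd() and month()) to set the year and month components correctly for time series analysis.
lubridate::month() cannot handle ordered factors (see the R code and error message below). I tried to remove the ordering from this column with x <- factor(x, ordered = FALSE), but I lost everything in the data frame except Month.
I also tried turning the Month column into a plain factor, but got the following output: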
Bulbs$Month <- as.factor(Bulbs$Month)
Error in `$<-.data.frame`(`*tmp*`, Month, value = integer(0)) :
  replacement has 0 rows, data has 96
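Does anyone know how to convert an ordered factor back to a regular factor without losing the level order?
Structure of the data frame after the dplyr computation: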
'data.frame': 96 obs. of 4 variables:
$ Year : num 2012 2012 2012 2012 2012 ...
$ Month : Ord.factor w/ 12 levels "January"<"February"<..: 1 2 4 5 6 7 10 11 12 2 ...
$ Number_Daffodils : num 1 8 18 21 27 12 12 4 3 2 ...
$ Frequency_New_Bulbs : num 7 59 144 193 NA NA 143 22 14 26 ..
R code:
library(dplyr)
library(lubridate)
Bulbs <- MyDf %>% mutate(Month = factor(trimws(Month), levels = month.name, ordered = TRUE)) %>%
group_by(Year, Month) %>%
summarise(N = n(), Frequency_New_Bulbs = sum(Number_Daffodils))
#Set the components for the time series analysis
Bulbs <- janitor::clean_names(Bulbs)
Bulbs$Year <- lubridate::ymd(paste(Bulbs$year, Bulbs$month, "01", sep = "-"))
Bulbs$month = lubridate::month(Bulbs$month)
#When I run the line **Bulbs$month = lubridate::month(Bulbs$month)** I get this error message.
Error in as.POSIXlt.character(as.character(x), ...) :
character string is not in a standard unambiguous format
In addition: Warning message:
tz(): Don't know how to compute timezone for object of class ordered/factor; returning "UTC".
Dummy data frame
tibble(
Month = sample(month.name, 120, replace = TRUE),
Year = sample(2012:2024, 120, replace = TRUE),
Number_Daffodils = sample(1:5, 120, replace = TRUE)
)
Desired output
year month Number_Daffodils Frequency_New_Bulbs date n_month
1 2015 January 36 31 2015-01-01 1
2 2015 February 28 28 2015-02-01 2
3 2015 March 39 31 2015-03-01 3
4 2015 April 46 30 2015-04-01 4
5 2015 May 5 6 2015-05-01 5
6 2015 June 0 0 2015-06-01 6
If the levels of your Month factor are in the correct order, you can convert it to an integer or pass it directly to lubridate::make_date():
library(dplyr)
Bulbs |>
janitor::clean_names() |>
mutate(date = lubridate::make_date(year = year, month = month),
m = as.integer(month))
#> # A tibble: 86 × 6
#> # Groups: year [13]
#> year month n frequency_new_bulbs date m
#> <int> <ord> <int> <int> <date> <int>
#> 1 2012 January 1 2 2012-01-01 1
#> 2 2012 February 4 9 2012-02-01 2
#> 3 2012 April 1 4 2012-04-01 4
#> 4 2012 May 3 10 2012-05-01 5
#> 5 2012 June 1 2 2012-06-01 6
#> 6 2012 July 1 2 2012-07-01 7
#> 7 2012 August 2 6 2012-08-01 8
#> 8 2012 September 1 2 2012-09-01 9
#> 9 2012 October 1 3 2012-10-01 10
#> 10 2012 November 2 9 2012-11-01 11
#> # ℹ 76 more rows
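As for the literal question of dropping the ordering while keeping the level order, here is a minimal base R sketch. The vector m is a made-up stand-in for the month column, not the asker's data. Passing levels = levels(m) explicitly keeps all twelve levels in calendar order; if the levels argument is left out, factor() rebuilds the levels from the values present and drops unused months.

```r
# Toy ordered factor standing in for the month column (hypothetical data)
m <- factor(c("January", "February", "April"),
            levels = month.name, ordered = TRUE)

# Rebuild as a plain factor, passing the levels explicitly so that
# unused months are kept and the calendar order is preserved
m_plain <- factor(m, levels = levels(m), ordered = FALSE)

is.ordered(m_plain)   # FALSE
levels(m_plain)       # the twelve month names, January first
as.integer(m_plain)   # 1 2 4
```

Note that the result is a single vector and should be assigned back to one column. Assigning it to the data frame object itself replaces the whole data frame with that vector.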
Without lubridate:
df |>
mutate(
n_month = match(Month, month.name),
date = as.Date(sprintf("%d-%d-01", Year, n_month))
)
Month Year Number_Daffodils n_month date
<chr> <int> <int> <int> <date>
1 June 2018 1 6 2018-06-01
2 June 2023 1 6 2023-06-01
3 October 2023 5 10 2023-10-01
4 March 2022 2 3 2022-03-01
5 March 2017 5 3 2017-03-01
6 March 2020 1 3 2020-03-01
7 May 2018 1 5 2018-05-01
8 December 2021 4 12 2021-12-01
9 March 2015 4 3 2015-03-01
10 September 2015 2 9 2015-09-01
# ℹ 110 more rows
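One reason the match() lookup is convenient: match() converts factors to character before comparing, so it should give the calendar month number whether Month is a character vector, a plain factor, or an ordered factor, and regardless of the level order. as.integer() on a factor returns the level codes instead, which are only month numbers when the levels are month.name in order. A small base R check with made-up values:

```r
# Hypothetical month labels stored three different ways
chr <- c("May", "December", "June")
fct <- factor(chr)                                        # levels sorted alphabetically
ord <- factor(chr, levels = month.name, ordered = TRUE)   # calendar-ordered levels

# match() looks up the labels, so all three give the month number
match(chr, month.name)   # 5 12 6
match(fct, month.name)   # 5 12 6
match(ord, month.name)   # 5 12 6

# as.integer() returns level codes, which depend on the level order
as.integer(fct)          # 3 1 2 (levels: December, June, May)
as.integer(ord)          # 5 12 6
```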