How do I read character datetimes with a time zone offset in R?


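I have a dataset with a character datetime column in the CEST/CET time zone (Central European local time). The time zone offset is given at the end of each value as +01:00 or +02:00. I want to convert the column to POSIXct so that I can later convert it to UTC. However, on the long clock-change day in October, the extra hour from 2 am to 3 am is read incorrectly, because the time zone offset appears to be ignored: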

Screenshot of source dataframe

My goal is for reprex$datetime_CEST_CET_converted[4] to return "2023-10-29 02:00:00 CET" instead of "2023-10-29 02:00:00 CEST":

Screenshot of console

Reprex:

library(dplyr)
library(lubridate)

source <- data.frame(
  datetime_CEST_CET_character = c("2023-10-29 00:00+02:00", "2023-10-29 01:00+02:00", "2023-10-29 02:00+02:00",
                                  "2023-10-29 02:00+01:00", "2023-10-29 03:00+01:00", "2023-10-29 04:00+01:00")
)

reprex <- source %>%
  mutate(datetime_CEST_CET_converted = as.POSIXct(datetime_CEST_CET_character, tz = "Europe/Paris"),
         datetime_UTC = with_tz(datetime_CEST_CET_converted, tzone = "UTC"))

reprex$datetime_CEST_CET_converted[3]

reprex$datetime_CEST_CET_converted[4]

reprex$datetime_CEST_CET_converted[5] - hours(1)

After removing the colon from the time zone offset, I tried adding format="%Y-%m-%d %H:%M+%z" to as.POSIXct(), but the result is NA:

Screenshot of source dataframe without colons

Screenshot of console without colons

source_without_colon_in_timezone <- data.frame(
  datetime_CEST_CET_character = c("2023-10-29 00:00+0200", "2023-10-29 01:00+0200", "2023-10-29 02:00+0200",
                                  "2023-10-29 02:00+0100", "2023-10-29 03:00+0100", "2023-10-29 04:00+0100")
)

reprex_without_colon_in_timezone <- source_without_colon_in_timezone %>%
  mutate(datetime_CEST_CET_converted = as.POSIXct(datetime_CEST_CET_character, format="%Y-%m-%d %H:%M+%z", tz = "Europe/Paris"),
         datetime_UTC = with_tz(datetime_CEST_CET_converted, tzone = "UTC"))

reprex_without_colon_in_timezone$datetime_CEST_CET_converted[3]

reprex_without_colon_in_timezone$datetime_CEST_CET_converted[4]

reprex_without_colon_in_timezone$datetime_CEST_CET_converted[5] - hours(1)

Tags: r datetime timezone timezone-offset

1 Answer

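Write a function to do the conversion.
The function convert_CEST_CET_UTC below uses the base R pipe, so it does not depend on magrittr's pipe. It first splits each input string at the plus sign, which separates the local datetime from the time zone offset. Once these pieces are passed to the appropriate lubridate functions, they are a real datetime and a real period, so the offset can be subtracted from the local time (UTC = local time minus offset). That difference is the return value. Because the split is on the plus sign, the function only handles positive offsets such as the +01:00 and +02:00 in this data.

source <- data.frame(
  datetime_CEST_CET_character = c("2023-10-29 00:00+02:00", "2023-10-29 01:00+02:00", "2023-10-29 02:00+02:00",
                                  "2023-10-29 02:00+01:00", "2023-10-29 03:00+01:00", "2023-10-29 04:00+01:00")
)

convert_CEST_CET_UTC <- function(x) {
  # Split "2023-10-29 00:00+02:00" into local datetime and offset
  parts <- x |> strsplit("\\+")
  # Offset after the plus sign, parsed as a period (hours and minutes)
  offset <- sapply(parts, `[[`, 2L) |> lubridate::hm()
  # Local wall-clock time, parsed as if it were UTC
  local_time <- sapply(parts, `[[`, 1L) |> lubridate::ymd_hm()
  # UTC = local time minus offset
  local_time - offset
}

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

source %>%
  mutate(datetime_UTC = convert_CEST_CET_UTC(datetime_CEST_CET_character))
#>   datetime_CEST_CET_character        datetime_UTC
#> 1      2023-10-29 00:00+02:00 2023-10-28 22:00:00
#> 2      2023-10-29 01:00+02:00 2023-10-28 23:00:00
#> 3      2023-10-29 02:00+02:00 2023-10-29 00:00:00
#> 4      2023-10-29 02:00+01:00 2023-10-29 01:00:00
#> 5      2023-10-29 03:00+01:00 2023-10-29 02:00:00
#> 6      2023-10-29 04:00+01:00 2023-10-29 03:00:00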

Created on 2024-10-01 with reprex v2.1.0
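
As a possible alternative, here is a minimal base R sketch that lets %z do the offset arithmetic. It assumes the NA in the question comes from the literal "+" in the format string: %z expects to read the sign itself, so a "+" placed before it in the format consumes the sign and leaves %z nothing to match. Removing the colon from the offset is a cautious step, since not every R version may accept +02:00 for %z. The names x, x_nocolon, utc and paris are illustrative only and do not come from the question.

```r
# Sample input copied from the question
x <- c("2023-10-29 00:00+02:00", "2023-10-29 01:00+02:00", "2023-10-29 02:00+02:00",
       "2023-10-29 02:00+01:00", "2023-10-29 03:00+01:00", "2023-10-29 04:00+01:00")

# Drop the colon in the trailing offset: "+02:00" becomes "+0200"
x_nocolon <- sub("([+-][0-9]{2}):([0-9]{2})$", "\\1\\2", x)

# %z reads the sign itself, so the format has no literal "+" before it
utc <- as.POSIXct(x_nocolon, format = "%Y-%m-%d %H:%M%z", tz = "UTC")

# The same instants rendered as Paris local time with the zone abbreviation
paris <- format(utc, format = "%Y-%m-%d %H:%M:%S", tz = "Europe/Paris", usetz = TRUE)

data.frame(input = x, utc = format(utc, "%Y-%m-%d %H:%M:%S"), paris = paris)
```

Since utc holds unambiguous instants, the third and fourth rows should both display as 02:00:00 in Paris time but with different abbreviations (CEST, then CET), which is the distinction the question asks for. This approach also covers negative offsets, because %z parses either sign.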
