I have a data frame like this:
df <- tibble(
  period = list(
    c("09:34:00-20:40:00", "20:57:00-21:00:00"),
    c("16:03:00-19:00:00", "19:10:00-21:00:00", "21:15-24:00"),
    "7:02:00-13:20:00",
    c("9:00:00-12:15:00", "14:30:00-16:30:00")
  )
)
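I want to create a new variable containing the total number of hours (including fractions) that fall between 12:00 and 15:00 across the concatenated periods in each row (or the single period, when there is only one). The output should look like this: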
df <- tibble(
  period = list(
    c("09:34:00-20:40:00", "20:57:00-21:00:00"),
    c("16:03:00-19:00:00", "19:10:00-21:00:00", "21:15-24:00"),
    "7:02:00-13:20:00",
    c("9:00:00-12:15:00", "14:30:00-16:30:00")
  ),
  hrs = list(3.0, 0, 1.2, 0.75)
)
How can I create this new variable, extracting the number of hours that overlap the specified time window?
library(dplyr)
library(tidyr)
library(anytime)
df %>%
  mutate(id = row_number()) %>%   # remember which row each period came from
  unnest(period) %>%              # one row per individual period
  separate(period, into = c("p1", "p2"), sep = "-", remove = FALSE) %>%
  # parse start and end as date-times on an arbitrary common day
  mutate(across(c(p1, p2),
                ~anytime(paste("2020-01-01", .x),
                         tz = "UTC", asUTC = TRUE))) %>%
  # clamp both endpoints to the 12:00-15:00 window
  mutate(across(c(p1, p2),
                ~case_when(.x > as.POSIXct("2020-01-01 15:00", tz = "UTC") ~
                             as.POSIXct("2020-01-01 15:00", tz = "UTC"),
                           .x < as.POSIXct("2020-01-01 12:00", tz = "UTC") ~
                             as.POSIXct("2020-01-01 12:00", tz = "UTC"),
                           .default = .x))) %>%
  # periods entirely outside the window collapse to zero length
  mutate(hrs = difftime(p2, p1, units = "hours")) %>%
  summarize(period = list(period),
            hrs = sum(hrs),
            .by = id) %>%
  select(-id)
#> # A tibble: 4 × 2
#>   period    hrs
#>   <list>    <drtn>
#> 1 <chr [2]> 3.000000 hours
#> 2 <chr [3]> 0.000000 hours
#> 3 <chr [1]> 1.333333 hours
#> 4 <chr [2]> 0.750000 hours
Created on 2024-12-13 with reprex v2.0.2
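
For comparison, the same clamping idea can be sketched in base R without date-time parsing. This is only an illustrative alternative, not the approach used in the answer above: the helper names `to_hours` and `overlap_hours` are hypothetical, and the sketch assumes every period is written as `H:MM[:SS]-H:MM[:SS]` within a single day.

```r
# Convert "H:MM" or "H:MM:SS" to decimal hours (hypothetical helper).
to_hours <- function(x) {
  parts <- as.numeric(strsplit(x, ":", fixed = TRUE)[[1]])
  parts <- c(parts, 0, 0)[1:3]  # pad missing minutes/seconds with zero
  parts[1] + parts[2] / 60 + parts[3] / 3600
}

# Total hours of a character vector of periods that fall inside [lo, hi].
overlap_hours <- function(periods, lo = 12, hi = 15) {
  total <- 0
  for (p in periods) {
    ends  <- strsplit(p, "-", fixed = TRUE)[[1]]
    start <- to_hours(ends[1])
    end   <- to_hours(ends[2])
    # Overlap is the clamped end minus the clamped start, never below zero.
    total <- total + max(0, min(end, hi) - max(start, lo))
  }
  total
}

period <- list(
  c("09:34:00-20:40:00", "20:57:00-21:00:00"),
  c("16:03:00-19:00:00", "19:10:00-21:00:00", "21:15-24:00"),
  "7:02:00-13:20:00",
  c("9:00:00-12:15:00", "14:30:00-16:30:00")
)

# One numeric value per list element.
hrs <- vapply(period, overlap_hours, numeric(1))
```

With the sample data this yields 3, 0, 1.333 and 0.75 hours, matching the reprex output. Note that the third row covers 12:00 to 13:20, which is 1 h 20 min, or about 1.333 decimal hours, rather than the 1.2 written in the question's expected output.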