Find the total length of time, across multiple values in a dataset, that overlaps a given time period (R)


I have a data frame that looks like this:

df <- tibble(
  period = list(
    c("09:34:00-20:40:00", "20:57:00-21:00:00"),
    c("16:03:00-19:00:00", "19:10:00-21:00:00", "21:15-24:00"),
    "7:02:00-13:20:00",
    c("9:00:00-12:15:00", "14:30:00-16:30:00")

  )
)

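I want to create a new variable containing the total number of hours (including fractions) between 12:00 and 15:00 that are covered by any of the concatenated periods (or by the single period, when a row is not concatenated). The output should look like this: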

df <- tibble(
  period = list(
    c("09:34:00-20:40:00", "20:57:00-21:00:00"),
    c("16:03:00-19:00:00", "19:10:00-21:00:00", "21:15-24:00"),
    "7:02:00-13:20:00",
    c("9:00:00-12:15:00", "14:30:00-16:30:00")

  ),
  hrs = list(3.0,0,1.2,0.75)
)

How can I create this new variable, which extracts the number of hours that overlap the specified time window?

r date time tidyverse
1 Answer
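library(dplyr)
library(tidyr)
library(anytime)

df %>% 
  # Tag each row so its periods can be regrouped after unnesting
  mutate(id = row_number()) %>% 
  unnest(period) %>% 
  # Split "start-end" into two columns, keeping the original string
  separate(period, into = c("p1", "p2"), sep = "-", remove = FALSE) %>% 
  # Parse both endpoints as date-times on an arbitrary common date
  mutate(across(c(p1, p2), 
                ~anytime(paste("2020-01-01", .x), 
                         tz = "UTC", asUTC = TRUE))) %>% 
  # Clamp each endpoint to the 12:00-15:00 window
  mutate(across(c(p1, p2), 
                ~case_when(.x > as.POSIXct("2020-01-01 15:00", tz = "UTC") ~ 
                             as.POSIXct("2020-01-01 15:00", tz = "UTC"),
                           .x < as.POSIXct("2020-01-01 12:00", tz = "UTC") ~ 
                             as.POSIXct("2020-01-01 12:00", tz = "UTC"),
                           .default = .x))) %>% 
  # Overlap per period; 0 when the period lies entirely outside the window
  mutate(hrs = difftime(p2, p1, units = "hours")) %>% 
  # Sum per original row and restore the list column
  summarize(period = list(period), 
            hrs = sum(hrs), 
            .by = id) %>% 
  select(-id)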

#> # A tibble: 4 × 2
#>   period    hrs           
#>   <list>    <drtn>        
#> 1 <chr [2]> 3.000000 hours
#> 2 <chr [3]> 0.000000 hours
#> 3 <chr [1]> 1.333333 hours
#> 4 <chr [2]> 0.750000 hours

Created on 2024-12-13 with reprex v2.0.2
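
For comparison, the same clamping idea can be sketched in base R without parsing date-times at all. Each endpoint is converted to decimal hours, and the overlap of a period with the window is `max(0, min(end, 15) - max(start, 12))`. The helper names `to_hours()` and `overlap_hours()` are hypothetical and not part of the answer above. The sketch assumes every period has the form "start-end" with times written as H:MM or H:MM:SS, and that no period crosses midnight.

```r
# Hypothetical helper: convert "H:MM" or "H:MM:SS" strings to decimal hours.
to_hours <- function(x) {
  vapply(strsplit(x, ":", fixed = TRUE), function(parts) {
    parts <- as.numeric(parts)
    length(parts) <- 3            # pad a missing seconds field with NA
    parts[is.na(parts)] <- 0      # treat the missing field as zero
    sum(parts / c(1, 60, 3600))   # hours + minutes/60 + seconds/3600
  }, numeric(1))
}

# Hypothetical helper: total hours of the "start-end" periods inside [lo, hi].
overlap_hours <- function(periods, lo = 12, hi = 15) {
  bounds <- strsplit(periods, "-", fixed = TRUE)
  start  <- to_hours(vapply(bounds, `[`, character(1), 1))
  end    <- to_hours(vapply(bounds, `[`, character(1), 2))
  # Clamp both endpoints to the window; a negative difference means no overlap
  sum(pmax(0, pmin(end, hi) - pmax(start, lo)))
}

# Same data as the `period` list column in the question
period <- list(
  c("09:34:00-20:40:00", "20:57:00-21:00:00"),
  c("16:03:00-19:00:00", "19:10:00-21:00:00", "21:15-24:00"),
  "7:02:00-13:20:00",
  c("9:00:00-12:15:00", "14:30:00-16:30:00")
)

hrs <- vapply(period, overlap_hours, numeric(1))
```

Here `hrs` is a plain numeric vector rather than a `difftime`, so it could be attached with something like `df$hrs <- hrs`. Working in decimal hours also sidesteps the question of how a date-time parser treats "24:00" or a time with no seconds field.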
