Grouping consecutive time-series days by category


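I have a data frame containing daily categorical values for multiple locations. I am trying to create a new data frame that groups the consecutive days, and the standalone days, of each categorical value at each location. I am fairly new to coding and am struggling to group and filter by several parameters at once in dplyr. Here is my code so far, a sample starting data frame, and what I envision for the new data frame: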

Code:

#Rstudio
> new_df <- df %>% group_by(grp = cumsum(c(0, diff(Date) != 1))) %>% mutate(start_date== "") %>% mutate(end_date== "") %>% mutate(total_days == end_date - start_date)

df:

location Date category
site1 2007-11-03 cat1
site1 2007-11-04 cat1
site1 2007-11-05 cat3
site1 2007-11-06 cat2
site1 2007-11-07 cat2
site2 2007-11-06 cat2
site2 2007-11-07 cat2

new_df:

location start_date end_date total_days category
site1 2007-11-03 2007-11-04 2 cat1
site1 2007-11-05 2007-11-05 1 cat3
site1 2007-11-06 2007-11-07 2 cat2
site2 2007-11-06 2007-11-07 2 cat2
r dplyr filter group-by cumsum
1 Answer
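library(dplyr)

df |>
  mutate(
    start_date = min(Date),   # earliest day in each location/category group
    end_date = max(Date),     # latest day in each location/category group
    total_days = n(),         # number of rows (days) in the group
    .by = c(location, category),
    .keep = "unused"          # drop Date, which was only used as an input
  ) |>
  distinct() |>               # collapse the now-identical rows to one per group
  relocate(category, .after = last_col())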

# A tibble: 4 × 5
#  location start_date end_date   total_days category
#  <chr>    <date>     <date>          <int> <chr>   
#1 site1    2007-11-03 2007-11-04          2 cat1    
#2 site1    2007-11-05 2007-11-05          1 cat3    
#3 site1    2007-11-06 2007-11-07          2 cat2    
#4 site2    2007-11-06 2007-11-07          2 cat2  

Data:

df <- readr::read_table("
location    Date    category
site1   2007-11-03  cat1
site1   2007-11-04  cat1
site1   2007-11-05  cat3
site1   2007-11-06  cat2
site1   2007-11-07  cat2
site2   2007-11-06  cat2
site2   2007-11-07  cat2")
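
The code above groups only by location and category, so it reproduces the desired output for this sample but would merge two separate stretches of the same category at one location into a single row. If runs need to be split whenever the category changes or a day is skipped, something along these lines might work. It is a rough sketch, not part of the original answer: the helper columns gap, changed and run_id are names made up for illustration, and it assumes Date is already of class Date with one row per location and day.

```r
library(dplyr)

# Same sample data as above, built with base R so the block is self-contained
df <- data.frame(
  location = c("site1", "site1", "site1", "site1", "site1", "site2", "site2"),
  Date = as.Date(c("2007-11-03", "2007-11-04", "2007-11-05", "2007-11-06",
                   "2007-11-07", "2007-11-06", "2007-11-07")),
  category = c("cat1", "cat1", "cat3", "cat2", "cat2", "cat2", "cat2")
)

runs <- df |>
  arrange(location, Date) |>
  group_by(location) |>
  mutate(
    # days elapsed since the previous row; the first row of a location gets 1
    gap = c(1, diff(as.integer(Date))),
    # TRUE when the category differs from the previous row
    changed = category != lag(category, default = first(category)),
    # the counter increases whenever a day is skipped or the category changes
    run_id = cumsum(gap != 1 | changed)
  ) |>
  group_by(location, run_id, category) |>
  summarise(
    start_date = min(Date),
    end_date = max(Date),
    total_days = n(),   # assumes one row per day within a run
    .groups = "drop"
  ) |>
  select(location, start_date, end_date, total_days, category)

print(runs)
```

On the sample data this gives the same four rows as the desired new_df. With an extra row such as site1, 2007-11-10, cat1, the sketch would report a separate one-day run, whereas grouping by location and category alone would fold it into the first cat1 row.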