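I have a data frame containing daily categorical values for multiple locations. I'm trying to create a new data frame that groups the consecutive and standalone days of each categorical value for each location. I'm new to coding and am struggling to group and filter by several parameters at once in dplyr. Here is my code so far, a sample starting data frame, and what I envision for the new data frame:
Code: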
#Rstudio
> new_df <- df %>% group_by(grp = cumsum(c(0, diff(Date) != 1))) %>% mutate(start_date== "") %>% mutate(end_date== "") %>% mutate(total_days == end_date - start_date)
df:
location | Date | category |
---|---|---|
site1 | 2007-11-03 | cat1 |
site1 | 2007-11-04 | cat1 |
site1 | 2007-11-05 | cat3 |
site1 | 2007-11-06 | cat2 |
site1 | 2007-11-07 | cat2 |
site2 | 2007-11-06 | cat2 |
site2 | 2007-11-07 | cat2 |
new_df:
location | start_date | end_date | total_days | category |
---|---|---|---|---|
site1 | 2007-11-03 | 2007-11-04 | 2 | cat1 |
site1 | 2007-11-05 | 2007-11-05 | 1 | cat3 |
site1 | 2007-11-06 | 2007-11-07 | 2 | cat2 |
site2 | 2007-11-06 | 2007-11-07 | 2 | cat2 |
library(dplyr)

df |>
  mutate(
    start_date = min(Date),
    end_date = max(Date),
    total_days = n(),
    .by = c(location, category),
    .keep = "unused"
  ) |>
  distinct() |>
  relocate(category, .after = last_col())
# A tibble: 4 × 5
#   location start_date end_date   total_days category
#   <chr>    <date>     <date>          <int> <chr>
# 1 site1    2007-11-03 2007-11-04          2 cat1
# 2 site1    2007-11-05 2007-11-05          1 cat3
# 3 site1    2007-11-06 2007-11-07          2 cat2
# 4 site2    2007-11-06 2007-11-07          2 cat2
Data:
df <- readr::read_table("
location Date category
site1 2007-11-03 cat1
site1 2007-11-04 cat1
site1 2007-11-05 cat3
site1 2007-11-06 cat2
site1 2007-11-07 cat2
site2 2007-11-06 cat2
site2 2007-11-07 cat2")
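
One caveat: the pipeline above groups only by location and category, so it reproduces the desired output for this sample, but it would merge two separate stretches of the same category at one location into a single row. If that can happen in the real data, a run identifier may help. The following is a minimal sketch, not a definitive solution: the example data is hypothetical (cat1 recurs at site1 after cat3, and 2007-11-08 is missing), the helper columns gap_days, new_run and run_id are names invented for illustration, and it assumes dplyr 1.1.0 or later for .by.

```r
library(dplyr)

# Hypothetical data: cat1 recurs at site1 after cat3, and 2007-11-08 is missing.
df <- tibble::tibble(
  location = c(rep("site1", 6), rep("site2", 2)),
  Date = as.Date(c(
    "2007-11-03", "2007-11-04", "2007-11-05", "2007-11-06",
    "2007-11-07", "2007-11-09", "2007-11-06", "2007-11-07"
  )),
  category = c("cat1", "cat1", "cat3", "cat1", "cat1", "cat1", "cat2", "cat2")
)

runs <- df |>
  arrange(location, Date) |>
  mutate(
    # Days since the previous row at the same location (NA on the first row).
    gap_days = as.numeric(Date - lag(Date)),
    # A new run starts on the first row, after a gap, or when the category changes.
    new_run = is.na(gap_days) | gap_days != 1 | category != lag(category),
    # The running count of run starts labels each run within a location.
    run_id = cumsum(new_run),
    .by = location
  ) |>
  summarise(
    start_date = min(Date),
    end_date = max(Date),
    total_days = n(),
    .by = c(location, run_id, category)
  ) |>
  select(location, start_date, end_date, total_days, category)

runs
```

With this hypothetical input, site1 ends up with four rows (cat1 for 2 days, cat3 for 1 day, cat1 for 2 days, then cat1 for 1 day after the gap) and site2 with one row of 2 days. Because total_days counts rows, it assumes at most one row per location per day.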