List all months between two dates in R

Question (votes: 0, answers: 1)

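I am trying to expand each row of a data frame into one row per month between two dates, with the month and year as new columns.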

I have the following dataset:

PKID    Name    Gender  DateStart   DateEnd
68      PAUL    1       24/11/2021  23/02/2022
68      PAUL    1       24/04/2022  23/06/2023
40      KATE    2       01/01/2000  14/03/2000
40      KATE    2       03/12/2000  31/01/2001

I would like to create the following dataset:

PKID    Name    Gender  DateStart   DateEnd     year    Month
68      PAUL    1       24/11/2021  23/02/2022  2021    11
68      PAUL    1       24/11/2021  23/02/2022  2021    12
68      PAUL    1       24/11/2021  23/02/2022  2022    1
68      PAUL    1       24/11/2021  23/02/2022  2022    2
68      PAUL    1       24/04/2022  23/06/2023  2022    4
68      PAUL    1       24/04/2022  23/06/2023  2022    5
68      PAUL    1       24/04/2022  23/06/2023  2022    6
40      KATE    2       01/01/2000  14/03/2000  2000    1
40      KATE    2       01/01/2000  14/03/2000  2000    2
40      KATE    2       01/01/2000  14/03/2000  2000    3
40      KATE    2       03/12/2000  31/01/2001  2000    12
40      KATE    2       03/12/2000  31/01/2001  2001    1

Here Month is each month that falls between DateStart and DateEnd, and year is the year that month belongs to.

I tried the following:

# Load necessary libraries
library(dplyr)
library(tidyr)
library(lubridate) # For handling date operations

# Sample data 
df <- read.table(text = "
PKID    Name    Gender  DateStart   DateEnd
68  PAUL    1   24/11/2021  23/02/2022
68  PAUL    1   24/04/2022  23/06/2023
40  KATE    2   01/01/2000  14/03/2000
40  KATE    2   03/12/2000  31/01/2001
", header = TRUE, stringsAsFactors = FALSE)

# Convert date columns to Date format
df$DateStart <- dmy(df$DateStart)
df$DateEnd <- dmy(df$DateEnd)

# Generate sequence of dates for each row
df <- df %>%
  group_by(PKID, Name, Gender, DateStart, DateEnd) %>%
  complete(Date = seq.Date(DateStart, DateEnd, by = "month")) %>%
  ungroup() %>%
  mutate(
    year = year(Date),   # Extract year
    Month = month(Date)  # Extract month
  ) %>%
  select(-Date)  # Remove the temporary Date column

# Print the result
print(df)

But I get the following error:

Error in reframe():
ℹ In argument: complete(data = pick(everything()), ..., fill = fill, explicit = explicit).
ℹ In group 1: PKID = 40, Name = "KATE", Gender = 2, DateStart = 2000-01-01, DateEnd = 2000-03-14.
Caused by error:
! object 'DateStart' not found
Tags: r, dplyr

1 answer (0 votes)

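I made a few changes to your approach to get it working. The error occurs because, on a grouped data frame, complete() does not see the grouping columns, so DateStart and DateEnd are not available inside seq.Date(). Grouping by a row number instead keeps those columns accessible.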

library(dplyr)
library(tidyr)
library(lubridate)

df %>%
  mutate(row = row_number(), 
         Date = DateStart) %>%
  group_by(row) %>%
  complete(PKID, Name, Gender, DateStart, DateEnd, 
           Date = seq(DateStart, DateEnd, by = "month")) %>%
  ungroup() %>%
  mutate(
    year = year(Date),  
    Month = month(Date) 
  ) %>%
  select(-Date, -row)

# A tibble: 22 × 7
#    PKID Name  Gender DateStart  DateEnd     year Month
#   <int> <chr>  <int> <date>     <date>     <dbl> <dbl>
# 1    68 PAUL       1 2021-11-24 2022-02-23  2021    11
# 2    68 PAUL       1 2021-11-24 2022-02-23  2021    12
# 3    68 PAUL       1 2021-11-24 2022-02-23  2022     1
# 4    68 PAUL       1 2022-04-24 2023-06-23  2022     4
# 5    68 PAUL       1 2022-04-24 2023-06-23  2022     5
# 6    68 PAUL       1 2022-04-24 2023-06-23  2022     6
# 7    68 PAUL       1 2022-04-24 2023-06-23  2022     7
# 8    68 PAUL       1 2022-04-24 2023-06-23  2022     8
# 9    68 PAUL       1 2022-04-24 2023-06-23  2022     9
#10    68 PAUL       1 2022-04-24 2023-06-23  2022    10
# ℹ 12 more rows
# ℹ Use `print(n = ...)` to see more rows
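One thing to note about the output above: seq() steps from the day of DateStart, so a trailing partial month is dropped. For the first PAUL row the sequence stops at 2022-01-24, and February 2022 is missing even though the expected table in the question lists it. A possible variant, assuming that partial months at both ends should count, is to floor both dates to the first of the month before building the sequence. The sketch below uses rowwise() with a list column and unnest() instead of complete(); the name `result` is only illustrative, and the sample data is rebuilt so the block runs on its own.

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Sample data from the question, with dates already parsed
df <- data.frame(
  PKID      = c(68, 68, 40, 40),
  Name      = c("PAUL", "PAUL", "KATE", "KATE"),
  Gender    = c(1, 1, 2, 2),
  DateStart = dmy(c("24/11/2021", "24/04/2022", "01/01/2000", "03/12/2000")),
  DateEnd   = dmy(c("23/02/2022", "23/06/2023", "14/03/2000", "31/01/2001"))
)

result <- df %>%
  rowwise() %>%
  # Assumption: partial months count, so both ends are floored to the
  # first day of their month before stepping one month at a time
  mutate(Date = list(seq(floor_date(DateStart, "month"),
                         floor_date(DateEnd, "month"),
                         by = "month"))) %>%
  ungroup() %>%
  unnest(Date) %>%              # one row per month in each range
  mutate(
    year  = year(Date),
    Month = month(Date)
  ) %>%
  select(-Date)                 # drop the helper column

print(result, n = Inf)
```

With the dates as written, this should give 24 rows: the first PAUL range now includes February 2022, and the second runs through June 2023 (15 rows), which is more than the three months shown for it in the question's expected table. If that range was meant to end in June 2022, the input date would need correcting rather than the code.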