I'm trying to add a row for every month (with its year) that falls between two dates.
I have the following dataset:
PKID Name Gender DateStart DateEnd
68 PAUL 1 24/11/2021 23/02/2022
68 PAUL 1 24/04/2022 23/06/2023
40 KATE 2 01/01/2000 14/03/2000
40 KATE 2 03/12/2000 31/01/2001
I want to create the following dataset:
PKID Name Gender DateStart DateEnd year Month
68 PAUL 1 24/11/2021 23/02/2022 2021 11
68 PAUL 1 24/11/2021 23/02/2022 2021 12
68 PAUL 1 24/11/2021 23/02/2022 2022 1
68 PAUL 1 24/11/2021 23/02/2022 2022 2
68 PAUL 1 24/04/2022 23/06/2023 2022 4
68 PAUL 1 24/04/2022 23/06/2023 2022 5
68 PAUL 1 24/04/2022 23/06/2023 2022 6
40 KATE 2 01/01/2000 14/03/2000 2000 1
40 KATE 2 01/01/2000 14/03/2000 2000 2
40 KATE 2 01/01/2000 14/03/2000 2000 3
40 KATE 2 03/12/2000 31/01/2001 2000 12
40 KATE 2 03/12/2000 31/01/2001 2001 1
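Here Month is each calendar month from DateStart to DateEnd (inclusive), and year is the year that month falls in.
I have tried the following: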
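For a single pair of dates, one way to list those months is to move both ends to the first day of their month and then step forward one month at a time. Snapping to the 1st means the end month is included even when its day is earlier than the start day. The following is a minimal base-R sketch. `months_between` is a hypothetical helper, not part of any package.

```r
# Hypothetical helper: every (year, Month) pair touched by the interval [start, end].
months_between <- function(start, end) {
  # Snap a date to the first day of its month, e.g. 2022-02-23 -> 2022-02-01
  first_of <- function(d) as.Date(format(d, "%Y-%m-01"))
  # Step one month at a time from the start month to the end month (inclusive)
  months <- seq(first_of(start), first_of(end), by = "month")
  data.frame(
    year  = as.integer(format(months, "%Y")),
    Month = as.integer(format(months, "%m"))
  )
}

# Four rows: 2021/11, 2021/12, 2022/1, 2022/2
months_between(as.Date("2021-11-24"), as.Date("2022-02-23"))
```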
# Load necessary libraries
library(dplyr)
library(tidyr)
library(lubridate) # For handling date operations
# Sample data
df <- read.table(text = "
PKID Name Gender DateStart DateEnd
68 PAUL 1 24/11/2021 23/02/2022
68 PAUL 1 24/04/2022 23/06/2023
40 KATE 2 01/01/2000 14/03/2000
40 KATE 2 03/12/2000 31/01/2001
", header = TRUE, stringsAsFactors = FALSE)
# Convert date columns to Date format
df$DateStart <- dmy(df$DateStart)
df$DateEnd <- dmy(df$DateEnd)
# Generate sequence of dates for each row
df <- df %>%
group_by(PKID, Name, Gender, DateStart, DateEnd) %>%
complete(Date = seq.Date(DateStart, DateEnd, by = "month")) %>%
ungroup() %>%
mutate(
year = year(Date), # Extract year
Month = month(Date) # Extract month
) %>%
select(-Date) # Remove the temporary Date column
# Print the result
print(df)
But I get the following error:
Error in reframe():
ℹ In argument: complete(data = pick(everything()), ..., fill = fill, explicit = explicit).
ℹ In group 1: PKID = 40, Name = "KATE", Gender = 2, DateStart = 2000-01-01, DateEnd = 2000-03-14.
Caused by error:
! object 'DateStart' not found
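The error comes from grouping by the date columns. As the traceback shows, on grouped data `complete()` is run on `pick(everything())`, and `pick()` does not select grouping columns. As a result, `DateStart` is not visible inside it.

The sketch below avoids `complete()` entirely and assumes dplyr 1.1.0 or later, which provides `reframe()` and `.by`. It also floors both dates to the first of the month, which keeps the end month in the sequence. The sample data are re-entered from the question so the block runs on its own.

```r
library(dplyr)      # assumes dplyr >= 1.1.0 for reframe() and .by
library(lubridate)  # dmy(), floor_date(), year(), month()

df <- data.frame(
  PKID      = c(68, 68, 40, 40),
  Name      = c("PAUL", "PAUL", "KATE", "KATE"),
  Gender    = c(1, 1, 2, 2),
  DateStart = dmy(c("24/11/2021", "24/04/2022", "01/01/2000", "03/12/2000")),
  DateEnd   = dmy(c("23/02/2022", "23/06/2023", "14/03/2000", "31/01/2001"))
)

out <- df %>%
  mutate(row = row_number()) %>%   # unique id so identical rows stay separate
  reframe(
    # One value per month, from the start month to the end month (inclusive)
    Date = seq(floor_date(DateStart, "month"),
               floor_date(DateEnd, "month"), by = "month"),
    .by = c(row, PKID, Name, Gender, DateStart, DateEnd)
  ) %>%
  mutate(year = year(Date), Month = month(Date)) %>%
  select(-row, -Date)

print(out, n = Inf)
```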
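I made a couple of changes to your approach to make it work:

* Group by a row id instead of by the date columns. Grouping columns are not visible to complete(), which is why DateStart was "not found".
* Start the sequence on the first day of the month, so the month of DateEnd is always included even when its day is earlier than the day of DateStart. Without this, 24/11/2021 to 23/02/2022 would stop at January 2022.

library(dplyr)
library(tidyr)
library(lubridate)
df %>%
mutate(row = row_number(),
# Seed Date with the first day of the start month
Date = floor_date(DateStart, "month")) %>%
group_by(row) %>%
# One row per month from the start month through the end month (inclusive)
complete(PKID, Name, Gender, DateStart, DateEnd,
Date = seq(floor_date(DateStart, "month"),
floor_date(DateEnd, "month"), by = "month")) %>%
ungroup() %>%
mutate(
year = year(Date),
Month = month(Date)
) %>%
select(-Date, -row)
# A tibble: 24 × 7
# PKID Name Gender DateStart DateEnd year Month
# <int> <chr> <int> <date> <date> <dbl> <dbl>
# 1 68 PAUL 1 2021-11-24 2022-02-23 2021 11
# 2 68 PAUL 1 2021-11-24 2022-02-23 2021 12
# 3 68 PAUL 1 2021-11-24 2022-02-23 2022 1
# 4 68 PAUL 1 2021-11-24 2022-02-23 2022 2
# 5 68 PAUL 1 2022-04-24 2023-06-23 2022 4
# 6 68 PAUL 1 2022-04-24 2023-06-23 2022 5
# 7 68 PAUL 1 2022-04-24 2023-06-23 2022 6
# 8 68 PAUL 1 2022-04-24 2023-06-23 2022 7
# 9 68 PAUL 1 2022-04-24 2023-06-23 2022 8
#10 68 PAUL 1 2022-04-24 2023-06-23 2022 9
# ℹ 14 more rows
# ℹ Use `print(n = ...)` to see more rows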
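If you would rather not depend on tidyr, here is a minimal base-R sketch. It uses a different technique from the pipeline above:

* It counts the months each row covers as 12 * (end year - start year) + (end month - start month) + 1.
* It repeats the row that many times.
* It derives year and Month from a running month index.

`month_index` is a hypothetical helper, and the sample data are re-entered from the question.

```r
df <- data.frame(
  PKID      = c(68, 68, 40, 40),
  Name      = c("PAUL", "PAUL", "KATE", "KATE"),
  Gender    = c(1, 1, 2, 2),
  DateStart = as.Date(c("24/11/2021", "24/04/2022", "01/01/2000", "03/12/2000"), format = "%d/%m/%Y"),
  DateEnd   = as.Date(c("23/02/2022", "23/06/2023", "14/03/2000", "31/01/2001"), format = "%d/%m/%Y")
)

# Hypothetical helper: months elapsed since January of year 0, i.e. 12 * year + (month - 1)
month_index <- function(d) {
  lt <- as.POSIXlt(d)            # lt$year is years since 1900, lt$mon is 0-11
  (lt$year + 1900) * 12 + lt$mon
}

start_idx <- month_index(df$DateStart)
n_months  <- month_index(df$DateEnd) - start_idx + 1   # inclusive month count per row

# Repeat each input row once per month it covers
out <- df[rep(seq_len(nrow(df)), n_months), ]

# Month offsets 0, 1, ..., n - 1 within each original row
idx <- rep(start_idx, n_months) + sequence(n_months) - 1
out$year  <- idx %/% 12
out$Month <- idx %% 12 + 1
rownames(out) <- NULL

out
```

With the sample data this gives 24 rows. Because it works on whole months, the end month is included regardless of the day of the month.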