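In RStudio, I have a dataset of lab results similar to the one below. It contains Crea, Gluc and Hb values, and each patient has a participation start date.
Note: the real dataset is much bigger, with more patients and more lab results.
I added a column with the number of days, and I would like to add a column `MomentCode` that marks records as closest to baseline, month 3 and month 12, with a tolerance of one week (7 days).
After that, I want to keep only the single lab result that is closest to (but not before) baseline, month 3, month 12, and so on.
So the end result should be formatted like this: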
PatientID | ParticipationDate | Baseline_Crea | Baseline_Gluc | Baseline_Hb | MONTH3_Crea | MONTH3_Gluc | MONTH3_Hb | MONTH12_Crea | MONTH12_Gluc | MONTH12_Hb |
---|---|---|---|---|---|---|---|---|---|---|
PAT_001 | 2023-05-01 | 5.0 | 5.0 | etc. | etc. | | | | | |
PAT_220 | 2023-03-12 | 5.0 | 5.0 | | | | | | | |
I'm doing everything step by step, but maybe there is a better way? See the code below.
df <- read.table(text = "
PatientID,ParticipationDate,LabResultDate,LabType,LabValue,Verified
PAT_001,05-01-2023,05-01-2023,Crea,5.0,Yes
PAT_001,05-01-2023,07-01-2023,Gluc,4.2,Yes
PAT_001,05-01-2023,09-01-2023,Hb,4.2,No
PAT_220,12-03-2023,03-03-2023,Hb,5.2,Yes
PAT_220,12-03-2023,15-03-2023,Hb,5.3,Yes
PAT_220,12-03-2023,16-03-2023,Gluc,4.4,Yes
PAT_001,05-01-2023,03-04-2023,Gluc,4.6,Yes
PAT_001,05-01-2023,06-04-2023,Crea,5.4,Yes
PAT_001,05-01-2023,07-04-2023,Crea,5.0,Yes
PAT_001,05-01-2023,08-04-2023,Hb,5.1,Yes
PAT_220,12-03-2023,11-06-2023,Gluc,5.3,Yes
PAT_220,12-03-2023,12-06-2023,Crea,4.8,No
PAT_220,12-03-2023,14-06-2023,Hb,4.6,Yes
PAT_220,12-03-2023,28-06-2023,Crea,3.9,No
PAT_220,12-03-2023,23-07-2023,Hb,5.1,No
PAT_001,05-01-2023,27-07-2023,Gluc,4.3,Yes
PAT_220,12-03-2023,29-07-2023,Crea,5.1,Yes
PAT_220,12-03-2023,25-08-2023,Gluc,4.9,Yes
PAT_220,12-03-2023,27-08-2023,Crea,4.3,Yes
PAT_220,12-03-2023,14-09-2023,Crea,5.5,Yes
PAT_001,05-01-2023,17-09-2023,Hb,5.5,Yes
PAT_220,12-03-2023,09-11-2023,Hb,5.4,No
PAT_001,05-01-2023,13-11-2023,Gluc,4.2,Yes
PAT_001,05-01-2023,17-11-2023,Hb,5.2,Yes
PAT_001,05-01-2023,29-12-2023,Crea,5.4,Yes
PAT_001,05-01-2023,31-12-2023,Crea,4.4,Yes
PAT_220,12-03-2023,03-01-2024,Gluc,4.2,Yes
PAT_001,05-01-2023,09-01-2024,Gluc,5.4,Yes
PAT_001,05-01-2023,09-01-2024,Hb,4.0,Yes
PAT_001,05-01-2023,13-01-2024,Crea,4.7,Yes
PAT_001,05-01-2023,07-03-2024,Hb,4.2,Yes
PAT_220,12-03-2023,14-03-2024,Gluc,4.4,Yes
PAT_220,12-03-2023,15-03-2024,Crea,5.0,No
PAT_220,12-03-2023,17-03-2024,Hb,3.9,Yes
PAT_220,12-03-2023,23-05-2024,Crea,4.4,Yes
PAT_001,05-01-2023,23-06-2024,Gluc,4.8,No
PAT_220,12-03-2023,04-08-2024,Hb,4.3,Yes
PAT_220,12-03-2023,24-08-2024,Gluc,4.5,Yes
", header = TRUE, sep = ",", na.strings = "")
# days since participation
df$DaysSince <- as.numeric(difftime(df$ParticipationDate, df$LabResultDate, units = "days"))
# mark as moment (one week tolerance) how to??
library(dplyr)
df <- df %>%
  mutate(MomentCode = case_when(
    DaysSince >= 0 && DaysSince < 7 ~ "Baseline",
    DaysSince >= 90 && DaysSince < 90 + 7 ~ "Month3",
    DaysSince >= 365 && DaysSince < 365 + 7 ~ "Month12"
  )
  )
# partition by patient and code and moment to get only values closest to each moment code, how??
# pivot to wide format, how??
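Some remarks on the sample data:
`DaysSince`: some rows have a `LabResultDate` before the `ParticipationDate`, which does not make sense.
That said, I made the following assumptions (spelled out in the code comments):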
library(dplyr)
library(tidyr)
library(lubridate) ## provides dmy()
df2 <- df %>%
  as_tibble() %>% ## format as tibble for nicer printing
  mutate(## format date columns as date
    across(ends_with("Date"), dmy),
    ## calculate differences in days
    DaysSince = as.numeric(difftime(LabResultDate, ParticipationDate, units = "days")),
    ## split day differences into buckets with cutoffs 7 and 97 days:
    ## (-Inf, 7] is Baseline, (7, 97] is Month3, (97, Inf) is Month12
    MomentCode = cut(DaysSince, c(-Inf, 7, 97, Inf), c("Baseline", "Month3", "Month12")),
    ## create new column combining LabType and MomentCode
    Measurement = paste(LabType, MomentCode, sep = "_")
  )
df2 %>%
  pivot_wider(names_from = Measurement,
              values_from = LabValue,
              id_cols = PatientID:ParticipationDate,
              ## take mean for duplicated measurements
              values_fn = mean)
# # A tibble: 2 × 11
# PatientID ParticipationDate Crea_Baseline Gluc_Baseline Hb_Baseline Gluc_Month3 Crea_Month3 Hb_Month3
# <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 PAT_001 2023-01-05 5 4.2 4.2 4.6 5.2 5.1
# 2 PAT_220 2023-03-12 NA 4.4 5.25 5.3 4.8 4.6
# # ℹ 3 more variables: Crea_Month12 <dbl>, Hb_Month12 <dbl>, Gluc_Month12 <dbl>
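The buckets above average every result that falls into a range. If you instead want the single result closest to, but not before, each moment within the one-week tolerance, a minimal sketch could look like the following. It rests on several assumptions: the targets are 0, 90 and 365 days after participation (taken from the attempt in the question), the window is `[target, target + 7)`, and ties on the same day are broken arbitrarily. The names `labs`, `moments`, `tolerance`, `TargetDay` and `Offset` are made up for illustration, and only a small excerpt of the sample data is used to keep the example short.

```r
library(dplyr)   # cross_join() needs dplyr >= 1.1.0
library(tidyr)

# Excerpt of the sample data from the question (not the full dataset)
labs <- read.csv(text = "PatientID,ParticipationDate,LabResultDate,LabType,LabValue
PAT_001,05-01-2023,05-01-2023,Crea,5.0
PAT_001,05-01-2023,07-01-2023,Gluc,4.2
PAT_001,05-01-2023,06-04-2023,Crea,5.4
PAT_001,05-01-2023,07-04-2023,Crea,5.0
PAT_001,05-01-2023,08-04-2023,Hb,5.1
PAT_001,05-01-2023,09-01-2024,Gluc,5.4
PAT_220,12-03-2023,03-03-2023,Hb,5.2
PAT_220,12-03-2023,15-03-2023,Hb,5.3
PAT_220,12-03-2023,11-06-2023,Gluc,5.3
PAT_220,12-03-2023,14-03-2024,Gluc,4.4")

# Assumed target days per moment and the one-week tolerance
moments <- tibble(
  MomentCode = c("Baseline", "Month3", "Month12"),
  TargetDay  = c(0, 90, 365)
)
tolerance <- 7

closest <- labs %>%
  mutate(
    # parse the dd-mm-yyyy strings as dates
    across(ends_with("Date"), ~ as.Date(.x, format = "%d-%m-%Y")),
    DaysSince = as.numeric(LabResultDate - ParticipationDate)
  ) %>%
  # pair every lab result with every moment
  cross_join(moments) %>%
  mutate(Offset = DaysSince - TargetDay) %>%
  # keep results on or after the target day, within the tolerance
  filter(Offset >= 0, Offset < tolerance) %>%
  # one row per patient, lab type and moment: the smallest offset wins
  group_by(PatientID, LabType, MomentCode) %>%
  slice_min(Offset, n = 1, with_ties = FALSE) %>%
  ungroup()

wide <- closest %>%
  # factor levels keep the moments in chronological column order
  mutate(MomentCode = factor(MomentCode, levels = moments$MomentCode)) %>%
  pivot_wider(
    id_cols = c(PatientID, ParticipationDate),
    names_from = c(MomentCode, LabType),
    values_from = LabValue,
    names_sort = TRUE
  )

wide
```

With this excerpt, PAT_001 gets `Month3_Crea` = 5.4 (the result from 06-04-2023, 91 days in) rather than the 5.0 from the following day, and the PAT_220 Hb result dated 03-03-2023 is dropped because it precedes participation. Combinations with no result inside any window come out as `NA`, or as a missing column if no patient has one.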