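How to partition by patient, lab result type and moment to select the result closest to the start of each moment code and convert to wide format?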


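In RStudio, I have a dataset of lab results similar to the one below. It contains Crea, Gluc and Hb values, and each patient has a participation start date.

Note: the real dataset is larger, with more patients and more lab results.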

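I added a column with the number of days since participation, and I would like to add a column MomentCode that marks each record as closest to baseline, month 3 or month 12, with a tolerance of one week (7 days).

After that, I want to keep only the single lab result that is closest to (but not before) baseline, 3 months, 12 months, and so on.

So the final result should look like this: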

PatientID ParticipationDate Baseline_Crea Baseline_Gluc Baseline_Hb MONTH3_Crea MONTH3_Gluc MONTH3_Hb MONTH12_Crea MONTH12_Gluc MONTH12_Hb
PAT_001   05-01-2023        5.0           5.0           etc.        etc.
PAT_220   12-03-2023        5.0           5.0

I am doing everything step by step, but maybe there is a better way? See the code below.

df <- read.table(text = "
PatientID,ParticipationDate,LabResultDate,LabType,LabValue,Verified
PAT_001,05-01-2023,05-01-2023,Crea,5.0,Yes
PAT_001,05-01-2023,07-01-2023,Gluc,4.2,Yes
PAT_001,05-01-2023,09-01-2023,Hb,4.2,No
PAT_220,12-03-2023,03-03-2023,Hb,5.2,Yes
PAT_220,12-03-2023,15-03-2023,Hb,5.3,Yes
PAT_220,12-03-2023,16-03-2023,Gluc,4.4,Yes
PAT_001,05-01-2023,03-04-2023,Gluc,4.6,Yes
PAT_001,05-01-2023,06-04-2023,Crea,5.4,Yes
PAT_001,05-01-2023,07-04-2023,Crea,5.0,Yes
PAT_001,05-01-2023,08-04-2023,Hb,5.1,Yes
PAT_220,12-03-2023,11-06-2023,Gluc,5.3,Yes
PAT_220,12-03-2023,12-06-2023,Crea,4.8,No
PAT_220,12-03-2023,14-06-2023,Hb,4.6,Yes
PAT_220,12-03-2023,28-06-2023,Crea,3.9,No
PAT_220,12-03-2023,23-07-2023,Hb,5.1,No
PAT_001,05-01-2023,27-07-2023,Gluc,4.3,Yes
PAT_220,12-03-2023,29-07-2023,Crea,5.1,Yes
PAT_220,12-03-2023,25-08-2023,Gluc,4.9,Yes
PAT_220,12-03-2023,27-08-2023,Crea,4.3,Yes
PAT_220,12-03-2023,14-09-2023,Crea,5.5,Yes
PAT_001,05-01-2023,17-09-2023,Hb,5.5,Yes
PAT_220,12-03-2023,09-11-2023,Hb,5.4,No
PAT_001,05-01-2023,13-11-2023,Gluc,4.2,Yes
PAT_001,05-01-2023,17-11-2023,Hb,5.2,Yes
PAT_001,05-01-2023,29-12-2023,Crea,5.4,Yes
PAT_001,05-01-2023,31-12-2023,Crea,4.4,Yes
PAT_220,12-03-2023,03-01-2024,Gluc,4.2,Yes
PAT_001,05-01-2023,09-01-2024,Gluc,5.4,Yes
PAT_001,05-01-2023,09-01-2024,Hb,4.0,Yes
PAT_001,05-01-2023,13-01-2024,Crea,4.7,Yes
PAT_001,05-01-2023,07-03-2024,Hb,4.2,Yes
PAT_220,12-03-2023,14-03-2024,Gluc,4.4,Yes
PAT_220,12-03-2023,15-03-2024,Crea,5.0,No
PAT_220,12-03-2023,17-03-2024,Hb,3.9,Yes
PAT_220,12-03-2023,23-05-2024,Crea,4.4,Yes
PAT_001,05-01-2023,23-06-2024,Gluc,4.8,No
PAT_220,12-03-2023,04-08-2024,Hb,4.3,Yes
PAT_220,12-03-2023,24-08-2024,Gluc,4.5,Yes
", header = TRUE, sep = ",", na.strings = "")

# days since participation
df$DaysSince <- as.numeric(difftime(df$ParticipationDate, df$LabResultDate, units = "days"))

# mark as moment (one week tolerance) how to??
library(dplyr)
df <- df %>%
  mutate(MomentCode = case_when(
           DaysSince>=0   && DaysSince<7     ~ "Baseline",
           DaysSince>=90  && DaysSince<90+7  ~ "Month3",
           DaysSince>=365 && DaysSince<365+7 ~ "Month12"
         )
  )

# partition by patient and code and moment to get only values closest to each moment code, how??

# pivot to wide format, how??
r pivot rstudio partitioning
1 Answer

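Some remarks on the sample data:

  1. You calculate the difference as participation minus lab result; it should be the other way round.
  2. There are negative DaysSince values (LabResultDate before ParticipationDate), which does not make sense.
  3. There are differences between participation and lab measurement that fall into none of the categories you defined (for example, more than 7 days but fewer than 90 days).
  4. There are multiple measurements for a given combination of time frame and lab type.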
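
To make these remarks concrete, here is a minimal base R sketch on a handful of rows copied from the question. The window limits (0, 90 and 365 days, each plus 7) come from the `case_when()` in the question; the helper columns `BeforeParticipation` and `InAnyWindow` are names I made up for illustration.

```r
# A few rows copied from the question's data (dates are day-month-year strings)
labs <- data.frame(
  PatientID         = c("PAT_001", "PAT_220", "PAT_220", "PAT_001", "PAT_001"),
  ParticipationDate = c("05-01-2023", "12-03-2023", "12-03-2023", "05-01-2023", "05-01-2023"),
  LabResultDate     = c("05-01-2023", "03-03-2023", "15-03-2023", "06-04-2023", "27-07-2023"),
  LabType           = c("Crea", "Hb", "Hb", "Crea", "Gluc")
)

# Parse the strings as dates, then subtract in the right order:
# lab result minus participation
part_date <- as.Date(labs$ParticipationDate, format = "%d-%m-%Y")
lab_date  <- as.Date(labs$LabResultDate, format = "%d-%m-%Y")
labs$DaysSince <- as.numeric(lab_date - part_date)

# Hypothetical helper columns: results taken before participation, and
# results that fall inside one of the question's three 7-day windows
d <- labs$DaysSince
labs$BeforeParticipation <- d < 0
labs$InAnyWindow <- (d >= 0 & d < 7) | (d >= 90 & d < 97) | (d >= 365 & d < 372)

print(labs)
```

In this subset the second row has a negative `DaysSince`, and the last row lies outside all three windows, so neither would receive a MomentCode under the question's definition.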

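Having said that, I made the following assumptions:

  1. We define 2 cutoffs: 7 days and 97 days. Any difference <= 7 days (negative ones included) falls into the baseline, differences > 7 days and <= 97 days fall into the 3-month slot, and differences > 97 days fall into the 12-month slot.
  2. If there are multiple measurements per combination, we take the mean of those measurements: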
library(dplyr)
library(tidyr)
library(lubridate) ## provides dmy(), used below to parse the dates

df2 <- df %>%
  as_tibble() %>% ## format as tibble for nicer printing
  mutate(## format date columns as date
         across(ends_with("Date"), dmy), 
         ## calculate differences in days
         DaysSince = as.numeric(difftime(LabResultDate, ParticipationDate, units = "days")),
         ## split day differences into buckets with cutoffs 7 and 97 days
         MomentCode = cut(DaysSince, c(-Inf, 7, 97, Inf), c("Baseline", "Month3", "Month12")),
         ## create new column combining LabType and MomentCode
         Measurement = paste(LabType, MomentCode, sep = "_")
  )

df2 %>%
  pivot_wider(names_from = Measurement, 
              values_from = LabValue, 
              id_cols = PatientID:ParticipationDate, 
              ## take mean for duplicated measurements
              values_fn = mean)

# # A tibble: 2 × 11
#   PatientID ParticipationDate Crea_Baseline Gluc_Baseline Hb_Baseline Gluc_Month3 Crea_Month3 Hb_Month3
#   <chr>     <date>                    <dbl>         <dbl>       <dbl>       <dbl>       <dbl>     <dbl>
# 1 PAT_001   2023-01-05                    5           4.2        4.2          4.6         5.2       5.1
# 2 PAT_220   2023-03-12                   NA           4.4        5.25         5.3         4.8       4.6
# # ℹ 3 more variables: Crea_Month12 <dbl>, Hb_Month12 <dbl>, Gluc_Month12 <dbl>
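
If you would rather keep the question's stricter definition (a single result per patient, lab type and moment, the one closest to but not before the target day, within a 7-day tolerance), something along these lines might work. This is only a sketch on a small subset of the question's rows, not the approach above: the targets of 0, 90 and 365 days are taken from the question's `case_when()`, and the conditions use the vectorised `&` rather than `&&`.

```r
library(dplyr)
library(tidyr)

# Small subset of the question's data; dates are day-month-year strings
labs <- data.frame(
  PatientID         = c(rep("PAT_001", 5), rep("PAT_220", 4)),
  ParticipationDate = c(rep("05-01-2023", 5), rep("12-03-2023", 4)),
  LabResultDate     = c("05-01-2023", "07-01-2023", "06-04-2023", "07-04-2023", "09-01-2024",
                        "03-03-2023", "15-03-2023", "12-06-2023", "14-03-2024"),
  LabType           = c("Crea", "Gluc", "Crea", "Crea", "Gluc",
                        "Hb", "Hb", "Crea", "Gluc"),
  LabValue          = c(5.0, 4.2, 5.4, 5.0, 5.4, 5.2, 5.3, 4.8, 4.4)
)

result <- labs %>%
  mutate(
    # parse both date columns
    across(ends_with("Date"), ~ as.Date(.x, format = "%d-%m-%Y")),
    # lab result minus participation, in days
    DaysSince = as.numeric(LabResultDate - ParticipationDate),
    # 7-day windows starting at each target day; everything else becomes NA
    MomentCode = case_when(
      DaysSince >= 0   & DaysSince < 7   ~ "Baseline",
      DaysSince >= 90  & DaysSince < 97  ~ "Month3",
      DaysSince >= 365 & DaysSince < 372 ~ "Month12"
    )
  ) %>%
  # drop results outside every window (including those before participation)
  filter(!is.na(MomentCode)) %>%
  # within a window, the smallest DaysSince is the closest result on or after the target
  group_by(PatientID, LabType, MomentCode) %>%
  slice_min(DaysSince, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  pivot_wider(id_cols = c(PatientID, ParticipationDate),
              names_from = c(MomentCode, LabType),
              values_from = LabValue)

print(result)
```

In this subset PAT_001 has two Crea results in the month-3 window (day 91 and day 92); only the day-91 value is kept instead of the mean. `with_ties = FALSE` keeps an arbitrary single row if two results share the same day, so you may want a different tie-breaker (for example the Verified column) on the real data.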
