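I am working with a pregnancy-length variable (a factor) that is recorded as weeks + days, e.g. 39+3. I need to turn it into a number that I can compare between groups and average. So either 276 days (39 * 7 + 3) or 39.43 weeks (39 + 3/7). Any suggestions?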
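The two representations (whole days vs. fractional weeks) carry exactly the same information, but you should probably go with fractional weeks, because a) weeks are easier to relate to, and b) fractional weeks behave like a continuous variable while whole days are discrete, and continuous data are usually easier to work with.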
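As a minimal base-R sketch of both conversions: `parse_gestation` is a hypothetical helper name, and the input values are made up for illustration. Because the question says the column is a factor, the sketch calls `as.character()` first; calling `as.numeric()` directly on a factor would return the internal level codes rather than the values.

```r
# Hypothetical helper: convert "weeks+days" strings (or a factor of them)
# into total days and fractional weeks
parse_gestation <- function(x) {
  # fixed = TRUE treats "+" as a literal character, not a regex quantifier
  parts <- strsplit(as.character(x), "+", fixed = TRUE)
  weeks <- as.numeric(vapply(parts, `[`, character(1), 1))
  days  <- as.numeric(vapply(parts, `[`, character(1), 2))
  data.frame(day.total = weeks * 7 + days,
             week.total = weeks + days / 7)
}

# Illustrative input, stored as a factor as in the question
x <- factor(c("39+3", "37+0", "41+6"))
res <- parse_gestation(x)
res

# The two scales differ only by a constant factor of 7
all.equal(res$day.total / 7, res$week.total)
```

Since one column is just the other divided by 7, group comparisons give the same conclusions on either scale; only the units of the reported means change.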
You should be able to solve this with lubridate. Assume your variable has the form a+b, where a is the number of weeks and b is the number of days.
library(lubridate)
s <- "39+3"
# Replace "+" with "W " and append "d" to denote weeks and days
s <- gsub("$", "d", gsub("\\+", "W ", s))
s
# [1] "39W 3d"
period(s)  # parse the string into a Period
# [1] "276d 0H 0M 0S"
as.numeric(period(s), "days")  # convert the Period to a number of days
# [1] 276
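The example above handles a single string. For a whole factor column, one possible variation (a sketch, not necessarily what the answer intended) is to split the values and build the periods with lubridate's `weeks()` and `days()` constructors instead of parsing a rewritten string. The vector `x` below is made-up illustration data.

```r
library(lubridate)

# Illustrative factor column, as described in the question
x <- factor(c("39+3", "40+0", "37+6"))

# Split on the literal "+"; as.character() avoids working on factor level codes
parts <- do.call(rbind, strsplit(as.character(x), "+", fixed = TRUE))

# Build one Period per element from its week and day components
p <- weeks(as.integer(parts[, 1])) + days(as.integer(parts[, 2]))

# Convert every Period to a plain number of days
total.days <- as.numeric(p, "days")
total.days
```

Dividing `total.days` by 7 gives fractional weeks if that scale is preferred.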
Some fiddling with data.table...
Sample data
library( data.table )
set.seed(123)
DT <- data.table( pregnancy.length = paste0( sample(20:42, 100, replace = TRUE),
                                             "+",
                                             sample(1:6, 100, replace = TRUE) ),
                  stringsAsFactors = FALSE )
Code
#first, split the pregnancy-length on the `+`-sign
DT[, c("weeks", "days") := lapply( tstrsplit( pregnancy.length, "\\+"), as.numeric )]
#then calculate total weeks, total days, or both
DT[, `:=`( week.total = weeks + days / 7, day.total = weeks * 7 + days )]
Output
head(DT)
#    pregnancy.length weeks days week.total day.total
# 1:             26+4    26    4   26.57143       186
# 2:             38+2    38    2   38.28571       268
# 3:             29+3    29    3   29.42857       206
# 4:             40+6    40    6   40.85714       286
# 5:             41+3    41    3   41.42857       290
# 6:             21+6    21    6   21.85714       153
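To get from here to the group comparison the question asks about, a grouped mean is one natural next step. The sketch below is self-contained and uses invented data: the `group` column and its values are assumptions, since the original sample data has no grouping variable.

```r
library(data.table)

# Hypothetical data: `group` and all values are invented for illustration
DT2 <- data.table(
  group = c("A", "A", "B", "B"),
  pregnancy.length = factor(c("39+3", "40+1", "37+6", "38+2"))
)

# Convert the factor to character before splitting on the literal "+"
DT2[, c("weeks", "days") := lapply(
  tstrsplit(as.character(pregnancy.length), "+", fixed = TRUE),
  as.numeric
)]
DT2[, week.total := weeks + days / 7]

# Mean gestational length per group, in fractional weeks, with group sizes
group.means <- DT2[, .(mean.weeks = mean(week.total), n = .N), by = group]
group.means
```

The same `by = group` pattern works with `day.total` if days are the preferred unit.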