I have a data table with the following schema:
id|smalltime
1 2199-08-02 20:00:00
2 2150-11-13 15:00:00
...
I have another data table with the following schema:
id|time
1 2199-08-02 20:10:00
1 2199-08-02 19:00:00
2 2150-11-13 15:10:00
...
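For each id, I want to find the earliest time in the second table that is later than that id's smalltime in the first table.
For the example above, I am looking for the following new data table: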
id|time
1 2199-08-02 20:10:00
2 2150-11-13 15:10:00
Do you mean something like the following?
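library(lubridate)
library(dplyr)
# parse the character columns into POSIXct date-times
df1$smalltime <- ymd_hms(df1$smalltime)
df2$time <- ymd_hms(df2$time)
df2 %>%
  inner_join(df1, by = "id") %>%                # attach each id's smalltime to every df2 row
  mutate(time_diff = time - smalltime) %>%
  filter(time_diff > 0) %>%                     # keep only times strictly after smalltime
  group_by(id) %>%
  summarise(time = time[which.min(time_diff)])  # earliest remaining time per id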
The output is:
id time
1 1 2199-08-02 20:10:00
2 2 2150-11-13 15:10:00
Sample data:
df1 <- data.frame(id = 1:2,
                  smalltime = c("2199-08-02 20:00:00", "2150-11-13 15:00:00"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(id = c(1L, 1L, 2L),
                  time = c("2199-08-02 20:10:00", "2199-08-02 19:00:00",
                           "2150-11-13 15:10:00"),
                  stringsAsFactors = FALSE)
You can try it this way:
library(data.table)
library(lubridate)  # for ymd_hms()
library(purrr)
# make sure both inputs are data.tables (the sample data are data.frames)
setDT(df1)
setDT(df2)
# convert to date time format
df1[, smalltime := ymd_hms(smalltime)]
df2[, time := ymd_hms(time)]
# collect all df2 times per id into a list column z of df1 (update join on id)
df1[df2[, list(list(time)), by = id], on = 'id', z := i.V1]
# keep the times later than smalltime and take the earliest of them
df1[, ans1 := map2_chr(z, smalltime, function(x, y) format(min(x[x > y]), "%Y-%m-%d %H:%M:%S"))]
print(df1[, .(id, ans1)])
id ans1
1: 1 2199-08-02 20:10:00
2: 2 2150-11-13 15:10:00
In base R:
> A = as.POSIXct(df1$smalltime, tz = "UTC")
> B = as.POSIXct(df2$time, tz = "UTC")
> keep = B > A[match(df2$id, df1$id)]   # compare each time with the smalltime of its own id
> unname(sapply(split(B[keep], df2$id[keep]), function(x) format(min(x), "%F %T")))
[1] "2199-08-02 20:10:00" "2150-11-13 15:10:00"
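findInterval() can also locate the next time directly. For each id, sort that id's times; findInterval() then returns how many of them are at or before smalltime, so the following index is the first strictly later time. This is a minimal sketch, assuming UTC timestamps; next_time_after() is a hypothetical helper, and the usage adds a hypothetical id 3 (whose only time is earlier than its smalltime) to show that it returns NA when nothing qualifies.

```r
# Hypothetical helper: for each row of df1, the earliest df2$time with the same id
# that is strictly later than smalltime, or NA if there is none (UTC assumed).
next_time_after <- function(df1, df2) {
  small <- as.numeric(as.POSIXct(df1$smalltime, tz = "UTC"))
  times <- as.numeric(as.POSIXct(df2$time, tz = "UTC"))
  nxt <- vapply(seq_len(nrow(df1)), function(k) {
    v <- sort(times[df2$id == df1$id[k]])   # this id's times, ascending
    if (length(v) == 0L) return(NA_real_)    # id absent from df2
    # findInterval() counts the elements of v that are <= smalltime,
    # so the next index is the first time strictly after smalltime
    j <- findInterval(small[k], v) + 1L
    if (j > length(v)) NA_real_ else v[j]
  }, numeric(1))
  data.frame(id = df1$id,
             time = format(as.POSIXct(nxt, origin = "1970-01-01", tz = "UTC"), "%F %T"),
             stringsAsFactors = FALSE)
}

# The question's sample data plus a hypothetical id 3 with no later time
df1 <- data.frame(id = 1:3,
                  smalltime = c("2199-08-02 20:00:00", "2150-11-13 15:00:00",
                                "2150-11-13 15:00:00"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(id = c(1L, 1L, 2L, 3L),
                  time = c("2199-08-02 20:10:00", "2199-08-02 19:00:00",
                           "2150-11-13 15:10:00", "2150-11-13 14:00:00"),
                  stringsAsFactors = FALSE)
print(next_time_after(df1, df2))
```

Because each id's times are sorted once and then searched, this avoids building the full id-level join, at the cost of a per-row loop in R.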