Find the minimum date for each ID

Question  Votes: 0  Answers: 3

I have a data table with the following schema:

id|smalltime
1  2199-08-02 20:00:00
2  2150-11-13 15:00:00
...

I have another data table with the following schema:

id|time
1  2199-08-02 20:10:00
1  2199-08-02 19:00:00
2  2150-11-13 15:10:00
...

For each id, I want to find the earliest time in the second table that comes after that id's smalltime in the first table.

For the example above, I am looking for the following new data table:

id|time
1  2199-08-02 20:10:00
2  2150-11-13 15:10:00
r
3 Answers
0
votes

Do you mean something like the following?

library(lubridate)
library(dplyr)

df1$smalltime <- ymd_hms(df1$smalltime)
df2$time <- ymd_hms(df2$time)

df2 %>%
  inner_join(df1, by="id") %>%
  mutate(time_diff = time - smalltime) %>%
  filter(time_diff > 0) %>%
  group_by(id) %>%
  summarise(time = time[which.min(time_diff)])

The output is:

     id                time
1     1 2199-08-02 20:10:00
2     2 2150-11-13 15:10:00

Sample data:

df1 <- structure(list(id = 1:2, smalltime = c("2199-08-02 20:00:00", 
"2150-11-13 15:00:00")), .Names = c("id", "smalltime"), class = "data.frame", row.names = c(NA, 
-2L))

df2 <- structure(list(id = c(1L, 1L, 2L), time = c("2199-08-02 20:10:00", 
"2199-08-02 19:00:00", "2150-11-13 15:10:00")), .Names = c("id", 
"time"), class = "data.frame", row.names = c(NA, -3L))
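
The pipeline above joins the two tables, keeps the rows that are later than smalltime, and picks the row with the smallest gap per id. A minimal sketch of the same idea without the intermediate time_diff column follows; it assumes the timestamps are parsed as POSIXct in UTC (the time zone is an assumption, the question does not state one) and takes the minimum directly.

```r
library(dplyr)

# Sample data from the question, parsed as POSIXct (UTC is an assumption)
df1 <- data.frame(
  id = 1:2,
  smalltime = as.POSIXct(c("2199-08-02 20:00:00", "2150-11-13 15:00:00"), tz = "UTC")
)
df2 <- data.frame(
  id = c(1L, 1L, 2L),
  time = as.POSIXct(c("2199-08-02 20:10:00", "2199-08-02 19:00:00",
                      "2150-11-13 15:10:00"), tz = "UTC")
)

result <- df2 %>%
  inner_join(df1, by = "id") %>%   # attach each id's smalltime
  filter(time > smalltime) %>%     # keep only strictly later times
  group_by(id) %>%
  summarise(time = min(time))      # earliest remaining time per id
```

With this approach, an id that has no time later than its smalltime simply does not appear in the result.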

0
votes

You could try it this way:

library(data.table)
library(lubridate)  # provides ymd_hms()
library(purrr)

# convert to data.table by reference, then to date time format
setDT(df1)
setDT(df2)
df1[, smalltime := ymd_hms(smalltime)]
df2[, time := ymd_hms(time)]

# merge df2 in df1 while grouping by df2 on id
df1[df2[, list(list(time)), .(id)], on = 'id', z := i.V1]

# for each id, keep the times later than smalltime and take the earliest one
# (taking the minimum avoids depending on the row order of df2)
df1[, ans1 := map2_chr(z, smalltime, function(x, y) format(min(x[x > y]), "%Y-%m-%d %H:%M:%S"))]

print(df1[,.(id, ans1)])

   id                ans1
1:  1 2199-08-02 20:10:00
2:  2 2150-11-13 15:10:00
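
The same result can also be reached without a list column. The following is only a sketch, assuming POSIXct timestamps in UTC (the time zone is an assumption) and using the hypothetical names dt1, dt2 and res: it joins on id, filters the later times, and aggregates per id.

```r
library(data.table)

# Sample data from the question as data.tables (UTC is an assumption)
dt1 <- data.table(
  id = 1:2,
  smalltime = as.POSIXct(c("2199-08-02 20:00:00", "2150-11-13 15:00:00"), tz = "UTC")
)
dt2 <- data.table(
  id = c(1L, 1L, 2L),
  time = as.POSIXct(c("2199-08-02 20:10:00", "2199-08-02 19:00:00",
                      "2150-11-13 15:10:00"), tz = "UTC")
)

# Join dt1 onto dt2 by id, keep rows later than smalltime,
# then take the earliest time within each id
res <- dt2[dt1, on = "id"][time > smalltime, .(time = min(time)), by = id]
```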

0
votes
> A=strptime(df1$smalltime,"%F %T")
> B=strptime(df2$time,"%F %T")
> d=findInterval(B,sort(A))
> unname(by(B,list(d,df2$id),function(x)format(min(x),"%F %T"))[unique(d)])
[1] "2199-08-02 20:10:00" "2150-11-13 15:10:00"
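
For comparison, here is a base R sketch that reaches the same result by matching on id explicitly rather than through findInterval. It assumes POSIXct timestamps in UTC (the time zone is an assumption), and the names merged, later and res are illustrative only.

```r
# Sample data from the question (UTC is an assumption)
df1 <- data.frame(
  id = 1:2,
  smalltime = as.POSIXct(c("2199-08-02 20:00:00", "2150-11-13 15:00:00"), tz = "UTC")
)
df2 <- data.frame(
  id = c(1L, 1L, 2L),
  time = as.POSIXct(c("2199-08-02 20:10:00", "2199-08-02 19:00:00",
                      "2150-11-13 15:10:00"), tz = "UTC")
)

# Attach each id's smalltime to its rows in df2
merged <- merge(df2, df1, by = "id")

# Keep only the rows strictly later than smalltime
later <- merged[merged$time > merged$smalltime, ]

# Sort by id and time, then keep the first (earliest) row of each id
later <- later[order(later$id, later$time), ]
res <- later[!duplicated(later$id), c("id", "time")]
```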