How to join two datasets by ID, date and approximate time?


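I have two datasets (A and B) that I need to merge by date, ID and nearest time (see the merged dataset below). The times in the two datasets never match exactly: the time in dataset B is always 0 to 10 minutes later than the corresponding time in dataset A.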

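I have tried left_join combined with within, between, overlaps, etc., but couldn't get it to work; I think I'm doing something wrong. I can't share the real data, but I've made a simple example dataset. I'd be grateful for any help. Thank you very much.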

DATASET A 
DATETIME            | ID | W
--------------------------------
2020-12-02 18:02:01 | 1  | 0.25
2020-12-02 19:06:21 | 1  | 0.35
2020-12-02 18:12:08 | 2  | 0.44
2020-12-03 10:03:03 | 3  | 0.98

DATASET B
DATETIME            | ID | X1  | X3
--------------------------------------
2020-12-02 18:08:01 | 1  | 1.3 | 99.3
2020-12-02 18:21:11 | 2  | 4.2 | 33.2
2020-12-03 10:09:22 | 3  | 7.1 | 39.9

MERGED DATASET
DATETIME.x          | ID.x | W    | DATETIME.y          | ID.y | X1  | X3 
----------------------------------------------------------------------------
2020-12-02 18:02:01 | 1    | 0.25 | 2020-12-02 18:08:01 | 1    | 1.3 | 99.3
2020-12-02 19:06:21 | 1    | 0.35 |                     |      |     |
2020-12-02 18:12:08 | 2    | 0.44 | 2020-12-02 18:21:11 | 2    | 4.2 | 33.2
2020-12-03 10:03:03 | 3    | 0.98 | 2020-12-03 10:09:22 | 3    | 7.1 | 39.9
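If adding a package is not an option, the same join can be sketched in base R. This is a minimal sketch under stated assumptions, not code from the question or the answer: it re-types the example tables with timestamps assumed to be UTC, pairs rows with the same ID, keeps pairs where the B time is 0 to 10 minutes after the A time, takes the closest such B row for each A row, and left-joins the result back onto A. Matching on the time window makes a separate date condition unnecessary, except that a pair straddling midnight would match even though the dates differ. The names nearest_window_join and max_lag_mins are hypothetical.

```r
# Example data re-typed from the tables above; timestamps assumed to be UTC
a <- data.frame(
  DATETIME = as.POSIXct(c("2020-12-02 18:02:01", "2020-12-02 19:06:21",
                          "2020-12-02 18:12:08", "2020-12-03 10:03:03"), tz = "UTC"),
  ID = c(1L, 1L, 2L, 3L),
  W  = c(0.25, 0.35, 0.44, 0.98)
)
b <- data.frame(
  DATETIME = as.POSIXct(c("2020-12-02 18:08:01", "2020-12-02 18:21:11",
                          "2020-12-03 10:09:22"), tz = "UTC"),
  ID = 1:3,
  X1 = c(1.3, 4.2, 7.1),
  X3 = c(99.3, 33.2, 39.9)
)

# Hypothetical helper: left join a to b on ID, keeping for each row of a the
# closest row of b whose time is 0 to max_lag_mins minutes later.
nearest_window_join <- function(a, b, max_lag_mins = 10) {
  a$row_a <- seq_len(nrow(a))  # identifies each row of a

  # Every same-ID pair; the shared DATETIME column becomes DATETIME.x / DATETIME.y
  pairs <- merge(a, b, by = "ID")
  pairs$lag <- as.numeric(difftime(pairs$DATETIME.y, pairs$DATETIME.x, units = "mins"))

  # Keep only pairs inside the window, then the smallest lag per row of a
  pairs <- pairs[pairs$lag >= 0 & pairs$lag <= max_lag_mins, ]
  pairs <- pairs[order(pairs$row_a, pairs$lag), ]
  best  <- pairs[!duplicated(pairs$row_a),
                 c("row_a", "DATETIME.y", setdiff(names(b), c("ID", "DATETIME")))]

  # Left join back so that rows of a without a match get NA
  out <- merge(a, best, by = "row_a", all.x = TRUE)
  out$row_a <- NULL
  names(out)[names(out) == "DATETIME"] <- "DATETIME.x"
  out
}

nearest_window_join(a, b)
```

With the example data this should give the same rows as the merged dataset, except that ID appears once instead of as ID.x and ID.y. Sorting by lag before dropping duplicates is what turns "any match in the window" into "the closest match".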
Answer

I use fuzzyjoin for joins like this:

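# Left join a to b: rows match when the IDs are equal and the two
# timestamps are at most 10 minutes apart (in either direction)
fuzzyjoin::fuzzy_left_join(a, b,
  by = c("ID" = "ID", "DATETIME" = "DATETIME"),
  # one match function per column in `by`, in the same order
  match_fun = list(`==`, function(x, y) abs(difftime(x, y, units = "mins")) <= 10)
)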
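The abs() condition above also accepts a B time up to 10 minutes before the A time. Since the question says B is always 0 to 10 minutes later, a one-sided predicate may be a closer fit. The sketch below is an assumption on my part rather than part of the answer: lag_within and after_0_10 are hypothetical names, and the function simply takes the A times as x and the B times as y, the two vectors a match_fun receives.

```r
# Hypothetical helper that builds a match function for the DATETIME column:
# TRUE when the B time (y) is 0 to max_mins minutes after the A time (x).
lag_within <- function(max_mins = 10) {
  force(max_mins)
  function(x, y) {
    lag <- as.numeric(difftime(y, x, units = "mins"))
    lag >= 0 & lag <= max_mins
  }
}

after_0_10 <- lag_within(10)

# Quick check on two A/B pairs taken from the example data
x <- as.POSIXct(c("2020-12-02 18:02:01", "2020-12-02 19:06:21"), tz = "UTC")
y <- as.POSIXct(c("2020-12-02 18:08:01", "2020-12-02 18:08:01"), tz = "UTC")
after_0_10(x, y)  # TRUE FALSE

# Intended use (requires the fuzzyjoin package):
# fuzzyjoin::fuzzy_left_join(a, b,
#   by = c("ID" = "ID", "DATETIME" = "DATETIME"),
#   match_fun = list(`==`, after_0_10))
```

Wrapping the window length in a closure keeps the match function to the two arguments (x, y) while still letting you change the 10-minute limit.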

Output:

           DATETIME.x ID.x    W          DATETIME.y ID.y  X1   X3
1 2020-12-02 18:02:01    1 0.25 2020-12-02 18:08:01    1 1.3 99.3
2 2020-12-02 19:06:21    1 0.35                <NA>   NA  NA   NA
3 2020-12-02 18:12:08    2 0.44 2020-12-02 18:21:11    2 4.2 33.2
4 2020-12-03 10:03:03    3 0.98 2020-12-03 10:09:22    3 7.1 39.9
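In this output every A row has at most one B row in its window, but a fuzzy join keeps every B row that satisfies the condition, so with denser data one A row could appear several times. The question asks for the nearest time, so a post-processing step like the one below may be needed. It is a sketch, not part of the answer: keep_closest is a hypothetical name, it assumes an A row is identified by DATETIME.x and ID.x, and the joined data frame in the example is made up to show two candidate matches.

```r
# Hypothetical helper: keep, for each A row, only the B match with the smallest
# time gap. Unmatched rows (DATETIME.y is NA) are kept as they are.
keep_closest <- function(joined, key = c("DATETIME.x", "ID.x")) {
  gap  <- abs(as.numeric(difftime(joined$DATETIME.y, joined$DATETIME.x, units = "mins")))
  ord  <- order(gap, na.last = TRUE)                        # closest first, NA gaps last
  keep <- ord[!duplicated(joined[ord, key, drop = FALSE])]  # first row per A key
  joined[sort(keep), , drop = FALSE]                        # restore original row order
}

# Made-up join result: the first A row matched two B rows (18:11:00 and 18:08:01)
utc <- function(s) as.POSIXct(s, tz = "UTC")
joined <- data.frame(
  DATETIME.x = utc(c("2020-12-02 18:02:01", "2020-12-02 18:02:01", "2020-12-02 19:06:21")),
  ID.x       = c(1L, 1L, 1L),
  W          = c(0.25, 0.25, 0.35),
  DATETIME.y = utc(c("2020-12-02 18:11:00", "2020-12-02 18:08:01", NA)),
  X1         = c(9.9, 1.3, NA)
)
keep_closest(joined)  # keeps the 18:08:01 match and the unmatched 19:06:21 row
```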

Data:

a <- structure(list(
  DATETIME = structure(c(1606932121, 1606935981, 1606932728, 1606989783),
                       class = c("POSIXct", "POSIXt"), tzone = "UTC"),
  ID = c(1L, 1L, 2L, 3L),
  W = c(0.25, 0.35, 0.44, 0.98)
), row.names = c("1", "2", "3", "4"), class = "data.frame")

b <- structure(list(
  DATETIME = structure(c(1606932481, 1606933271, 1606990162),
                       class = c("POSIXct", "POSIXt"), tzone = "UTC"),
  ID = 1:3,
  X1 = c(1.3, 4.2, 7.1),
  X3 = c(99.3, 33.2, 39.9)
), row.names = c(NA, -3L), class = "data.frame")

# merged: the expected result (MERGED DATASET in the question)
merged <- structure(list(
  DATETIME.x = structure(c(1606932121, 1606935981, 1606932728, 1606989783),
                         class = c("POSIXct", "POSIXt"), tzone = "UTC"),
  ID.x = c(1L, 1L, 2L, 3L),
  W = c(0.25, 0.35, 0.44, 0.98),
  DATETIME.y = structure(c(1606932481, NA, 1606933271, 1606990162),
                         class = c("POSIXct", "POSIXt"), tzone = "UTC"),
  ID.y = c(1L, NA, 2L, 3L),
  X1 = c(1.3, NA, 4.2, 7.1),
  X3 = c(99.3, NA, 33.2, 39.9)
), row.names = c(NA, -4L), class = "data.frame")