Merging two datasets in R using only the first matching row

Question

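This question already has an answer here:

Select only the first row when merging data frames with multiple matches (3 answers)

I need to merge two datasets, but the second dataset may contain duplicate ids, for example several rows with id 1, 1, 1. When an id is duplicated, how can I merge using only its first row?

To make this clearer, here is a reproducible example: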

df1 <- structure(list(id = 1:2, y = 10:11), .Names = c("id", "y"), class = "data.frame", row.names = c(NA, 
-2L))

df2 <- structure(list(id = c(1L, 1L, 1L, 2L), x1 = 435:438, x2 = c(435L, 
436L, 436L, 438L), x3 = c(435L, 436L, 436L, 438L)), .Names = c("id", 
"x1", "x2", "x3"), class = "data.frame", row.names = c(NA, -4L
))

This is the output format I expect:

id  y   x1  x2  x3
1   10  435 435 435
2   11  438 438 438

That is, rows 2 and 3 of df2 (the repeated rows for id 1) do not take part in the merge.

Answer 1

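You can do this with data.table. Keep only the first row for each id in the second dataset (which drops the extra rows for id == 1), then merge the two datasets.

Here is the solution: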

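library(data.table)
setDT(df2)                           # convert df2 to a data.table by reference
df2[, idx := seq_len(.N), by = id]   # number the rows within each id
df2 <- df2[idx == 1, ]               # keep only the first row of each id
df2[, idx := NULL]                   # drop the helper column
output <- merge(df1, df2, by = "id")
output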

This gives the output you want:

 id  y  x1  x2  x3
1  1 10 435 435 435
2  2 11 438 438 438
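
The same "first row per id, then merge" idea can also be sketched in base R without data.table. This is an alternative, not the method used in the answer above; it assumes that "first" means the first occurrence in df2's current row order, and the name df2_first is only an illustrative choice.

```r
# Rebuild the example data from the question.
df1 <- data.frame(id = 1:2, y = 10:11)
df2 <- data.frame(
  id = c(1L, 1L, 1L, 2L),
  x1 = 435:438,
  x2 = c(435L, 436L, 436L, 438L),
  x3 = c(435L, 436L, 436L, 438L)
)

# duplicated() flags every repeat of an id after its first occurrence,
# so negating it keeps exactly the first row for each id.
df2_first <- df2[!duplicated(df2$id), ]

# Merge on id; each id in df1 now matches at most one row.
output <- merge(df1, df2_first, by = "id")
print(output)
```

If a different row should win for each id (for example the one with the largest x1), df2 would need to be sorted accordingly before the duplicated() step.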
