How to merge two data frames on overlapping ranges


My brain isn't cooperating and I can't figure out how to merge two data frames on a range. I hope you can help.

df1 <- data.frame(
  type = rep("A", 6),
  start = c(66, 753, 1436, 2121, 8080, 8982),
  end = c(752, 1435, 2120, 2805, 8972, 9312),
  col1 = c(97.384, 97.23, 97.522, 97.522, 97.522, 97.376),
  col2 = c(686, 682, 684, 684, 892, 330)
)

df2 <- data.frame(
  name = c("apple", "apple"),
  start = c(651, 8009),
  end = c(7964, 9314),
  val = c(71298, 1982),
  ave_val = c(9.749487215, 1.518773946)
)

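This is the result I want: a result column that multiplies cover by ave_val when a df1 row falls within a df2 range. Alternatively, if the df1 range lies inside a df2 range, it could simply be col2 times ave_val.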

type start  end   col1 col2 cover     result
A       66  752 97.384  686   101 984.698209
A      753 1435 97.23   682   682 6649.15028
A     1436 2120 97.522  684   684 6668.64925
A     2121 2805 97.522  684   684 6668.64925
A     8080 8972 97.522  892   892 1354.74636
A     8982 9312 97.376  330   330 501.195402

Thanks!

r merge range inner-join
2 Answers

0 votes

Whoops, this works.

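library(dplyr)      # %>%, group_by(), slice(), mutate()
library(fuzzyjoin)  # interval_join(); needs the Bioconductor package IRanges

interval_join(df1, df2, by = c("start", "end"), mode = "inner") %>%
  group_by(start.x, end.x) %>%
  slice(1) %>%   # keep one df2 match per df1 range
  mutate(result = col2 * ave_val)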

Now I just need to figure out how to build the cover column.
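One possible way to get the cover column, sketched here in base R rather than with interval_join: build every df1/df2 pair with a cross join, take the overlap length as pmin of the two ends minus pmax of the two starts, and keep only pairs where that length is positive. The names pairs and out are illustrative, and treating "overlap" as a strictly positive length is an assumption about what cover should mean.

```r
df1 <- data.frame(
  type = rep("A", 6),
  start = c(66, 753, 1436, 2121, 8080, 8982),
  end = c(752, 1435, 2120, 2805, 8972, 9312),
  col1 = c(97.384, 97.23, 97.522, 97.522, 97.522, 97.376),
  col2 = c(686, 682, 684, 684, 892, 330)
)
df2 <- data.frame(
  name = c("apple", "apple"),
  start = c(651, 8009),
  end = c(7964, 9314),
  val = c(71298, 1982),
  ave_val = c(9.749487215, 1.518773946)
)

# Cross join: every df1 row paired with every df2 row.
# Shared column names get the suffixes .x (df1) and .y (df2).
pairs <- merge(df1, df2, by = NULL)

# Overlap length of the two ranges; negative when they do not overlap.
pairs$cover <- pmin(pairs$end.x, pairs$end.y) - pmax(pairs$start.x, pairs$start.y)

# Assumption: only strictly positive overlaps count as a match.
pairs <- pairs[pairs$cover > 0, ]
pairs$result <- pairs$cover * pairs$ave_val

out <- pairs[order(pairs$start.x),
             c("type", "start.x", "end.x", "col1", "col2", "cover", "result")]
rownames(out) <- NULL
out
```

The cross join grows as nrow(df1) * nrow(df2), so this is only a reasonable choice for small inputs. A df1 range that overlaps several df2 ranges would yield one output row per overlap.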


0 votes

First, use vapply to compute the pmin/pmax differences between the rows of df1 and df2, then do the math with the help of matrixStats::rowMaxs and max.col.

> p <- vapply(seq_len(nrow(df2)), \(i) {
+   pmin(df1$end, df2[i, ]$end) - pmax(df1$start, df2[i, ]$start)
+ }, numeric(nrow(df1)))
> transform(df1, 
+           cover=(cover <- matrixStats::rowMaxs(p)),
+           result=df2$ave_val[max.col(p)]*cover)
  type start  end   col1 col2 cover    result
1    A    66  752 97.384  686   101  984.6982
2    A   753 1435 97.230  682   682 6649.1503
3    A  1436 2120 97.522  684   684 6668.6493
4    A  2121 2805 97.522  684   684 6668.6493
5    A  8080 8972 97.522  892   892 1354.7464
6    A  8982 9312 97.376  330   330  501.1954
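A variant of the same idea that avoids the matrixStats dependency, offered as a sketch rather than a replacement for the answer's code. Here p has one row per df1 range and one column per df2 range, and each entry is the overlap length (negative when the ranges do not overlap). max.col breaks ties at random by default, so ties.method = "first" is passed to keep the result reproducible; the names best, cover and res are illustrative.

```r
df1 <- data.frame(
  type = rep("A", 6),
  start = c(66, 753, 1436, 2121, 8080, 8982),
  end = c(752, 1435, 2120, 2805, 8972, 9312),
  col1 = c(97.384, 97.23, 97.522, 97.522, 97.522, 97.376),
  col2 = c(686, 682, 684, 684, 892, 330)
)
df2 <- data.frame(
  name = c("apple", "apple"),
  start = c(651, 8009),
  end = c(7964, 9314),
  val = c(71298, 1982),
  ave_val = c(9.749487215, 1.518773946)
)

# p[j, i] = overlap length between df1 row j and df2 row i.
p <- vapply(seq_len(nrow(df2)), function(i) {
  pmin(df1$end, df2$end[i]) - pmax(df1$start, df2$start[i])
}, numeric(nrow(df1)))

# Column index of the largest overlap per df1 row (deterministic on ties).
best <- max.col(p, ties.method = "first")

# Pick that largest overlap out of p with matrix indexing.
cover <- p[cbind(seq_len(nrow(p)), best)]

res <- transform(df1, cover = cover, result = df2$ave_val[best] * cover)
res
```

A df1 row that overlaps no df2 range would end up with a non-positive cover here, so such rows may need to be filtered or set to NA, depending on what the result should mean.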

Data:

> df1 |> dput()
structure(list(type = c("A", "A", "A", "A", "A", "A"), start = c(66, 
753, 1436, 2121, 8080, 8982), end = c(752, 1435, 2120, 2805, 
8972, 9312), col1 = c(97.384, 97.23, 97.522, 97.522, 97.522, 
97.376), col2 = c(686, 682, 684, 684, 892, 330)), class = "data.frame", row.names = c(NA, 
-6L))
> df2 |> dput()
structure(list(name = c("apple", "apple"), start = c(651, 8009
), end = c(7964, 9314), val = c(71298, 1982), ave_val = c(9.749487215, 
1.518773946)), class = "data.frame", row.names = c(NA, -2L))