My brain isn't cooperating today: I can't figure out how to merge two data frames based on overlapping ranges. I hope you can help.
df1 <- data.frame(
  type = rep("A", 6),
  start = c(66, 753, 1436, 2121, 8080, 8982),
  end = c(752, 1435, 2120, 2805, 8972, 9312),
  col1 = c(97.384, 97.23, 97.522, 97.522, 97.522, 97.376),
  col2 = c(686, 682, 684, 684, 892, 330)
)
df2 <- data.frame(
  name = c("apple", "apple"),
  start = c(651, 8009),
  end = c(7964, 9314),
  val = c(71298, 1982),
  ave_val = c(9.749487215, 1.518773946)
)
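This is the result I'm after: a `result` column that multiplies `cover` by `ave_val` whenever a `df1` interval falls within a `df2` range. Here `cover` is the length of the part of the `df1` interval that overlaps the `df2` range (for the first row, 752 - 651 = 101). Alternatively, when the `df1` interval lies entirely inside the range, `result` can simply be `col2` times `ave_val`.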
type | start | end | col1 | col2 | cover | result |
---|---|---|---|---|---|---|
A | 66 | 752 | 97.384 | 686 | 101 | 984.698209 |
A | 753 | 1435 | 97.23 | 682 | 682 | 6649.15028 |
A | 1436 | 2120 | 97.522 | 684 | 684 | 6668.64925 |
A | 2121 | 2805 | 97.522 | 684 | 684 | 6668.64925 |
A | 8080 | 8972 | 97.522 | 892 | 892 | 1354.74636 |
A | 8982 | 9312 | 97.376 | 330 | 330 | 501.195402 |
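To make the table's arithmetic concrete, here is a minimal R sketch of the overlap length for single pairs of intervals. `overlap_len` is a hypothetical helper, not something from the post, and it assumes the same `end - start` convention the table uses (no `+ 1`), with non-overlapping pairs clipped to 0.

```r
# Hypothetical helper (not from the post): overlap length of [s1, e1] and
# [s2, e2], measured as end - start like the table, clipped at 0.
overlap_len <- function(s1, e1, s2, e2) {
  max(0, min(e1, e2) - max(s1, s2))
}

ave_val <- c(9.749487215, 1.518773946)  # df2$ave_val

# Row 1 of df1 (66-752) only partly overlaps df2 row 1 (651-7964):
cover1  <- overlap_len(66, 752, 651, 7964)    # 752 - 651 = 101
result1 <- cover1 * ave_val[1]                # about 984.698209

# Row 2 of df1 (753-1435) lies inside df2 row 1, so cover equals col2 (682):
cover2 <- overlap_len(753, 1435, 651, 7964)   # 1435 - 753 = 682

# Row 5 of df1 (8080-8972) misses df2 row 1 entirely but sits inside row 2:
overlap_len(8080, 8972, 651, 7964)            # 0
overlap_len(8080, 8972, 8009, 9314)           # 892
```

This also shows why `col2 * ave_val` works as a shortcut: in this data `col2` equals `end - start`, so it matches `cover` whenever the whole `df1` interval is covered.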
Thanks!
Ah, this works:
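library(dplyr)      # group_by(), slice(), mutate(), %>%
library(fuzzyjoin)  # interval_join() (needs the Bioconductor IRanges package)

interval_join(df1, df2, by = c("start", "end"), mode = "inner") %>%
  group_by(start.x, end.x) %>%  # one group per df1 interval
  slice(1) %>%                  # keep the first overlapping df2 row
  mutate(result = col2 * ave_val)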
Now I just need to figure out how to build the `cover` column.
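A base-R sketch of the same join-then-pick idea that also produces `cover`: cross-join with `merge(by = NULL)`, compute the overlap of every pair, and keep the largest positive overlap per `df1` interval. This swaps `fuzzyjoin` for a plain cross join, which assumes `nrow(df1) * nrow(df2)` stays small; `cj`, `hits` and `best` are illustrative names.

```r
df1 <- data.frame(
  type = rep("A", 6),
  start = c(66, 753, 1436, 2121, 8080, 8982),
  end = c(752, 1435, 2120, 2805, 8972, 9312),
  col1 = c(97.384, 97.23, 97.522, 97.522, 97.522, 97.376),
  col2 = c(686, 682, 684, 684, 892, 330)
)
df2 <- data.frame(
  name = c("apple", "apple"),
  start = c(651, 8009),
  end = c(7964, 9314),
  val = c(71298, 1982),
  ave_val = c(9.749487215, 1.518773946)
)

# Cross join: every df1 row paired with every df2 row. The shared columns
# start/end become start.x/end.x (df1) and start.y/end.y (df2).
cj <- merge(df1, df2, by = NULL)

# Overlap length of each pair (end - start convention); <= 0 means no overlap.
cj$cover <- pmin(cj$end.x, cj$end.y) - pmax(cj$start.x, cj$start.y)
hits <- cj[cj$cover > 0, ]

# For each df1 interval keep the df2 match with the largest overlap.
hits <- hits[order(hits$start.x, -hits$cover), ]
best <- hits[!duplicated(hits[c("start.x", "end.x")]), ]
best$result <- best$cover * best$ave_val

best[c("type", "start.x", "end.x", "col1", "col2", "cover", "result")]
```

Because of the `cover > 0` filter, `df1` intervals that overlap nothing are dropped, just as with an inner join.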
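First, use `vapply` to compute, for every row of `df2`, the difference `pmin(end) - pmax(start)` against all rows of `df1`. This gives a matrix with one row per `df1` interval and one column per `df2` interval, where a positive entry is the length of the overlap. Then do the math with the help of `matrixStats::rowMaxs` (the largest overlap in each row, i.e. `cover`) and `max.col` (which `df2` row that overlap comes from).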
> p <- vapply(seq_len(nrow(df2)), \(i) {
+ pmin(df1$end, df2[i, ]$end) - pmax(df1$start, df2[i, ]$start)
+ }, numeric(nrow(df1)))
> transform(df1,
+ cover=(cover <- matrixStats::rowMaxs(p)),
+ result=df2$ave_val[max.col(p)]*cover)
type start end col1 col2 cover result
1 A 66 752 97.384 686 101 984.6982
2 A 753 1435 97.230 682 682 6649.1503
3 A 1436 2120 97.522 684 684 6668.6493
4 A 2121 2805 97.522 684 684 6668.6493
5 A 8080 8972 97.522 892 892 1354.7464
6 A 8982 9312 97.376 330 330 501.1954
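For reference, a sketch of the same matrix approach without the `matrixStats` dependency, assuming base R only: `apply(p, 1, max)` stands in for `matrixStats::rowMaxs`, `function(i)` replaces `\(i)` for R versions before 4.1, and `max.col(..., ties.method = "first")` makes tie-breaking deterministic (its default is `"random"`). Clipping `cover` at 0 for intervals with no overlap is an added assumption, not part of the answer above.

```r
df1 <- data.frame(
  type = rep("A", 6),
  start = c(66, 753, 1436, 2121, 8080, 8982),
  end = c(752, 1435, 2120, 2805, 8972, 9312),
  col1 = c(97.384, 97.23, 97.522, 97.522, 97.522, 97.376),
  col2 = c(686, 682, 684, 684, 892, 330)
)
df2 <- data.frame(
  name = c("apple", "apple"),
  start = c(651, 8009),
  end = c(7964, 9314),
  val = c(71298, 1982),
  ave_val = c(9.749487215, 1.518773946)
)

# p[r, i]: overlap of df1 row r with df2 row i (negative means no overlap).
p <- vapply(seq_len(nrow(df2)), function(i) {
  pmin(df1$end, df2$end[i]) - pmax(df1$start, df2$start[i])
}, numeric(nrow(df1)))

best_j <- max.col(p, ties.method = "first")  # df2 row with the largest overlap
cover  <- pmax(apply(p, 1, max), 0)          # assumption: no overlap -> 0

out <- transform(df1, cover = cover, result = df2$ave_val[best_j] * cover)
out
```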
Data:
> df1 |> dput()
structure(list(type = c("A", "A", "A", "A", "A", "A"), start = c(66,
753, 1436, 2121, 8080, 8982), end = c(752, 1435, 2120, 2805,
8972, 9312), col1 = c(97.384, 97.23, 97.522, 97.522, 97.522,
97.376), col2 = c(686, 682, 684, 684, 892, 330)), class = "data.frame", row.names = c(NA,
-6L))
> df2 |> dput()
structure(list(name = c("apple", "apple"), start = c(651, 8009
), end = c(7964, 9314), val = c(71298, 1982), ave_val = c(9.749487215,
1.518773946)), class = "data.frame", row.names = c(NA, -2L))