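I have two data frames (fire2022_12 and fire2021_12) with the same three columns (variable, x, y). I've pasted a snippet from the top of each data frame below. I can plot both data frames on the same graph without any trouble by doing this: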
ggplot() + geom_line(data=fire2022_12, aes(x, y)) + geom_line(data=fire2021_12, aes(x, y))
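I now want to create a single line that represents the mean of these two data frames, so that I can see how, on average, "time since fire" (x) affects habitat suitability (y, from 0 to 1) for a species. However, I'm having trouble. I think the problem is that the x-axis values don't overlap perfectly between the two data frames, so if an x-axis value appears in only one data frame, the mean takes that data frame's value instead of ignoring it (or aligning it with the rest of the mean/average "curve").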
fire2022_12
timesincefire202212v2,-115.1,0.8304737597703934
timesincefire202212v2,-113.71879999999999,0.8304737597703934
timesincefire202212v2,-112.3376,0.8304737597703934
timesincefire202212v2,-110.9564,0.8304737597703934
timesincefire202212v2,-109.5752,0.8304737597703934
timesincefire202212v2,-108.19399999999999,0.8304737597703934
timesincefire202212v2,-106.8128,0.8304737597703934
timesincefire202212v2,-105.4316,0.8304737597703934
timesincefire202212v2,-104.0504,0.8304737597703934
timesincefire202212v2,-102.66919999999999,0.8304737597703934
timesincefire202212v2,-101.288,0.8304737597703934
timesincefire202212v2,-99.9068,0.8304737597703934
timesincefire202212v2,-98.5256,0.8304737597703934
timesincefire202212v2,-97.14439999999999,0.8304737597703934
timesincefire202212v2,-95.7632,0.8304737597703934
timesincefire202212v2,-94.382,0.8304737597703934
timesincefire202212v2,-93.0008,0.8304737597703934
timesincefire202212v2,-91.61959999999999,0.8304737597703934
timesincefire202212v2,-90.2384,0.8304737597703934
fire2021_12
timesincefire2021_12,-113.9,0.9661756336688996
timesincefire2021_12,-112.53320000000001,0.9661756336688996
timesincefire2021_12,-111.16640000000001,0.9661756336688996
timesincefire2021_12,-109.7996,0.9661756336688996
timesincefire2021_12,-108.4328,0.9661756336688996
timesincefire2021_12,-107.066,0.9661756336688996
timesincefire2021_12,-105.6992,0.9661756336688996
timesincefire2021_12,-104.3324,0.9661756336688996
timesincefire2021_12,-102.96560000000001,0.9661756336688996
timesincefire2021_12,-101.59880000000001,0.9661756336688996
timesincefire2021_12,-100.232,0.9661756336688996
timesincefire2021_12,-98.8652,0.9661756336688996
timesincefire2021_12,-97.4984,0.9661756336688996
timesincefire2021_12,-96.1316,0.9661756336688996
timesincefire2021_12,-94.76480000000001,0.9661756336688996
timesincefire2021_12,-93.398,0.9661756336688996
timesincefire2021_12,-92.0312,0.9661756336688996
timesincefire2021_12,-90.6644,0.9661756336688996
The closest I've come to the desired result is:
library(ggplot2)
library(tidyverse)
library(dplyr)
#merging two dataframes into one
data2 <- rbind(fire2022_12, fire2021_12)
#rounding the x-axis values to a whole number
data2$x <- round(data2$x, 0)
#obtaining the mean y-axis value for each x-axis value
grouped <- data2 %>% group_by(x) %>% summarise(y = mean(y))
#plotting the data
ggplot(data=grouped, aes(x=x, y=y))+geom_point()
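When I plot the data, it looks like there are 3 lines instead of 1. I think this happens because, for some x-axis values, only one of the data frames has a y-axis value rather than both. For example, the x-axis value -115 appears only in fire2022_12 and not in fire2021_12, so only the fire2022_12 value is used when the mean is calculated. It would be great if I could somehow ignore these outliers, or force them to align with the x-axis values that have y-axis values in both data frames.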
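I've added some screenshots below.
Screenshot 1: my result
Screenshot 2: what the original data frames looked like when plotted, before I tried to create the mean/average line
Screenshot 3: the result I get when I set the geom to "line" instead of "point"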
Thanks in advance! :)
Note: I also tried ggplot2's stat_summary() but ran into a similar problem.
One way to solve this is to use as.integer to get the x values shared between the two datasets, and then compute the row-wise mean of y.
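library(dplyr)
library(ggplot2)

# Truncate x to an integer key in each frame, then keep only keys present in both
inner_join(df1 %>% mutate(intx = as.integer(x)),
           df2 %>% mutate(intx = as.integer(x)),
           by = "intx") %>%
  rowwise() %>%
  # y.x and y.y are the y columns of df1 and df2 after the join
  mutate(Mean = mean(c(y.x, y.y))) %>%
  ggplot() +
  geom_line(aes(intx, Mean, col = "Mean"))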
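One detail worth knowing about the join key: as.integer truncates toward zero, whereas round (used in the question) goes to the nearest whole number, so the two can put the same x value in different bins. A tiny sketch using two x values taken from the data (the names `trunc_x` and `round_x` are only for illustration):

```r
x <- c(-113.7188, -113.9)

trunc_x <- as.integer(x)  # drops the fractional part (toward zero)
round_x <- round(x)       # rounds to the nearest whole number

print(trunc_x)
print(round_x)
```

Which binning is preferable depends on how the two x grids line up, so it may be worth trying both.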
Note that inner_join keeps only the x values that are present in both frames. First divide x by 10, then use e.g. make.unique
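To check whether the "three lines" in the question really come from x values that only one frame contributes to, it may help to count the contributing frames per rounded x. This is a minimal sketch on a few made-up rows (the frames `a` and `b`, their values, and the column `n_src` are assumptions for illustration, not the real data):

```r
library(dplyr)

# Toy stand-ins for the two frames (hypothetical values, same column layout)
a <- data.frame(variable = "a", x = c(-115.1, -113.7, -112.3), y = 0.8)
b <- data.frame(variable = "b", x = c(-113.9, -112.4, -111.2), y = 1.0)

grouped <- bind_rows(a, b) %>%
  mutate(x = round(x)) %>%
  group_by(x) %>%
  # n_src = number of distinct frames that contribute to this x
  summarise(n_src = n_distinct(variable), y = mean(y), .groups = "drop")

# Rows with n_src == 1 are "means" of a single frame; keep only true averages
both <- filter(grouped, n_src == 2)
```

Here `grouped` contains three distinct y levels (one per single-frame case plus the true average), while `both` keeps only the x values where both frames contribute, which has the same effect as the inner_join.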
df1 <- structure(list(name = c("fire2022_12", "fire2022_12", "fire2022_12",
"fire2022_12", "fire2022_12", "fire2022_12", "fire2022_12", "fire2022_12",
"fire2022_12", "fire2022_12", "fire2022_12", "fire2022_12", "fire2022_12",
"fire2022_12", "fire2022_12", "fire2022_12", "fire2022_12", "fire2022_12"
), x = c(-115.1, -113.71879, -112.3376, -110.9564, -109.5752,
-108.1939, -106.8128, -105.4316, -104.0504, -102.66919, -101.288,
-99.9068, -98.5256, -97.14439, -95.7632, -94.382, -93.0008, -91.61959
), y = c(0.830473759770393, 0.830473759770393, 0.830473759770393,
0.830473759770393, 0.830473759770393, 0.830473759770393, 0.830473759770393,
0.830473759770393, 0.830473759770393, 0.830473759770393, 0.830473759770393,
0.830473759770393, 0.830473759770393, 0.830473759770393, 0.830473759770393,
0.830473759770393, 0.830473759770393, 0.830473759770393)), class =
"data.frame", row.names = c(NA, -18L))
df2 <- structure(list(name = c("fire2021_12", "fire2021_12", "fire2021_12",
"fire2021_12", "fire2021_12", "fire2021_12", "fire2021_12", "fire2021_12",
"fire2021_12", "fire2021_12", "fire2021_12", "fire2021_12", "fire2021_12",
"fire2021_12", "fire2021_12", "fire2021_12", "fire2021_12", "fire2021_12"
), x = c(-113.9, -112.5332, -111.1664, -109.7996, -108.4328,
-107.066, -105.6992, -104.3324, -102.9656, -101.5988, -100.232,
-98.8652, -97.4984, -96.1316, -94.7648, -93.398, -92.0312, -90.6644
), y = c(0.9661756336689, 0.9661756336689, 0.9661756336689, 0.9661756336689,
0.9661756336689, 0.9661756336689, 0.9661756336689, 0.9661756336689,
0.9661756336689, 0.9661756336689, 0.9661756336689, 0.9661756336689,
0.9661756336689, 0.9661756336689, 0.9661756336689, 0.9661756336689,
0.9661756336689, 0.9661756336689)), class = "data.frame", row.names = c(NA,
-18L))
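An alternative that avoids rounding or truncating altogether is to interpolate both curves onto one shared x grid and average there. This is a sketch rather than the approach shown above: it assumes linear interpolation with base R's approx() is acceptable for these curves, it restricts the grid to the x range that both frames cover, and it uses small made-up frames `f1` and `f2` (hypothetical names and values) so that it runs on its own.

```r
library(ggplot2)

# Hypothetical frames whose x values never coincide
f1 <- data.frame(x = c(0, 2, 4, 6), y = c(0.2, 0.4, 0.6, 0.8))
f2 <- data.frame(x = c(1, 3, 5, 7), y = c(0.4, 0.6, 0.8, 1.0))

# Shared grid over the overlapping x range only
lo <- max(min(f1$x), min(f2$x))
hi <- min(max(f1$x), max(f2$x))
grid <- seq(lo, hi, length.out = 6)

# Linearly interpolate each curve onto the shared grid
y1 <- approx(f1$x, f1$y, xout = grid)$y
y2 <- approx(f2$x, f2$y, xout = grid)$y

# Point-wise average of the two interpolated curves
avg <- data.frame(x = grid, y = (y1 + y2) / 2)

p <- ggplot(avg, aes(x, y)) + geom_line()
```

With the real data, `f1` and `f2` would be replaced by df1 and df2, and the grid length chosen to match the resolution of the original curves.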