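Suppose I have the means of two datasets and want to plot them as bar charts with error bars, side by side, in ggplot2 or base graphics.
Each dataset consists of a matrix of numbers: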
10 20 12
10 20 12
10 20 12
which is then reduced to a vector of means, e.g. with 3 elements:
10 20 12
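What I want to do is take the two mean vectors and plot them as a bar chart in which the first element of one sits right next to the first element of the other: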
Dataset1Element1Bar-Dataset2Element1Bar Dataset1Element2Bar-Dataset2Element2Bar etc
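Each bar should get an error bar, e.g. the standard deviation. I know I can compute it with sd, but I'm not sure how to get it onto the plot in the right form.
Finally, the bars should be coloured by element number (i.e. Element 1).
I have code that handles one dataset, but I don't know where to go from there.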
result<-barplot(bardata, main="Mean Coverage", names.arg=namePosTargetGroup, ylab="mean Magnitude", cex.names=.4,col=c("red","blue","green"))
legend(10,legend=c("Group1","Group2","Group3"),fill = c("red","blue","green"))
Much of what I've found answers one piece or another, but it's hard to work out how to put them together.
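I generally don't recommend plotting only bars with error bars. There are many other ways to plot your data that reveal the data and their structure much better.
Plotting means as bars is especially poor when you have only a few cases. A good explanation can be found here: Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm
It's hard for me to give you a good solution because I don't know your research question. Knowing what you actually want to show or emphasise would make things easier.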
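I'll give you two suggestions, one for a small dataset and one for a larger one. Both are created with ggplot2. I coloured them by origin ("dataset 1/2") rather than by "element number", because I find it easier to get a proper plot that way.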
Use geom_jitter to show all cases while avoiding overplotting.
# import hadleyverse
library(magrittr)
library(dplyr)
library(tidyr)
library(ggplot2)
# generate small amount of data
set.seed(1234)
df1 <- data.frame(v1 = rnorm(5, 4, 1),
v2 = rnorm(5, 5, 1),
v3 = rnorm(5, 6, 1),
origin = rep(factor("df1", levels = c("df1", "df2")), 5))
df2 <- data.frame(v1 = rnorm(5, 4.5, 1),
v2 = rnorm(5, 5.5, 1),
v3 = rnorm(5, 6.5, 1),
origin = rep(factor("df2", levels = c("df1", "df2")), 5))
# merge dataframes and gather in long format
pdata <- bind_rows(df1, df2) %>%
gather(id, variable, -origin)
# plot data
ggplot(pdata, aes(x = id, y = variable, fill = origin, colour = origin)) +
  # `fun` and `show.legend` replace the deprecated `fun.y` and `show_guide`
  stat_summary(fun = mean, geom = "point", position = position_dodge(width = .5),
               size = 30, shape = "-", show.legend = FALSE, alpha = .7) + # plot mean as "-"
geom_jitter(position = position_jitterdodge(jitter.width = .3, jitter.height = .1,
dodge.width = .5),
size = 4, alpha = .85) +
  labs(x = "Variable", y = NULL) + # set axis labels
theme_light() # nicer theme
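In current tidyr releases, gather() is superseded by pivot_longer(). A minimal sketch of the same reshaping step, assuming tidyr 1.0.0 or later and the same kind of simulated data as above; the name pdata_long is only illustrative and not part of the original answer:

```r
library(dplyr)
library(tidyr)

# simulate two small datasets, as in the example above
set.seed(1234)
df1 <- data.frame(v1 = rnorm(5, 4, 1),
                  v2 = rnorm(5, 5, 1),
                  v3 = rnorm(5, 6, 1),
                  origin = factor("df1", levels = c("df1", "df2")))
df2 <- data.frame(v1 = rnorm(5, 4.5, 1),
                  v2 = rnorm(5, 5.5, 1),
                  v3 = rnorm(5, 6.5, 1),
                  origin = factor("df2", levels = c("df1", "df2")))

# stack the data frames and reshape to long format:
# one row per (origin, id) observation, with the value in `variable`
pdata_long <- bind_rows(df1, df2) %>%
  pivot_longer(cols = -origin, names_to = "id", values_to = "variable")
```

The rows come out in a different order than with gather(), but the columns (origin, id, variable) are the same, so the plotting code should work unchanged if pdata_long is passed in place of pdata.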
If you have more data points, you can summarise them with geom_violin.
set.seed(12345)
df1 <- data.frame(v1 = rnorm(50, 4, 1),
v2 = rnorm(50, 5, 1),
v3 = rnorm(50, 6, 1),
origin = rep(factor("df1", levels = c("df1", "df2")), 50))
df2 <- data.frame(v1 = rnorm(50, 4.5, 1),
v2 = rnorm(50, 5.5, 1),
v3 = rnorm(50, 6.5, 1),
origin = rep(factor("df2", levels = c("df1", "df2")), 50))
# merge dataframes
pdata <- bind_rows(df1, df2) %>%
gather(id, variable, -origin)
# plot with violin plot
ggplot(pdata, aes(x = id, y = variable, fill = origin)) +
geom_violin(adjust = .6) +
  # `fun` and `show.legend` replace the deprecated `fun.y` and `show_guide`
  stat_summary(fun = mean, geom = "point", position = position_dodge(width = .9),
               size = 6, shape = 4, show.legend = FALSE) +
guides(fill = guide_legend(override.aes = list(colour = NULL))) +
labs(x = "Variable", y = NULL) +
theme_light()
If you insist on plotting means with standard deviations, here is how it can be done.
# merge dataframes and compute limits for sd
pdata <- bind_rows(df1, df2) %>%
gather(id, variable, -origin) %>%
group_by(origin, id) %>% # group data for limit calculation
mutate(upper = mean(variable) + sd(variable), # upper limit for error bar
lower = mean(variable) - sd(variable)) # lower limit for error bar
# plot
ggplot(pdata, aes(x = id, y = variable, fill = origin)) +
  # `fun` replaces the deprecated `fun.y`
  stat_summary(fun = mean, geom = "bar", position = position_dodge(width = .9),
               size = 3) +
geom_errorbar(aes(ymin = lower, ymax = upper),
width = .2, # Width of the error bars
position = position_dodge(.9))
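Since the question also allows base graphics, here is a rough equivalent without ggplot2. It is only a sketch under assumptions: m1 and m2 are hypothetical stand-ins for the two numeric matrices from the question (one column per element, simulated here), and the bars are coloured by dataset rather than by element number, as in the plots above.

```r
# simulate two matrices with 5 rows and 3 columns (one column per element)
set.seed(1234)
m1 <- matrix(rnorm(15, mean = rep(c(4, 5, 6), each = 5)), ncol = 3)
m2 <- matrix(rnorm(15, mean = rep(c(4.5, 5.5, 6.5), each = 5)), ncol = 3)

# one row per dataset, one column per element
means <- rbind(Dataset1 = colMeans(m1), Dataset2 = colMeans(m2))
sds   <- rbind(Dataset1 = apply(m1, 2, sd), Dataset2 = apply(m2, 2, sd))

# beside = TRUE puts the bars of both datasets next to each other per element;
# barplot() returns the x midpoints of the bars as a matrix shaped like `means`
mids <- barplot(means, beside = TRUE, col = c("red", "blue"),
                names.arg = c("Element 1", "Element 2", "Element 3"),
                ylim = c(0, max(means + sds) * 1.1),
                main = "Mean Coverage", ylab = "mean Magnitude")

# draw mean +/- sd as flat-capped arrows centred on each bar
arrows(mids, means - sds, mids, means + sds,
       angle = 90, code = 3, length = 0.05)

legend("topleft", legend = rownames(means), fill = c("red", "blue"))
```

The key point is that the midpoints returned by barplot() line up element-wise with means and sds, so the error bars land on the right bars without any manual positioning.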
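Building on @ThomasK's answer, you can get the example plots with a library dedicated to plots of means, superb (summary plots with error bars). Assuming you have the data frame pdata generated in the answer above, you can use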
superb( variable ~ id + origin, pdata)                                   # (a)
superb( variable ~ id + origin, pdata, plotStyle="pointjitter")          # (b)
superb( variable ~ id + origin, pdata, plotStyle="pointjitterviolin")    # (c)
As an example, here is (c), the third call, with additional graphics directives:
superb( variable ~ id + origin, pdata,
plotStyle = "pointjitterviolin",
errorbarParams = list(color="black"),
violinParams = list(color="black")
) + guides(fill = guide_legend(override.aes = list(colour = NULL))) +
labs(x = "Variable", y = NULL) +
theme_light()
This gives a plot that combines the group means and their error bars with jittered points and violins.