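I'm trying to create a boxplot that compares income in the Midwest states with income nationally, and I'm not sure how to go about it. So far the data has been filtered, but I don't know how to build the boxplot with ggplot because the data is now split across two datasets.
#Opening the csv file and naming it as "museums."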
museums <- read.csv("C:\\Users\\Elysa Ng\\Downloads\\archive\\museums.csv")
#Loading the libraries.
library(tidyverse)
library(ggResidpanel)
library(emmeans)
library(car)
library(dplyr)
library(ggpubr)
library(rstatix)
#Glimpsing the data.
glimpse(museums)
#NA omit from the income column
income <- museums %>%
drop_na(Income)
#plot income data distribution
income %>%
ggplot(aes(x=Income))+
geom_histogram()
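Here is the filtered data for the Midwest states. Is there a way to combine everything summarized here into a single ggplot, with boxplots showing the Midwest data next to the national data?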
#filter out just the organizations in the Midwest
midwest <- income %>%
  filter(State..Administrative.Location. %in% c("IN", "IL", "MI", "OH", "MN", "WI", "IA", "MO"))
#looks at how many incomes from the new filtered data are 0
sum(midwest$Income=="0")
#filter out incomes equal to 0
midwest.filtered <- subset(midwest, as.numeric(Income)!=0)
#look at the summary data for non-zero incomes of all Midwest organizations
summarize(midwest.filtered, mean=mean(Income), sd=sd(Income), median=median(Income))
#filter out incomes greater than twice the median
midwest.filtered <- subset(midwest.filtered, as.numeric(Income) <= 337132)
#summary data for the filtered Midwest income
summarize(midwest.filtered, mean=mean(Income), sd=sd(Income), median=median(Income))
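
One possible approach, sketched here under some assumptions: give each dataset a label column, stack the two with `bind_rows()`, and map that label to the x axis of `geom_boxplot()`. The `group` column and the `national` object are names introduced for illustration, and the small simulated `income` data frame only stands in for the real `museums.csv` data. The same zero-income and outlier filters used above could be applied to both datasets before stacking them.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for the real data, which comes from museums.csv.
set.seed(1)
income <- data.frame(
  State..Administrative.Location. = sample(
    c("IL", "OH", "WI", "CA", "NY", "TX"), 200, replace = TRUE
  ),
  Income = round(rexp(200, rate = 1 / 100000))
)

midwest_states <- c("IN", "IL", "MI", "OH", "MN", "WI", "IA", "MO")

# National set: every organization with a non-zero income.
national <- income %>%
  filter(Income != 0) %>%
  mutate(group = "National")

# Midwest set: the same rows restricted to the Midwest states.
midwest <- national %>%
  filter(State..Administrative.Location. %in% midwest_states) %>%
  mutate(group = "Midwest")

# Stack the two data frames. Midwest rows appear in both groups on purpose,
# because the national figure includes the Midwest.
combined <- bind_rows(midwest, national)

# One boxplot per value of the label column.
p <- ggplot(combined, aes(x = group, y = Income)) +
  geom_boxplot() +
  labs(x = NULL, y = "Income")

print(p)
```

Because both boxes come from one data frame, they share a y axis and can be compared directly. If the national box should exclude the Midwest, filtering with `!(State..Administrative.Location. %in% midwest_states)` before labeling would be one way to do that.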