I would like to make a histogram that looks similar to this one.
I have the following code:
d1 <- read.table("Session_data_TU2010AND15.csv", header = TRUE, sep = ";")
d <- d1[,c("IncHouseh","HousehNumcars")]
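The first variable, IncHouseh, is the income of the different households. It should be shown as intervals on the x-axis, while HousehNumcars (the number of cars in the household) should be shown as a percentage within the bar for each interval.
The data d looks like this, but with more than 20,000 rows: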
IncHouseh HousehNumcars
1 800 2
2 384 2
4 638 1
5 580 2
6 700 2
7 744 2
8 560 1
9 500 1
10 686 1
11 310 1
12 510 1
13 648 2
14 372 1
15 542 1
Since I am new to R, I find it hard to produce something like the plot in the link provided above. Thanks for your help!
You can first use cut to bin the income data into classes.
dat$IncHouseh.c <- cut(dat$IncHouseh, seq(1e3, 5e3, 1e3), include.lowest=TRUE,
labels=c("1k-2k", "2k-3k", "3k-4k", "4k-5k")) ## labels match the breaks 1000..5000
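To see how cut assigns values to intervals, here is a minimal sketch on a made-up vector (the name income and its values are illustrative only, not taken from the data). By default the intervals are closed on the right, and a value equal to the lowest break is only included when include.lowest=TRUE.

```r
# Hypothetical incomes chosen to sit on and around the break points
income <- c(1000, 1500, 2000, 2001, 4999, 5000)
breaks <- seq(1e3, 5e3, 1e3)

# Right-closed intervals: (1000,2000], (2000,3000], ...
bins_default <- cut(income, breaks)
# include.lowest=TRUE turns the first interval into [1000,2000]
bins_closed <- cut(income, breaks, include.lowest = TRUE)

as.integer(bins_default)  # 1000 falls outside the first interval and becomes NA
as.integer(bins_closed)   # 1000 is now assigned to the first interval
table(bins_closed)        # counts per interval, empty intervals included
```

If incomes in the real data can lie outside the break range, they become NA, so it may be worth checking sum(is.na(...)) after binning.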
Then, to aggregate the share of each car count within an income class, you can use prop.table(table(x)) inside tapply.
agg <- do.call(rbind, with(dat, tapply(HousehNumcars, IncHouseh.c, FUN=function(x)
prop.table(table(x)))))
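Combining the per-class tables with rbind assumes that every income class contains every car count; otherwise the tables have different lengths and the rows no longer line up. As a hedged alternative sketch, a two-way contingency table normalized by row gives the same proportions and fills missing combinations with zero. The data frame toy and its column names are made up for illustration.

```r
# Hypothetical toy data: the "high" class has no households with 0 or 1 cars
toy <- data.frame(
  income_class = factor(c("low", "low", "low", "high", "high")),
  cars         = c(0, 1, 1, 2, 2)
)

# Rows are income classes, columns are car counts
tab <- table(toy$income_class, toy$cars)

# margin = 1 divides each row by its row total, so every row sums to 1
shares <- prop.table(tab, margin = 1)
shares
```

The result can be passed to barplot after transposing, in the same way as agg.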
Third, plot it!
op <- par(mar=c(5, 5, 4, 6), xpd=TRUE) ## expand outer margins
b <- barplot(t(agg), xaxt="n", col=2:5, ## transpose: one bar per income class; assign positions to `b`
xlab="Income", ylab="Probability", main="Cars in households")
mtext(rownames(agg), 1, 1, at=b) ## use `b` for label positioning
legend(5, 1, title="cars", col=5:2, pch=15, legend=3:0) ## legend
par(op)
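As a variation on the plotting step, barplot can also draw the bar labels and the legend itself. The sketch below assumes a small made-up matrix shares with car counts in rows and income classes in columns; it is not the agg object from above, and the values are invented. pdf(NULL) is used only so that the example runs without opening a window.

```r
# Hypothetical proportions: rows are car counts, columns are income classes
shares <- matrix(c(0.5, 0.3, 0.2,
                   0.2, 0.3, 0.5),
                 nrow = 3,
                 dimnames = list(cars = c("0", "1", "2"),
                                 income = c("1k-2k", "2k-3k")))

pdf(NULL)  # null device, no file is written
mids <- barplot(shares, col = 2:4,
                xlim = c(0, ncol(shares) + 2),      # leave room for the legend
                legend.text = rownames(shares),     # one legend entry per row
                args.legend = list(title = "cars", x = "topright"),
                xlab = "Income", ylab = "Probability",
                main = "Cars in households")
dev.off()

mids  # x positions of the bar midpoints, one per income class
```

Column names are used as bar labels automatically, so mtext is not needed in this form.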
Data:
set.seed(42)
dat <- data.frame(IncHouseh=sample(1e3:5e3, 2e3, replace=TRUE),
HousehNumcars=sample(0:3, 2e3, replace=TRUE))
Here is another approach, using the well-known dplyr for data manipulation and ggplot for plotting. The package magrittr provides the pipe construct %>%.
STEP1
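Read the data and store it in a dataframe named df. Keep in mind that stringsAsFactors = F keeps character columns from being converted to factor, which makes data manipulation easier in the later steps.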
library(dplyr); library(magrittr); library(ggplot2)
d1 <- read.table(text = "IncHouseh HousehNumcars
1 800 2
2 384 2
4 638 1
5 580 2
6 700 2
7 744 2
8 560 1
9 500 1
10 686 1
11 310 1
12 510 1
13 648 2
14 372 1
15 542 1", header =T)
df <- data.frame(d1, stringsAsFactors = F)
STEP2
Use mutate (to add new columns suited to plotting) and case_when (to build an if-else construct).
df <- df %>% mutate(x_labels = case_when(IncHouseh <= 100 & IncHouseh > 0 ~ "under100",
IncHouseh <= 200 & IncHouseh > 100 ~ "100-200",
IncHouseh <= 300 & IncHouseh > 200 ~ "200-300",
IncHouseh <= 400 & IncHouseh > 300 ~ "300-400",
IncHouseh <= 500 & IncHouseh > 400 ~ "400-500",
IncHouseh <= 600 & IncHouseh > 500 ~ "500-600",
IncHouseh <= 700 & IncHouseh > 600 ~ "600-700",
IncHouseh <= 800 & IncHouseh > 700 ~ "700-800",
IncHouseh <= 900 & IncHouseh > 800 ~ "800-900",
IncHouseh <= 1000 & IncHouseh > 900 ~ "900-1000"))
df <- df %>% group_by(x_labels) %>% mutate(Probability = (HousehNumcars/sum(HousehNumcars)
*100), Cars = as.factor(HousehNumcars))
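Note that the Probability column above divides each household's car count by the total number of cars in its income class, so it is a share of cars rather than a share of households. If the goal is the percentage of households with each car count, a count-based variant may be closer. The following is a sketch under that assumption; it also swaps the case_when ladder for cut, and the data frame toy is a made-up subset of the rows shown in the question.

```r
library(dplyr)

# A few rows taken from the example data in the question
toy <- data.frame(IncHouseh     = c(310, 372, 384, 500, 510, 542),
                  HousehNumcars = c(1, 1, 2, 1, 1, 1))

shares <- toy %>%
  # Bin income into right-closed intervals of width 100: (300,400], (400,500], ...
  mutate(x_labels = cut(IncHouseh, breaks = seq(0, 1000, 100), dig.lab = 4)) %>%
  # Count households per income class and car count
  count(x_labels, Cars = factor(HousehNumcars)) %>%
  # Convert counts to percentages within each income class
  group_by(x_labels) %>%
  mutate(Probability = n / sum(n) * 100) %>%
  ungroup()

shares
```

Because cut returns an ordered set of levels, the income classes also stay in numeric order on the x-axis instead of being sorted alphabetically.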
STEP3
Plot it!
plot <- df %>% ggplot(aes(x = x_labels, y = Probability, fill = Cars)) + geom_col()
# optional styling, not required
plot + ylab("Probability or number of cars (%)") + xlab("Range of income") +
ggtitle("Number of cars according to household income") +
theme(plot.title = element_text(hjust = 0.5))
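As a shorter route to a similar picture, ggplot can compute the within-class proportions itself with geom_bar(position = "fill"), which counts rows and scales each bar to 1. This is an alternative sketch, not the method used above; the data frame toy is invented, and scales::percent assumes the scales package that is installed alongside ggplot2.

```r
library(ggplot2)

# Hypothetical pre-binned data: one row per household
toy <- data.frame(
  x_labels = c("300-400", "300-400", "300-400", "500-600", "500-600"),
  Cars     = factor(c(1, 1, 2, 1, 2))
)

p <- ggplot(toy, aes(x = x_labels, fill = Cars)) +
  geom_bar(position = "fill") +                  # stack counts and rescale each bar to 1
  scale_y_continuous(labels = scales::percent) + # show the y-axis as percentages
  labs(x = "Range of income", y = "Share of households", fill = "Cars")

p
```

With this form no Probability column is needed, since the proportions are derived from the row counts at plot time.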