I am trying to solve the following problem in R.
I have a data.frame like this (obviously much larger):
Column_1 Column_2 Column_3
(0-1] (15-25] 58
(2-3] (35-45] 25
(4-5] (35-45] 50
(0-1] (15-25] 5
(2-3] (25-35] 10
(1-2] (25-35] 15
(1-2] (15-25] 12
(3-4] (25-35] 10
(4-5] (35-45] 9
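The goal is to build a matrix from this data.frame with the Column_1 values as row names and the Column_2 values as column names, where each cell holds the mean of the Column_3 values for the corresponding combination of Column_1 and Column_2.
The resulting matrix should look like this: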
(15-25] (25-35] (35-45]
(0-1] 31.5 0 0
(1-2] 12 15 0
(2-3] 0 10 25
(3-4] 0 10 0
(4-5] 0 0 29.5
How can I do this?
xtabs() and aggregate() do the job:
as.data.frame.matrix(xtabs(Column_3 ~ Column_1 + Column_2,
aggregate(Column_3 ~ Column_1 + Column_2, df, mean)))
# output
(15-25] (25-35] (35-45]
(0-1] 31.5 0 0.0
(1-2] 12.0 15 0.0
(2-3] 0.0 10 25.0
(3-4] 0.0 10 0.0
(4-5] 0.0 0 29.5
# data
df <- structure(list(Column_1 = c("(0-1]", "(2-3]", "(4-5]", "(0-1]",
"(2-3]", "(1-2]", "(1-2]", "(3-4]", "(4-5]"), Column_2 = c("(15-25]",
"(35-45]", "(35-45]", "(15-25]", "(25-35]", "(25-35]", "(15-25]",
"(25-35]", "(35-45]"), Column_3 = c(58L, 25L, 50L, 5L, 10L, 15L,
12L, 10L, 9L)), .Names = c("Column_1", "Column_2", "Column_3"
), class = "data.frame", row.names = c(NA, -9L))
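To make it clearer why nesting aggregate() inside xtabs() yields means rather than sums, here is a minimal sketch that splits the one-liner into separate steps. The intermediate names (means_long, means_tab, means_mat) are only illustrative, and the sample data is rebuilt with data.frame() so that the snippet runs on its own.

```r
# Same sample data as above, rebuilt so this sketch is self-contained.
df <- data.frame(
  Column_1 = c("(0-1]", "(2-3]", "(4-5]", "(0-1]", "(2-3]",
               "(1-2]", "(1-2]", "(3-4]", "(4-5]"),
  Column_2 = c("(15-25]", "(35-45]", "(35-45]", "(15-25]", "(25-35]",
               "(25-35]", "(15-25]", "(25-35]", "(35-45]"),
  Column_3 = c(58L, 25L, 50L, 5L, 10L, 15L, 12L, 10L, 9L),
  stringsAsFactors = FALSE
)

# Step 1: one row per observed (Column_1, Column_2) pair, holding the mean
# of Column_3 for that pair.
means_long <- aggregate(Column_3 ~ Column_1 + Column_2, data = df, FUN = mean)

# Step 2: xtabs() sums Column_3 within each cell. Every pair now occurs
# exactly once, so the "sum" is simply the mean from step 1. Pairs that
# never occur in the data are filled with 0.
means_tab <- xtabs(Column_3 ~ Column_1 + Column_2, data = means_long)

# Step 3: convert the table to a data frame and then to a plain numeric
# matrix, keeping the row and column names.
means_mat <- as.matrix(as.data.frame.matrix(means_tab))
means_mat
```

Note that a 0 in the result can mean either "no rows for this combination" or "the mean really is 0"; if that distinction matters for your data, the zero fill may not be what you want.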
We can use dcast from reshape2. Calling your data dd:
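# value.var is given explicitly so dcast does not have to guess the value column
wide = reshape2::dcast(data = dd, Column_1 ~ Column_2, value.var = "Column_3", fun.aggregate = mean, fill = 0)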
wide
# Column_1 (15-25] (25-35] (35-45]
# 1 (0-1] 31.5 0 0.0
# 2 (1-2] 12.0 15 0.0
# 3 (2-3] 0.0 10 25.0
# 4 (3-4] 0.0 10 0.0
# 5 (4-5] 0.0 0 29.5
This is a data frame, which we can of course convert to a matrix:
mat = as.matrix(wide[, -1])
row.names(mat) = wide[, 1]
mat
# (15-25] (25-35] (35-45]
# (0-1] 31.5 0 0.0
# (1-2] 12.0 15 0.0
# (2-3] 0.0 10 25.0
# (3-4] 0.0 10 0.0
# (4-5] 0.0 0 29.5
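If a matrix is the end goal, base R's tapply() may be worth a look as well. This is a different technique from the two shown above, not something taken from either answer: it returns a matrix directly and marks empty combinations as NA instead of 0. A minimal sketch, assuming the same sample data (the names mat_na and mat_zero are only illustrative):

```r
# Same sample data as in the question, rebuilt so this sketch is self-contained.
df <- data.frame(
  Column_1 = c("(0-1]", "(2-3]", "(4-5]", "(0-1]", "(2-3]",
               "(1-2]", "(1-2]", "(3-4]", "(4-5]"),
  Column_2 = c("(15-25]", "(35-45]", "(35-45]", "(15-25]", "(25-35]",
               "(25-35]", "(15-25]", "(25-35]", "(35-45]"),
  Column_3 = c(58L, 25L, 50L, 5L, 10L, 15L, 12L, 10L, 9L),
  stringsAsFactors = FALSE
)

# tapply() applies mean() to Column_3 within each (Column_1, Column_2) cell.
# The first grouping vector becomes the rows, the second the columns.
# Cells with no observations come back as NA.
mat_na <- tapply(df$Column_3, list(df$Column_1, df$Column_2), mean)

# Replace NA with 0 only if an empty cell should read as 0, as in the question.
mat_zero <- mat_na
mat_zero[is.na(mat_zero)] <- 0
mat_zero
```

Keeping the NA version around can be useful when an empty cell needs to stay distinguishable from a genuine mean of 0.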