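I'm new to R and am struggling to find a way to actually compute the Pearson correlation coefficient from a set of data. I'm trying to analyse whether there is a correlation between the score received on an assignment and the subject area chosen (algebra, calculus, geometry, etc.). Here is my data frame: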
structure(list(area = structure(c(1L, 5L, 5L, 2L, 4L, 4L, 1L,
6L, 1L, 2L, 1L, 3L, 3L, 5L, 2L, 2L, 2L, 3L, 4L, 4L, 5L, 1L, 2L,
3L, 4L, 5L, 5L, 2L, 5L, 5L, 5L, 1L, 2L, 2L, 3L, 4L, 4L, 2L, 3L,
4L, 4L, 5L, 5L, 2L, 3L, 4L, 4L, 4L, 5L), levels = c("Algebra",
"Calculus", "Geometry", "Modelling", "Probability", "Other"), class = "factor"),
score = c(10, 10, 10, 11, 11, 11, 12, 12, 13, 13, 14, 14,
14, 14, 15, 15, 15, 15, 15, 15, 15, 16, 16, 16, 16, 16, 16,
17, 17, 17, 17, 17, 18, 18, 18, 18, 18, 19, 19, 19, 19, 19,
19, 20, 20, 20, 7, 9, 9)), class = "data.frame", row.names = c(NA,
-49L))
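Sorry if this isn't enough information; this is also my first time posting here.
I was able to get results from
> summary(lm(formula = score ~ area, data = sc.ar))
but honestly I don't know what to do with them. My goal is to find a way to do this with `cor` by entering dummy variables manually.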
Maybe you want to `split` the scores by area (the data frame is called `df` here):
> (df_s <- split(df$score, df$area))
$Algebra
[1] 10 12 13 14 16 17
$Calculus
[1] 11 13 15 15 15 16 17 18 18 19 20
$Geometry
[1] 14 14 15 16 18 19 20
$Modelling
[1] 11 11 15 15 16 18 18 19 19 20 7 9
$Probability
[1] 10 10 14 15 16 16 17 17 17 19 19 9
$Other
[1] 12
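To make the shape of that result easier to see, here is a minimal sketch on a made-up data frame (`toy` and its values are hypothetical, not the data from the question). It shows that `split` returns a named list with one numeric vector per factor level, and that `lengths` reports how many scores ended up in each group.

```r
# Hypothetical toy data, only for illustration
toy <- data.frame(
  area  = factor(c("A", "B", "A", "B", "B")),
  score = c(10, 11, 12, 13, 15)
)

# One numeric vector of scores per level of `area`
groups <- split(toy$score, toy$area)
groups

# Number of scores in each group; these need not be equal
lengths(groups)
```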
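However, the areas appear to have different lengths. Maybe that is only because of your toy data; either way, you can pad every group with `NA` up to the largest of the `lengths`: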
> (m <- sapply(df_s, `length<-`, max(lengths(df_s))))
Algebra Calculus Geometry Modelling Probability Other
[1,] 10 11 14 11 10 12
[2,] 12 13 14 11 10 NA
[3,] 13 15 15 15 14 NA
[4,] 14 15 16 15 15 NA
[5,] 16 15 18 16 16 NA
[6,] 17 16 19 18 16 NA
[7,] NA 17 20 18 17 NA
[8,] NA 18 NA 19 17 NA
[9,] NA 18 NA 19 17 NA
[10,] NA 19 NA 20 19 NA
[11,] NA 20 NA 7 19 NA
[12,] NA NA NA 9 9 NA
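The padding step relies on the replacement function `length<-`, which may be unfamiliar. As a small sketch on a made-up vector (`x` is hypothetical), setting a length larger than the current one fills the new positions with `NA`:

```r
x <- c(10, 12, 13)

# Replacement form: extends x, new slots become NA
length(x) <- 5
x

# Function form, which is what sapply() calls for each group above
padded <- `length<-`(c(10, 12, 13), 5)
padded
```

Note that after padding, row i of the matrix simply holds the i-th score listed for each area; the rows do not identify the same student across areas.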
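In any case, the last step is just to apply `cor` to the resulting matrix. With `use="pairwise.complete.obs"`, each pair of columns is compared using only the rows where both have a value, which is why `Other`, with a single score, yields `NA`.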
> cor(m, use="pairwise.complete.obs")
Algebra Calculus Geometry Modelling Probability Other
Algebra 1.0000000 0.9006049 0.9601136 0.9297804 0.9094441 NA
Calculus 0.9006049 1.0000000 0.8492236 0.2967773 0.9461672 NA
Geometry 0.9601136 0.8492236 1.0000000 0.9285061 0.8992441 NA
Modelling 0.9297804 0.2967773 0.9285061 1.0000000 0.5407100 NA
Probability 0.9094441 0.9461672 0.8992441 0.5407100 1.0000000 NA
Other NA NA NA NA NA NA
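If the goal is instead the one stated in the question, `cor` with dummy variables, the following is a rough sketch of one way it might be done, not necessarily what the answer above intends. It assumes the data frame from the question is stored as `df`, builds one 0/1 indicator per area with `model.matrix`, and correlates each indicator with `score` (this is sometimes called a point-biserial correlation). The names `dummies`, `r`, and `algebra` are made up for illustration.

```r
# Data frame copied from the question
df <- structure(list(area = structure(c(1L, 5L, 5L, 2L, 4L, 4L, 1L,
6L, 1L, 2L, 1L, 3L, 3L, 5L, 2L, 2L, 2L, 3L, 4L, 4L, 5L, 1L, 2L,
3L, 4L, 5L, 5L, 2L, 5L, 5L, 5L, 1L, 2L, 2L, 3L, 4L, 4L, 2L, 3L,
4L, 4L, 5L, 5L, 2L, 3L, 4L, 4L, 4L, 5L), levels = c("Algebra",
"Calculus", "Geometry", "Modelling", "Probability", "Other"), class = "factor"),
    score = c(10, 10, 10, 11, 11, 11, 12, 12, 13, 13, 14, 14,
    14, 14, 15, 15, 15, 15, 15, 15, 15, 16, 16, 16, 16, 16, 16,
    17, 17, 17, 17, 17, 18, 18, 18, 18, 18, 19, 19, 19, 19, 19,
    19, 20, 20, 20, 7, 9, 9)), class = "data.frame", row.names = c(NA,
-49L))

# One 0/1 indicator column per area; "- 1" drops the intercept so that
# every level gets its own column
dummies <- model.matrix(~ area - 1, data = df)
colnames(dummies) <- levels(df$area)

# Pearson correlation of each indicator with score (a 6 x 1 matrix)
r <- cor(dummies, df$score)
round(r, 3)

# The same thing for a single area, with the dummy entered by hand
algebra <- as.integer(df$area == "Algebra")
cor(algebra, df$score)
```

Each value describes how scores in one area compare with scores in all other areas combined, so it answers a slightly different question than the `lm` fit, which compares every area against a reference level.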