How to run a correlation test with dummy variables


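I'm fairly new to R and am struggling to find a way to actually get the Pearson correlation coefficient from a set of data. I'm trying to analyze whether there is a correlation between the score an assignment received and its chosen subject area (Algebra, Calculus, Geometry, etc.). Here is my data frame: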

sc.ar <- structure(list(area = structure(c(1L, 5L, 5L, 2L, 4L, 4L, 1L, 
6L, 1L, 2L, 1L, 3L, 3L, 5L, 2L, 2L, 2L, 3L, 4L, 4L, 5L, 1L, 2L, 
3L, 4L, 5L, 5L, 2L, 5L, 5L, 5L, 1L, 2L, 2L, 3L, 4L, 4L, 2L, 3L, 
4L, 4L, 5L, 5L, 2L, 3L, 4L, 4L, 4L, 5L), levels = c("Algebra", 
"Calculus", "Geometry", "Modelling", "Probability", "Other"), class = "factor"), 
    score = c(10, 10, 10, 11, 11, 11, 12, 12, 13, 13, 14, 14, 
    14, 14, 15, 15, 15, 15, 15, 15, 15, 16, 16, 16, 16, 16, 16, 
    17, 17, 17, 17, 17, 18, 18, 18, 18, 18, 19, 19, 19, 19, 19, 
    19, 20, 20, 20, 7, 9, 9)), class = "data.frame", row.names = c(NA, 
-49L))

Sorry if this isn't enough information; this is also my first time posting here.

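I was able to get results from

summary(lm(formula = score ~ area, data = sc.ar))

but honestly I don't know what to do with them. My goal is to find a way to get there with cor by entering the dummy variables manually.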
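A minimal sketch of the dummy-variable route the question asks about, assuming the goal is one Pearson coefficient per area. Each area becomes a 0/1 indicator column, and the Pearson correlation of score with a 0/1 column is the point-biserial correlation. The last lines tie this back to the lm() summary: for least squares with an intercept, the "Multiple R-squared" it reports equals the squared correlation between the observed and fitted scores. The names dummies, r_area, and fit are illustrative, and the question's data is repeated so the block runs on its own.

```r
# Data from the question: 49 scores across 6 subject areas
sc.ar <- structure(list(area = structure(c(1L, 5L, 5L, 2L, 4L, 4L, 1L,
6L, 1L, 2L, 1L, 3L, 3L, 5L, 2L, 2L, 2L, 3L, 4L, 4L, 5L, 1L, 2L,
3L, 4L, 5L, 5L, 2L, 5L, 5L, 5L, 1L, 2L, 2L, 3L, 4L, 4L, 2L, 3L,
4L, 4L, 5L, 5L, 2L, 3L, 4L, 4L, 4L, 5L), levels = c("Algebra",
"Calculus", "Geometry", "Modelling", "Probability", "Other"), class = "factor"),
    score = c(10, 10, 10, 11, 11, 11, 12, 12, 13, 13, 14, 14,
    14, 14, 15, 15, 15, 15, 15, 15, 15, 16, 16, 16, 16, 16, 16,
    17, 17, 17, 17, 17, 18, 18, 18, 18, 18, 19, 19, 19, 19, 19,
    19, 20, 20, 20, 7, 9, 9)), class = "data.frame", row.names = c(NA,
-49L))

# Dummy variables by hand: one 0/1 column per area (a 49 x 6 matrix).
# model.matrix(~ area - 1, data = sc.ar) builds the same indicators,
# with column names prefixed by "area".
dummies <- sapply(levels(sc.ar$area), function(lvl) as.numeric(sc.ar$area == lvl))

# Pearson correlation of score with each indicator (point-biserial r)
r_area <- cor(sc.ar$score, dummies)
r_area

# Link to summary(lm(...)): Multiple R-squared = cor(observed, fitted)^2
fit <- lm(score ~ area, data = sc.ar)
all.equal(cor(sc.ar$score, fitted(fit))^2, summary(fit)$r.squared)  # TRUE
```

The square root of that R-squared is the multiple correlation between score and area as a whole, while each entry of r_area contrasts one area against all the others.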

r correlation dummy-variable
1 Answer

Perhaps you want to split the scores by area with split():

> (df_s <- split(sc.ar$score, sc.ar$area))
$Algebra
[1] 10 12 13 14 16 17

$Calculus
 [1] 11 13 15 15 15 16 17 18 18 19 20

$Geometry
[1] 14 14 15 16 18 19 20

$Modelling
 [1] 11 11 15 15 16 18 18 19 19 20  7  9

$Probability
 [1] 10 10 14 15 16 16 17 17 17 19 19  9

$Other
[1] 12
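The uneven group sizes can be checked directly with lengths() (table(sc.ar$area) reports the same counts). This sketch takes the answer's df to be the question's sc.ar, which the split output above matches, and repeats the data so it runs on its own.

```r
# Data from the question
sc.ar <- structure(list(area = structure(c(1L, 5L, 5L, 2L, 4L, 4L, 1L,
6L, 1L, 2L, 1L, 3L, 3L, 5L, 2L, 2L, 2L, 3L, 4L, 4L, 5L, 1L, 2L,
3L, 4L, 5L, 5L, 2L, 5L, 5L, 5L, 1L, 2L, 2L, 3L, 4L, 4L, 2L, 3L,
4L, 4L, 5L, 5L, 2L, 3L, 4L, 4L, 4L, 5L), levels = c("Algebra",
"Calculus", "Geometry", "Modelling", "Probability", "Other"), class = "factor"),
    score = c(10, 10, 10, 11, 11, 11, 12, 12, 13, 13, 14, 14,
    14, 14, 15, 15, 15, 15, 15, 15, 15, 16, 16, 16, 16, 16, 16,
    17, 17, 17, 17, 17, 18, 18, 18, 18, 18, 19, 19, 19, 19, 19,
    19, 20, 20, 20, 7, 9, 9)), class = "data.frame", row.names = c(NA,
-49L))

df_s <- split(sc.ar$score, sc.ar$area)
# Number of scores recorded for each area
lengths(df_s)
# Algebra 6, Calculus 11, Geometry 7, Modelling 12, Probability 12, Other 1
```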

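However, the areas contain different numbers of scores. Perhaps that is just because of your toy data; either way, you can pad every group with NA up to the size of the largest one, which lengths() reports: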

> (m <- sapply(df_s, `length<-`, max(lengths(df_s))))
      Algebra Calculus Geometry Modelling Probability Other
 [1,]      10       11       14        11          10    12
 [2,]      12       13       14        11          10    NA
 [3,]      13       15       15        15          14    NA
 [4,]      14       15       16        15          15    NA
 [5,]      16       15       18        16          16    NA
 [6,]      17       16       19        18          16    NA
 [7,]      NA       17       20        18          17    NA
 [8,]      NA       18       NA        19          17    NA
 [9,]      NA       18       NA        19          17    NA
[10,]      NA       19       NA        20          19    NA
[11,]      NA       20       NA         7          19    NA
[12,]      NA       NA       NA         9           9    NA
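The padding relies on R's length<- replacement function, which extends a vector with NA; sapply() calls its functional form on each group and binds the equal-length results into columns. A small sketch with made-up vectors (x, l, and padded are illustrative names):

```r
# Replacement form: lengthening a vector fills the new slots with NA
x <- c(10, 12, 13)
length(x) <- 5
x  # 10 12 13 NA NA

# Functional form, as used inside sapply() in the answer
l <- list(a = c(1, 2, 3), b = c(4, 5))
padded <- sapply(l, `length<-`, max(lengths(l)))
padded
#      a  b
# [1,] 1  4
# [2,] 2  5
# [3,] 3 NA
```

Note that row i of the padded matrix only means "the i-th score recorded in each area"; the rows do not pair observations in any other sense.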

Either way, finally just apply cor to the resulting matrix; use = "pairwise.complete.obs" makes it skip the NA padding:

> cor(m, use="pairwise.complete.obs")
              Algebra  Calculus  Geometry Modelling Probability Other
Algebra     1.0000000 0.9006049 0.9601136 0.9297804   0.9094441    NA
Calculus    0.9006049 1.0000000 0.8492236 0.2967773   0.9461672    NA
Geometry    0.9601136 0.8492236 1.0000000 0.9285061   0.8992441    NA
Modelling   0.9297804 0.2967773 0.9285061 1.0000000   0.5407100    NA
Probability 0.9094441 0.9461672 0.8992441 0.5407100   1.0000000    NA
Other              NA        NA        NA        NA          NA    NA
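With use = "pairwise.complete.obs", each coefficient is computed only from the rows where both columns are non-missing, so different pairs can rest on different numbers of observations; Other has a single score, which is why its row and column are NA. A sketch that recomputes the Algebra-Calculus entry by hand, assuming sc.ar, df_s, and m are built as in the answer (pw and ok are illustrative names):

```r
# Data from the question
sc.ar <- structure(list(area = structure(c(1L, 5L, 5L, 2L, 4L, 4L, 1L,
6L, 1L, 2L, 1L, 3L, 3L, 5L, 2L, 2L, 2L, 3L, 4L, 4L, 5L, 1L, 2L,
3L, 4L, 5L, 5L, 2L, 5L, 5L, 5L, 1L, 2L, 2L, 3L, 4L, 4L, 2L, 3L,
4L, 4L, 5L, 5L, 2L, 3L, 4L, 4L, 4L, 5L), levels = c("Algebra",
"Calculus", "Geometry", "Modelling", "Probability", "Other"), class = "factor"),
    score = c(10, 10, 10, 11, 11, 11, 12, 12, 13, 13, 14, 14,
    14, 14, 15, 15, 15, 15, 15, 15, 15, 16, 16, 16, 16, 16, 16,
    17, 17, 17, 17, 17, 18, 18, 18, 18, 18, 19, 19, 19, 19, 19,
    19, 20, 20, 20, 7, 9, 9)), class = "data.frame", row.names = c(NA,
-49L))

# Split and NA-pad as in the answer
df_s <- split(sc.ar$score, sc.ar$area)
m <- sapply(df_s, `length<-`, max(lengths(df_s)))
pw <- cor(m, use = "pairwise.complete.obs")

# Algebra has 6 scores and Calculus 11, so their pair uses rows 1-6 only
ok <- complete.cases(m[, "Algebra"], m[, "Calculus"])
sum(ok)  # 6
all.equal(pw["Algebra", "Calculus"], cor(m[ok, "Algebra"], m[ok, "Calculus"]))  # TRUE
```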

If you also need test statistics (pair counts and p-values), you can use Hmisc::rcorr:

> Hmisc::rcorr(m)
            Algebra Calculus Geometry Modelling Probability Other
Algebra        1.00     0.90     0.96      0.93        0.91    NA
Calculus       0.90     1.00     0.85      0.30        0.95    NA
Geometry       0.96     0.85     1.00      0.93        0.90    NA
Modelling      0.93     0.30     0.93      1.00        0.54    NA
Probability    0.91     0.95     0.90      0.54        1.00    NA
Other            NA       NA       NA        NA          NA     1

n
            Algebra Calculus Geometry Modelling Probability Other
Algebra           6        6        6         6           6     1
Calculus          6       11        7        11          11     1
Geometry          6        7        7         7           7     1
Modelling         6       11        7        12          12     1
Probability       6       11        7        12          12     1
Other             1        1        1         1           1     1

P
            Algebra Calculus Geometry Modelling Probability Other
Algebra             0.0143   0.0024   0.0072    0.0119           
Calculus    0.0143           0.0156   0.3755    0.0000           
Geometry    0.0024  0.0156            0.0025    0.0059           
Modelling   0.0072  0.3755   0.0025             0.0695           
Probability 0.0119  0.0000   0.0059   0.0695                     
Other                                                            
Warning message:
In sqrt(npair - 2) : NaNs produced

Pearson is the default method in both cor and Hmisc::rcorr.
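For a single pair, base R's cor.test() gives the same Pearson coefficient together with a t-test p-value, so Hmisc is not strictly needed. It also defaults to method = "pearson" and drops incomplete pairs before testing. A sketch for Algebra versus Calculus, assuming m is built as in the answer (ct is an illustrative name):

```r
# Data from the question
sc.ar <- structure(list(area = structure(c(1L, 5L, 5L, 2L, 4L, 4L, 1L,
6L, 1L, 2L, 1L, 3L, 3L, 5L, 2L, 2L, 2L, 3L, 4L, 4L, 5L, 1L, 2L,
3L, 4L, 5L, 5L, 2L, 5L, 5L, 5L, 1L, 2L, 2L, 3L, 4L, 4L, 2L, 3L,
4L, 4L, 5L, 5L, 2L, 3L, 4L, 4L, 4L, 5L), levels = c("Algebra",
"Calculus", "Geometry", "Modelling", "Probability", "Other"), class = "factor"),
    score = c(10, 10, 10, 11, 11, 11, 12, 12, 13, 13, 14, 14,
    14, 14, 15, 15, 15, 15, 15, 15, 15, 16, 16, 16, 16, 16, 16,
    17, 17, 17, 17, 17, 18, 18, 18, 18, 18, 19, 19, 19, 19, 19,
    19, 20, 20, 20, 7, 9, 9)), class = "data.frame", row.names = c(NA,
-49L))

# Split and NA-pad as in the answer
df_s <- split(sc.ar$score, sc.ar$area)
m <- sapply(df_s, `length<-`, max(lengths(df_s)))

# Pearson test on the complete Algebra/Calculus pairs
ct <- cor.test(m[, "Algebra"], m[, "Calculus"])
ct$estimate   # same value as the pairwise cor() entry
ct$parameter  # df = n - 2 = 4, from the 6 complete pairs
ct$p.value    # compare with the Algebra/Calculus entry of rcorr's P matrix
```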

© www.soinside.com 2019 - 2024. All rights reserved.