How do I apply a chi-squared test to a series of count tables?

Question

I have these datasets, with df as the main data frame (imagine that they are very large datasets):

df = data.frame(x = seq(1,20,2),
y = c('a','a','b','c','a','a','b','c','a','a'),
z = c('d','e','e','d','f','e','e','d','e','f') )

stage1 = data.frame(xx = c(2,3,4,5,7,8,9) )

stage2 = data.frame(xx = c(3,5,7,8,9) )

stage3 = data.frame(xx = c(2,3,6,8) )

stage4 = data.frame(xx = c(1,3,6) )

Then I create the count tables as follows:

library(dplyr)
library(purrr)
map(lst(stage1 , stage2 ,stage3 ,stage4 ), 
   ~ inner_join(df, .x, by = c("x" = "xx")) %>%      
       count(y, name = 'Count'))

I want to apply a chi-squared test to check whether the difference between each two consecutive tables is significant.

Answer 1

library(dplyr)
library(purrr)

df = data.frame(x = seq(1, 20, 2),
                y = c('a', 'a', 'b', 'c', 'a', 'a', 'b', 'c', 'a', 'a'),
                z = c('d', 'e', 'e', 'd', 'f', 'e', 'e', 'd', 'e', 'f') )
stage1 = data.frame(xx = c(2, 3, 4, 5, 7, 8, 9) )
stage2 = data.frame(xx = c(3, 5, 7, 8, 9) )
stage3 = data.frame(xx = c(2, 3, 6, 8))
stage4 = data.frame(xx = c(1, 3, 6))

tbls <- map(lst(stage1 , stage2 ,stage3 ,stage4 ), 
            ~ inner_join(df, .x, by = c("x" = "xx")) %>%      
              count(y, name = 'Count'))

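# Pair up the tables as (1, 2), (3, 4), ...
pairs <- cbind(seq(1, length(tbls), by = 2),
               seq(2, length(tbls), by = 2))

results <- lapply(seq_len(nrow(pairs)), function(i) {
  idx <- pairs[i, ]
  result <- list(test_result = NA, table_idx = idx)
  # Align the two count tables on y; a category missing from one table gets 0.
  both <- full_join(tbls[[ idx[ 1 ] ]], tbls[[ idx[ 2 ] ]], by = "y")
  counts <- as.matrix(both[, -1])
  counts[is.na(counts)] <- 0
  # chisq.test(x, y) with two vectors would cross-tabulate them as factors,
  # so the counts are passed as a single contingency matrix instead.
  test_result <- try(chisq.test(counts, correct = FALSE), silent = TRUE)
  if (!inherits(test_result, 'try-error')) {
    result$test_result <- test_result
  }
  result
})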

print(results)
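A point worth checking when calling chisq.test on count tables is how the counts are passed in. Given two vectors x and y, chisq.test converts both to factors and cross-tabulates them; given a single matrix, it treats the cells as counts of a contingency table. The sketch below contrasts the two calls, using counts that the example data yields for stage1 and stage2 (a = 2, b = 1, c = 1 in both); the variable names are only illustrative.

```r
# Counts per category of y for two stages.
stage1_counts <- c(a = 2, b = 1, c = 1)
stage2_counts <- c(a = 2, b = 1, c = 1)

# Two vectors: the values 1 and 2 become factor levels and are cross-tabulated.
as_factors <- suppressWarnings(
  chisq.test(stage1_counts, stage2_counts, correct = FALSE)
)

# One matrix: rows are categories, columns are stages, cells are counts.
as_counts <- suppressWarnings(
  chisq.test(cbind(stage1_counts, stage2_counts))
)

as_factors$parameter  # degrees of freedom of the 2 x 2 cross-tabulation
as_counts$parameter   # degrees of freedom of the 3 x 2 contingency table
```

The warnings are suppressed here only because the expected counts are very small, which makes the chi-squared approximation unreliable on toy data like this.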

Answer 2

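I assigned an object name (l) to your final list output. I then split it into two sub-lists and applied the function "map2". This applies the test you asked for to pairs of elements from the two lists (that is, elements 1 and 2, then elements 3 and 4, of the original list):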

l <- map(lst(stage1 , stage2 ,stage3 ,stage4 ), 
~ inner_join(df, .x, by = c("x" = "xx")) %>%      
  count(y, name = 'Count'))

Split every two consecutive elements into two sub-lists:

is.odd <- rep(c(TRUE, FALSE), length = length(l))

l1<-l[is.odd]

l2<-l[!is.odd]
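The odd/even split pairs stage1 with stage2 and stage3 with stage4. If "consecutive" is instead meant as overlapping pairs (1 vs 2, 2 vs 3, 3 vs 4), which is an assumption about the question's intent, a minimal sketch is to shift the list by one element. The list below is a stand-in with placeholder values rather than the real count tables.

```r
library(purrr)

# Stand-in list of four elements; in the answer this would be l.
l <- list(stage1 = 1, stage2 = 2, stage3 = 3, stage4 = 4)

# Drop the last element on the left and the first on the right,
# so that element i is paired with element i + 1.
left  <- head(l, -1)
right <- tail(l, -1)

overlapping <- map2(left, right, ~ list(.x, .y))
names(overlapping) <- paste(names(left), names(right), sep = " vs ")

names(overlapping)
```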

Then join the pairs of elements and rename the new elements based on l:

l12<-map2(l1,l2, ~ left_join(.x,.y, by='y'))
names(l12)<-paste(names(l1),names(l2),sep = ' vs ')
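Note that left_join keeps only the categories present in the first table of each pair, and leaves NA where the second table lacks a category. With the example data this makes no difference, but for tables whose categories do not line up, a possible variant is to use full_join and fill the gaps with zeros. The tables below (stageA, stageB) are hypothetical and chosen only to show the mismatch.

```r
library(dplyr)
library(purrr)

# Hypothetical count tables whose categories do not line up.
l1 <- list(stageA = data.frame(y = c("a", "b"), Count = c(5L, 3L)))
l2 <- list(stageB = data.frame(y = c("a", "c"), Count = c(4L, 6L)))

# full_join keeps every category from both tables;
# coalesce turns the resulting missing counts into zeros.
l12 <- map2(l1, l2, function(a, b) {
  full_join(a, b, by = "y") %>%
    mutate(across(starts_with("Count"), ~ coalesce(.x, 0L)))
})
names(l12) <- paste(names(l1), names(l2), sep = " vs ")

l12[[1]]
```

Whether a zero count is the right reading of a missing category depends on how the tables were built, so this is a modelling choice rather than a drop-in fix.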

Run the chi-squared test on each element of the paired list (each element holds two consecutive tables from "l"):

x.test<-map(l12, ~ chisq.test(.[-1]))

Here is the output:

x.test

$`stage1 vs stage2`

Pearson's Chi-squared test

data:  X-squared = 0, df = 2, p-value = 1

$`stage3 vs stage4`

Chi-squared test for given probabilities

 X-squared = 0.33333, df = 1, p-value = 0.5637
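To compare the pairs at a glance, the p-values can be pulled out of the list of test results. The sketch below rebuilds two paired tables by hand so that it runs on its own; they mirror the shape of the elements of l12, and the names p_values and comparison are illustrative.

```r
library(purrr)

# Hand-built paired count tables, shaped like the elements of l12.
l12 <- list(
  "stage1 vs stage2" = data.frame(y = c("a", "b", "c"),
                                  Count.x = c(2, 1, 1),
                                  Count.y = c(2, 1, 1)),
  "stage3 vs stage4" = data.frame(y = "a", Count.x = 1, Count.y = 2)
)

# Same call as above, with warnings about small expected counts silenced.
x.test <- map(l12, ~ suppressWarnings(chisq.test(.x[-1])))

# Each test result stores its p-value in the "p.value" element.
p_values <- map_dbl(x.test, "p.value")
data.frame(comparison = names(p_values), p_value = unname(p_values))
```

The second pair has a single category, so chisq.test falls back to a goodness-of-fit test on the two counts, which is why the output above reports "Chi-squared test for given probabilities" for it.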
