summarytools gives me two different answers when I group with stby


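I am using stby from summarytools to compute weighted descriptive statistics by group. However, the results differ from what I get when I first filter on the grouping variable and then apply summarytools' descr function. See below: mydf is my unfiltered data frame, and score is a 0-10 variable whose mean I want.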

##when I filter first and split my df
filtered_male <- mydf %>% filter(gender == 1)
with(filtered_male, stby(score, gender, descr, weights = weight))
Weighted Descriptive Statistics  
score by gender  
Data Frame: filtered_male  
Weights: weight  
N: 838  

                           1
--------------- ------------
           Mean         6.86
        Std.Dev         2.93
            Min         0.00
         Median         8.00
            Max        10.00
            MAD         2.97
             CV         0.43
        N.Valid   1509584.07
      Pct.Valid        99.70

##when I don't split my df
with(mydf, stby(score, gender, descr, weights = weight, simplify = TRUE))
Weighted Descriptive Statistics  
score by gender  
Data Frame: mydf 
Weights: weight  
N: 838  

                           1            2
--------------- ------------ ------------
           Mean         7.01         6.79
        Std.Dev         2.81         3.02
            Min         0.00         0.00
         Median         8.00         8.00
            Max        10.00        10.00
            MAD         2.97         2.97
             CV         0.40         0.45
        N.Valid   1715494.12   1379339.65
      Pct.Valid        56.05        45.07


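Any ideas on why this happens, or how I can fix it to get the correct weighted means? (I have checked the answers by hand, and the means I get by filtering first are the correct ones.)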

r grouping mean summarytools
1 Answer

Until there is an official fix for this, you can build a valid stby object with the following:

### Packages
library(dplyr)
library(purrr)
library(summarytools)

### Data: the built-in mtcars data set
mtcars

### Output with summarytools (the per-group statistics are off here)
st <- with(mtcars, stby(qsec, cyl, descr, weights = wt, simplify = TRUE))

### Fix the output: overwrite each group's values with statistics
### computed on that group's rows only
mtcars %>%
  group_by(cyl) %>%
  group_map(~ descr(.x$qsec, weights = .x$wt)) %>%
  walk2(.y = seq_along(.), function(x, y) { st[[y]][, ] <<- x[, ] })

### Bonus: add the missing N for each group to the heading
attributes(st[[1]])$data_info$N.Obs <- paste(
  map_int(seq_along(st), ~ attributes(st[[.x]])$data_info$N.Obs),
  collapse = ","
)

Output:

Weighted Descriptive Statistics  
qsec by cyl  
Data Frame: mtcars  
Weights: wt  
N: 11,7,14  

                       4        6        8
--------------- -------- -------- --------
           Mean    19.38    18.12    16.89
        Std.Dev     1.72     1.59     1.13
            Min    16.70    15.50    14.50
         Median    19.24    18.46    17.34
            Max    22.90    20.22    18.00
            MAD     1.09     2.00     0.71
             CV     0.09     0.09     0.07
        N.Valid    25.14    21.82    55.99
      Pct.Valid   100.00   100.00   100.00
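
As an independent cross-check, similar to the manual check mentioned in the question, the weighted mean of each group can be computed with base R alone. This is only a minimal sketch: it assumes that the Mean row reported by descr is an ordinary weighted mean, in which case the values below should line up with that row. The names groups and weighted_means are introduced here for illustration.

```r
# Split the data by the grouping variable, so each piece holds one cyl group.
groups <- split(mtcars, mtcars$cyl)

# Weighted mean of qsec within each group, using wt as the weights.
# weighted.mean() is part of base R (package stats).
weighted_means <- sapply(groups, function(g) weighted.mean(g$qsec, g$wt))

# Round to two decimals for comparison with the descr output.
print(round(weighted_means, 2))
```

If these values agree with the filter-first results but not with the stby results, that points to the grouping step rather than to the weights themselves.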