Summary table for discrete and continuous variables

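I apologize if this question is a duplicate or has already been asked elsewhere.

I would like to create a summary table in the following format.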

1) Discrete variables    : n/N (%)
2a) Continuous variables : mean (SD); N
2b) Continuous variables : median (IQR); N

For example, if this is my data:

# Example dataset
set.seed(123)
data <- data.frame(
  ChildSex = sample(c("Male", "Female"), 5006, replace = TRUE),
  col1 = rnorm(5006, mean = 300, sd = 100),
  col2 = rnorm(5006, mean = 400, sd = 150),
  col3 = rnorm(5006, mean = 470, sd = 200)
)

then the expected summary should look like this:

Discrete Variables                                   
Child sex                                      
   Male                             2505/5006 (50%)
   Female                           2501/5006 (50%)
   Data missing                     0   /5006 (0%)

Continuous Variables: mean (SD); N
   Col1                            299.90 (99.38); 5006
   Col2                            399.12 (151.53); 5006
Continuous Variables: median (IQR); N
   Col3                            465.85 (268.15); 5006

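I have about 20 discrete variables and 30 continuous variables (18 to be summarized as mean and SD, 12 as median and IQR). I would like to create a summary table like the one above without manually typing the variable names or levels. Thanks in advance for any suggestions.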

r summary summarytools
1 Answer
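library(gtsummary)  # provides tbl_summary() and the all_categorical() selector

# Same example data as in the question, with one missing ChildSex value
set.seed(123)
data <- data.frame(
  ChildSex = c(sample(c("Male", "Female"), 5005, replace = TRUE), NA),
  col1 = rnorm(5006, mean = 300, sd = 100),
  col2 = rnorm(5006, mean = 400, sd = 150),
  col3 = rnorm(5006, mean = 470, sd = 200)
)

head(data)  # preview the first rows

tbl_summary(data,
            # "continuous2" puts each statistic on its own row under the variable
            type = list(col1 = "continuous2",
                        col2 = "continuous2",
                        col3 = "continuous2"),
            statistic = list(c(col1, col2) ~ "{mean} ({sd})",
                             col3 ~ "{median} ({p25}-{p75})",
                             all_categorical() ~ "{n}/{N_obs} ({p}%)"),
            # show a missing-data row only for variables that have missing values
            missing = "ifany",
            missing_text = "Data missing",
            missing_stat = "{N_miss} / {N_obs} ({p_miss}%)")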

This gives:

[Image: rendered summary table]
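The question also asks for the per-variable N and wants to avoid listing every column by hand. A possible variant of the same `tbl_summary()` call is sketched below. It assumes a gtsummary version that supports `missing_stat`, and the vectors `mean_vars` and `median_vars` are hypothetical names introduced here to hold the two groups of continuous columns. `{N_nonmiss}` is gtsummary's placeholder for the number of non-missing observations. The range `{p25}-{p75}` is kept from the answer; it is not the single IQR value shown in the question.

```r
library(gtsummary)

set.seed(123)
data <- data.frame(
  ChildSex = c(sample(c("Male", "Female"), 5005, replace = TRUE), NA),
  col1 = rnorm(5006, mean = 300, sd = 100),
  col2 = rnorm(5006, mean = 400, sd = 150),
  col3 = rnorm(5006, mean = 470, sd = 200)
)

# Hypothetical grouping vectors: list each continuous column once,
# according to the summary it should receive.
mean_vars   <- c("col1", "col2")
median_vars <- c("col3")

tbl <- tbl_summary(
  data,
  # treat every continuous column as "continuous2" without naming it
  type = all_continuous() ~ "continuous2",
  statistic = list(
    all_of(mean_vars)   ~ "{mean} ({sd}); {N_nonmiss}",
    all_of(median_vars) ~ "{median} ({p25}-{p75}); {N_nonmiss}",
    all_categorical()   ~ "{n}/{N_obs} ({p}%)"
  ),
  missing = "ifany",
  missing_text = "Data missing",
  missing_stat = "{N_miss}/{N_obs} ({p_miss}%)"
)

tbl
```

With 30 continuous variables, only the two name vectors need to be maintained; the categorical columns and their levels are picked up automatically.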

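If gtsummary is not available, the same cells can be computed in base R. The following is only a minimal sketch of the formats requested in the question (`n/N (%)`, `mean (SD); N`, `median (IQR); N`). The helper names `summarise_discrete`, `summarise_mean` and `summarise_median`, and the vector `median_vars`, are made up for this illustration.

```r
set.seed(123)
data <- data.frame(
  ChildSex = c(sample(c("Male", "Female"), 5005, replace = TRUE), NA),
  col1 = rnorm(5006, mean = 300, sd = 100),
  col2 = rnorm(5006, mean = 400, sd = 150),
  col3 = rnorm(5006, mean = 470, sd = 200)
)

# n/N (%) for every level of a discrete variable, plus a "Data missing" row.
summarise_discrete <- function(x) {
  N <- length(x)
  counts <- table(x)  # NA values are excluded by default
  n <- c(setNames(as.integer(counts), names(counts)),
         "Data missing" = sum(is.na(x)))
  data.frame(
    level = names(n),
    value = sprintf("%d/%d (%.0f%%)", n, N, 100 * n / N),
    row.names = NULL
  )
}

# mean (SD); N, where N is the number of non-missing values.
summarise_mean <- function(x) {
  sprintf("%.2f (%.2f); %d",
          mean(x, na.rm = TRUE), sd(x, na.rm = TRUE), sum(!is.na(x)))
}

# median (IQR); N, where IQR is the single value returned by stats::IQR().
summarise_median <- function(x) {
  sprintf("%.2f (%.2f); %d",
          median(x, na.rm = TRUE), IQR(x, na.rm = TRUE), sum(!is.na(x)))
}

# Select discrete columns by type rather than by name.
is_discrete <- vapply(data, function(x) is.character(x) || is.factor(x),
                      logical(1))
discrete_tbl <- lapply(data[is_discrete], summarise_discrete)

# Hypothetical: columns to report as median (IQR); the rest get mean (SD).
median_vars <- "col3"
mean_vars   <- setdiff(names(data)[!is_discrete], median_vars)

mean_tbl   <- vapply(data[mean_vars], summarise_mean, character(1))
median_tbl <- vapply(data[median_vars], summarise_median, character(1))

discrete_tbl
mean_tbl
median_tbl
```

The results are plain character vectors and data frames, so they still need to be bound together and laid out before they resemble the table in the question.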