我正在尝试学习编写函数并探索制作一个函数来进行方差分析和后 F 测试。我已将其简化为获取 emmeans 并关联所有成对比较的问题。问题是成对比较不起作用???我对功能真的很陌生,并且尝试了很多方法来解决这个问题但无济于事......
我正在使用钻石数据集来提高再现性......
这是代码:
# libraries
library(tidyverse)
library(car)
library(emmeans)
library(readxl)
library(rlang)
library(broom)
dia <- diamonds
dia$cut <- as.factor(as.character(dia$cut))
results_LM <- function(data, iv, dv) {
iv_name <- as.character(substitute(iv))
dv_name <- as.character(substitute(dv))
formula_str <- paste(dv_name, "~", iv_name)
aov.model <- lm(formula_str, data = data)
emm.means <- emmeans(aov.model,
specs = iv_name,
by = iv_name)
emm.pairs <- pairs(emm.means)
output <- list(
aov_model_summary = summary(aov.model),
emm.means,
emm.pairs)
return(output)
}
results_LM(dia, cut, price)
这个问题是 emmeans 输出看起来很奇怪并且缺少成对比较
[[3]]
cut = Fair:
contrast estimate SE df z.ratio p.value
(nothing) nonEst NA NA NA NA
cut = Good:
contrast estimate SE df z.ratio p.value
(nothing) nonEst NA NA NA NA
cut = Ideal:
contrast estimate SE df z.ratio p.value
(nothing) nonEst NA NA NA NA
cut = Premium:
contrast estimate SE df z.ratio p.value
(nothing) nonEst NA NA NA NA
cut = Very Good:
contrast estimate SE df z.ratio p.value
(nothing) nonEst NA NA NA NA```
Note that when I do the anova and emmeans post F test outside of the function it works fine?
Any help is appreciated and pointers to learn to write functions like this on dataframes would be GREAT. Thanks Bill
你们真的很亲密。我认为唯一的事情是你不需要在调用
by
时使用 emmeans()
参数。我可能还建议使用 reformulate()
来制作公式。见下图:
library(tidyverse)
library(emmeans)
dia <- diamonds
dia$cut <- as.factor(as.character(dia$cut))
results_LM <- function(data, iv, dv) {
iv_name <- as.character(substitute(iv))
dv_name <- as.character(substitute(dv))
formula_str <- reformulate(iv_name, response=dv_name)
aov.model <- lm(formula_str, data = data)
emm.means <- emmeans(aov.model,
specs = iv_name)
emm.pairs <- pairs(emm.means)
output <- list(
aov_model_summary = summary(aov.model),
emm.means,
emm.pairs)
names(output) <- c("Model", "Marginal Means", "Pairwise Comparisons")
return(output)
}
results_LM(dia, cut, price)
#> $Model
#>
#> Call:
#> lm(formula = formula_str, data = data)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -4258 -2741 -1494 1360 15348
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 4358.76 98.79 44.122 < 2e-16 ***
#> cutGood -429.89 113.85 -3.776 0.000160 ***
#> cutIdeal -901.22 102.41 -8.800 < 2e-16 ***
#> cutPremium 225.50 104.40 2.160 0.030772 *
#> cutVery Good -377.00 105.16 -3.585 0.000338 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 3964 on 53935 degrees of freedom
#> Multiple R-squared: 0.01286, Adjusted R-squared: 0.01279
#> F-statistic: 175.7 on 4 and 53935 DF, p-value: < 2.2e-16
#>
#>
#> $`Marginal Means`
#> cut emmean SE df lower.CL upper.CL
#> Fair 4359 98.8 53935 4165 4552
#> Good 3929 56.6 53935 3818 4040
#> Ideal 3458 27.0 53935 3405 3510
#> Premium 4584 33.8 53935 4518 4650
#> Very Good 3982 36.1 53935 3911 4052
#>
#> Confidence level used: 0.95
#>
#> $`Pairwise Comparisons`
#> contrast estimate SE df t.ratio p.value
#> Fair - Good 429.9 113.8 53935 3.776 0.0015
#> Fair - Ideal 901.2 102.4 53935 8.800 <.0001
#> Fair - Premium -225.5 104.4 53935 -2.160 0.1950
#> Fair - Very Good 377.0 105.2 53935 3.585 0.0031
#> Good - Ideal 471.3 62.7 53935 7.517 <.0001
#> Good - Premium -655.4 65.9 53935 -9.946 <.0001
#> Good - Very Good -52.9 67.1 53935 -0.788 0.9341
#> Ideal - Premium -1126.7 43.2 53935 -26.067 <.0001
#> Ideal - Very Good -524.2 45.1 53935 -11.636 <.0001
#> Premium - Very Good 602.5 49.4 53935 12.198 <.0001
#>
#> P value adjustment: tukey method for comparing a family of 5 estimates
创建于 2023-08-09,使用 reprex v2.0.2