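I am trying to use the powerful avg_predictions() function and its hypothesis argument from the marginaleffects package to estimate differences in predicted probabilities between groups. However, I am having trouble with the term names when the by argument includes more than one variable. I would like to refer to each estimate by its term name rather than by position (b1, b2, etc.), because the positions are not consistent when I run avg_predictions() for different outcomes with the same list of covariates. The code below is an example of what I am trying to do.
When I run this: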
library(marginaleffects)
data("HealthInsurance", package = "AER")
mod <- glm(insurance ~ ethnicity*health + age + married + family + selfemp + region,
           family = binomial(link = "logit"),
           data = HealthInsurance)
avg_predictions(mod, by = c("ethnicity", "health"), type = "response")
I get the following:
ethnicity health Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
cauc yes 0.816 0.00448 181.9 <0.001 Inf 0.807 0.824
afam yes 0.764 0.01292 59.2 <0.001 Inf 0.739 0.790
cauc no 0.725 0.01896 38.2 <0.001 Inf 0.688 0.762
afam no 0.727 0.04298 16.9 <0.001 210.9 0.643 0.812
other yes 0.752 0.02252 33.4 <0.001 808.6 0.707 0.796
other no 0.771 0.06839 11.3 <0.001 95.6 0.637 0.905
Columns: ethnicity, health, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high
Type: response
For example, I would like to estimate the following differences in predicted probabilities:
avg_predictions(mod, by = c("ethnicity", "health"), type = "response",
hypothesis = c("b1-b2=0", "b1-b5=0", "b3-b4=0", "b3-b6=0"))
which produces this:
Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
b1-b2=0 0.05134 0.0137 3.754 < 0.001 12.5 0.0245 0.0781
b1-b5=0 0.06406 0.0230 2.789 0.00528 7.6 0.0190 0.1091
b3-b4=0 -0.00202 0.0470 -0.043 0.96570 0.1 -0.0941 0.0901
b3-b6=0 -0.04618 0.0710 -0.651 0.51526 1.0 -0.1853 0.0929
Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high
Type: response
However, instead of using the parameter positions, I would like to do something like this (which does not work):
avg_predictions(mod, by = c("ethnicity", "health"), type = "response",
hypothesis = c("`cauc yes` - `afam yes` = 0"))
I got the term names by running:
avg_predictions(mod, by = c("ethnicity", "health"), type = "response") |> tidy()
which produces this:
# A tibble: 6 × 10
ethnicity health estimate std.error statistic p.value s.value conf.low conf.high term
<fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1 cauc yes 0.816 0.00448 182. 0 Inf 0.807 0.824 cauc yes
2 afam yes 0.764 0.0129 59.2 0 Inf 0.739 0.790 afam yes
3 cauc no 0.725 0.0190 38.2 0 Inf 0.688 0.762 cauc no
4 afam no 0.727 0.0430 16.9 3.17e- 64 211. 0.643 0.812 afam no
5 other yes 0.752 0.0225 33.4 3.74e-244 809. 0.707 0.796 other yes
6 other no 0.771 0.0684 11.3 1.64e- 29 95.6 0.637 0.905 other no
I am not sure what I might be doing wrong. Any help would be greatly appreciated!
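There is currently no built-in way to do what you want.
One way to be fully explicit is to define a custom contrast function. See the example pasted below and the documentation here: https://marginaleffects.com/bonus/hypothesis.html#functions
Alternatively, I encourage you to look at the avg_comparisons() function. It yields a different quantity, but it may (or may not) be what you are actually looking for.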
library(marginaleffects)
data("HealthInsurance", package = "AER")
mod <- glm(insurance ~ ethnicity * health + age + married + family + selfemp + region,
           family = binomial(link = "logit"),
           data = HealthInsurance)
hyp <- function(x) {
    x$term <- paste(x$ethnicity, x$health)
    val <- c(
        x$estimate[x$term == "other yes"] - x$estimate[x$term == "other no"],
        x$estimate[x$term == "afam yes"] - x$estimate[x$term == "afam no"]
    )
    lab <- c("(other yes) - (other no)", "(afam yes) - (afam no)")
    out <- data.frame(term = lab, estimate = val)
    return(out)
}
avg_predictions(mod, by = c("ethnicity", "health"), hypothesis = hyp)
#>
#> Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
#> (other yes) - (other no) -0.0199 0.0720 -0.277 0.782 0.4 -0.161 0.121
#> (afam yes) - (afam no) 0.0370 0.0449 0.823 0.410 1.3 -0.051 0.125
#>
#> Type: response
#> Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high
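If many such contrasts are needed, the pattern above could be wrapped in a small factory that builds the contrast function from pairs of labels. The sketch below is only an illustration: make_hyp() and its by_cols argument are hypothetical names, not part of marginaleffects, and it assumes (as in the example above) that the function receives a data frame containing the by columns and an estimate column. To stay runnable without the model, it is exercised on a hand-built data frame that copies the rounded estimates from the question's output.

```r
# Hypothetical helper (not part of marginaleffects): returns a function
# that computes named differences between groups, looked up by label
# rather than by row position.
make_hyp <- function(pairs, by_cols = c("ethnicity", "health")) {
  function(x) {
    # Build one label per row by pasting the grouping columns, e.g. "cauc yes"
    key <- do.call(paste, lapply(by_cols, function(col) x[[col]]))
    pick <- function(label) {
      idx <- which(key == label)
      if (length(idx) != 1L) stop("label not matched exactly once: ", label)
      x$estimate[idx]
    }
    data.frame(
      term = vapply(pairs, function(p) sprintf("(%s) - (%s)", p[1], p[2]),
                    character(1)),
      estimate = vapply(pairs, function(p) pick(p[1]) - pick(p[2]),
                        numeric(1))
    )
  }
}

# Stand-in for what avg_predictions() passes to the function:
# rounded estimates copied from the question's output.
preds <- data.frame(
  ethnicity = c("cauc", "afam", "cauc", "afam", "other", "other"),
  health    = c("yes", "yes", "no", "no", "yes", "no"),
  estimate  = c(0.816, 0.764, 0.725, 0.727, 0.752, 0.771)
)

# The four contrasts from the question, written with labels instead of b1, b2, ...
hyp <- make_hyp(list(
  c("cauc yes", "afam yes"),
  c("cauc yes", "other yes"),
  c("cauc no",  "afam no"),
  c("cauc no",  "other no")
))

res <- hyp(preds)
print(res)
```

Because rows are matched by label, the result does not depend on the order in which the groups appear. With the fitted model, the same function would presumably be passed as hypothesis = hyp in the avg_predictions() call shown above, which then supplies the standard errors.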