我正在尝试模拟具有不同程度异方差的简单回归模型。我想做一个简单的模拟,以显示增加的异方差如何影响模型的各个部分(系数与标准误差等)。这是到目前为止的代码:
N <- 1e4
x <- rnorm(N, 10, 1)
results <- data.frame(i = numeric(),
beta = numeric(),
se = numeric(),
var_y = numeric())
for (i in 1:100) {
y <- rnorm(N,.7*x, i/x)
fit <- lm(y ~ x)
r <- c(i = i, beta = coef(fit)[[2]], se = summary(fit)$coefficients[, 2][[2]], var_y = var(y))
results <- rbind(results, r)
}
colnames(results) <- c("i", "beta", "se", "var_y")
plot(results$beta)
plot(results$se)
plot(results$var_y)
问题是我担心我不仅引入了异方差性,而且还增加了 y 的整体方差,这本身就会影响模型结果。
有没有什么方法可以在不改变其他任何东西的情况下增加异方差的量? 我是否想得太多并在模型内标准化 y (所以运行
lm(scale(y) ~ x)
)就足够了?
也许像这样?
n <- 1e4
set.seed(42)
x <- runif(n, 0, 100)
sd <- 0.1
y1 <- 10 + 0.1 * x + rnorm(n, sd = sd)
var(y1)
#[1] 8.458331
summary(mod1 <- lm(y1 ~ x))
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 1.000e+01 1.996e-03 5010 <2e-16 ***
# x 9.998e-02 3.456e-05 2892 <2e-16 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 0.1005 on 9998 degrees of freedom
const <- 0.5
power <- 2
#this is the function from nlme::varConstPower, other functions are possible, e.g., help("varClasses", package = "nlme)
var2 <- (const + abs(x)^power)^2
var2 <- var2/mean(var2) * sd^2
y2 <- 10 + 0.1 * x + rnorm(n, sd = sqrt(var2))
var(y2)
#[1] 8.46444
#Note that this derived from the sample.
#You should care more about the true variance (or the residual standard error below)
summary(mod20 <- lm(y2 ~ x))
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 1.000e+01 2.006e-03 4984 <2e-16 ***
# x 1.000e-01 3.474e-05 2879 <2e-16 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 0.101 on 9998 degrees of freedom
library(nlme)
summary(mod21 <- gls(y2 ~ x, weights = varConstPower(form = ~ x)))
# const power
# 0.4123067 1.9807457
#
# Coefficients:
# Value Std.Error t-value p-value
# (Intercept) 10.000000 1.750794e-06 5711695 0
# x 0.100003 2.755056e-06 36298 0