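I am running a nonlinear regression in R and have run into a problem with error propagation. Specifically, I use predictNLS to estimate the error of my predictions, but in some cases the error bounds go beyond biologically plausible limits (for example, prediction intervals above 1 or below 0, which make no sense in my biological context).
Below is a reproducible example of my code. It simulates an exponential relationship, fits a nonlinear least squares (NLS) model, and then uses predictNLS to estimate the prediction error. My challenge is how to handle cases where the propagated error produces biologically impossible values.
Reproducible example: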
# simulate exponential relationship
set.seed(123)
# generate random x values between 0 and 60
x <- runif(100, 0, 60)
y <- 1 - exp(-0.075 * x) * rnorm(100, 0.7, 0.1)
data = data.frame(sr= x, fipar = y)
# create a nls model to fit the data
model <- nls(fipar ~ 1 - exp(-a * sr), data = data, start = list(a = 0.001))
# create an observed and predicted dataframe
data$predicted <- predict(model, data)
library(ggplot2)
library(magrittr)  # provides the %>% pipe used below
data %>%
ggplot(aes(x = sr, y = fipar)) +
geom_point() +
geom_line(aes(y = predicted), color = "red")
# estimate the errors using predictNLS (from the propagate package)
library(propagate)
newdat = data.frame(sr = seq(1, 60, 1))
prediction_se <- predictNLS(model, newdata = newdat, interval = "prediction", type = 'response')
prediction_se$summary$sr <- newdat$sr
prediction_se$summary %>%
ggplot(aes(x = sr, y = Prop.Mean.1)) +
ylim(0, 1.2) +
geom_point() +
geom_ribbon(aes(ymin = `Prop.2.5%`, ymax = `Prop.97.5%`), alpha = 0.2) +
geom_hline(yintercept = 1)
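My first instinct is simply to replace any bound greater than 1 with 1, but I suspect that ignores a number of statistical assumptions. Is there a recommended way to constrain the prediction interval, or to adjust the error estimate, so that it reflects the biological constraints? Perhaps this question would fit better on the statistics Stack Exchange, but I thought I would look for an answer here as well.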
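The truncation described above could be sketched as follows. The data frame bounds and its columns lower and upper are hypothetical stand-ins for the Prop.2.5% and Prop.97.5% columns of the predictNLS summary, and the numbers are made up for illustration. This only clips what gets plotted; it does not make the interval statistically valid.

```r
# Hypothetical interval bounds standing in for predictNLS output
# (illustrative values, not real results)
bounds <- data.frame(
  sr    = c(10, 30, 60),
  lower = c(-0.05, 0.70, 0.92),
  upper = c(0.45, 0.98, 1.08)
)

# Clip both bounds to the biologically meaningful range [0, 1]
bounds$lower_clipped <- pmax(bounds$lower, 0)
bounds$upper_clipped <- pmin(bounds$upper, 1)

bounds
```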
A glm with a logit link function works:
# Fit the GLM with a binomial family and logit link
# (binomial() warns about non-integer successes when the response is a proportion)
model <- glm(fipar ~ sr,
data = data,
family = binomial(link = "logit"))
# Create an observed and predicted dataframe
pred <- predict(model, data, type = "response", se.fit = TRUE)
data$predicted <- pred$fit
data$se <- pred$se.fit
# Plot the data and the model's predicted probabilities
library(ggplot2)
ggplot(data, aes(x = sr, y = fipar)) +
geom_point() +
geom_line(aes(y = predicted), color = "red") +
geom_ribbon(aes(ymin = predicted - se, ymax = predicted + se), alpha = 0.2)
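Note that a ribbon of predicted ± se computed on the response scale can still cross 0 or 1. A common way to keep the band inside [0, 1] is to build the interval on the link (logit) scale and back-transform it with plogis. The following is a minimal sketch under assumptions that are not in the code above: it uses quasibinomial() instead of binomial() to avoid the non-integer warning, a normal-approximation 95% band, and the names dat, fit, newdat, and z are chosen here for illustration. It gives a confidence band for the fitted mean, not a prediction interval for new observations.

```r
# Simulate the same kind of data as in the example above
set.seed(123)
x <- runif(100, 0, 60)
y <- 1 - exp(-0.075 * x) * rnorm(100, 0.7, 0.1)
dat <- data.frame(sr = x, fipar = y)

# Assumption: quasibinomial accepts proportions in [0, 1] without
# the non-integer warning that binomial() raises
fit <- glm(fipar ~ sr, data = dat, family = quasibinomial(link = "logit"))

# Predict on the link (logit) scale, where the normal approximation is applied
newdat <- data.frame(sr = seq(1, 60, 1))
pred <- predict(fit, newdata = newdat, type = "link", se.fit = TRUE)

# Build the band on the link scale, then back-transform with the inverse logit
z <- qnorm(0.975)
newdat$fit   <- plogis(pred$fit)
newdat$lower <- plogis(pred$fit - z * pred$se.fit)
newdat$upper <- plogis(pred$fit + z * pred$se.fit)

# Both bounds stay within [0, 1] by construction
range(newdat$lower)
range(newdat$upper)
```

The resulting lower and upper columns can be passed to geom_ribbon in place of predicted - se and predicted + se.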