How are residuals calculated in a logistic regression model created with the mlr3 package in R?


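I created a logistic regression model using the mlr3 package in R. I extracted the residuals from the model, but I can't figure out how they are calculated: they don't correspond to any residual calculation I know of.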

Suppose I create a logistic regression model with the mlr3 package:

library(mlr3)
library(mlr3learners)  # provides the "classif.log_reg" learner (a wrapper around stats::glm)
library(tidyverse)

#create dummy data
data <- data.frame(
  predictor = c(rnorm(50, mean = 0), rnorm(50, mean = 1)),
  dependant = as.factor(c(rep(0, 50), rep(1, 50)))
)

#define and train a logistic regression model
classifier_log_reg <- mlr_learners$get("classif.log_reg")
task <- mlr3::TaskClassif$new(id = "my_data",
                              backend = data,
                              target = "dependant", # target variable
                              positive = "1")
classifier_log_reg$train(task, row_ids = 1:100)

I can get the residuals from the model with:

residuals <- classifier_log_reg$model$residuals

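My question is: how are these residuals calculated? I can't reproduce them manually. They don't match the numbers I get when I calculate Pearson or deviance residuals with the following functions: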

pearson_residuals <- function(p, actual) {
  # Standard deviation of the predicted binomial distribution
  std_dev <- sqrt(p * (1 - p))
  
  # Avoid division by zero in case of p values being 0 or 1
  std_dev[std_dev == 0] <- .Machine$double.eps
  
  # Calculate the Pearson residuals
  residuals <- (actual - p) / std_dev
  
  return(residuals)
}

deviance_residuals <- function(p, actual) {
  # Ensure p is within valid range to avoid log(0) issues
  p <- ifelse(p == 0, .Machine$double.eps, ifelse(p == 1, 1 - .Machine$double.eps, p))
  
  # Calculate the deviance residuals
  residuals <- sign(actual - p) * sqrt(-2 * (actual * log(p) + (1 - actual) * log(1 - p)))
  
  return(residuals)
}
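For comparison, here is a minimal sketch of the residual type that glm stores in its $residuals component, the so-called working residuals. It fits the model with stats::glm directly instead of through mlr3 (this assumes, as the answer below states, that the mlr3 learner's residuals are identical to glm's) and adds set.seed(1), which the original code does not use. The helper name working_residuals is introduced here for illustration only.

```r
# Simulated data as in the question; the seed is an addition for reproducibility
set.seed(1)
data <- data.frame(
  predictor = c(rnorm(50, mean = 0), rnorm(50, mean = 1)),
  dependant = as.factor(c(rep(0, 50), rep(1, 50)))
)

# Plain stats::glm fit, standing in for the model the mlr3 learner wraps
fit <- glm(dependant ~ predictor, family = binomial, data = data)

p      <- fitted(fit)                               # predicted P(dependant == "1")
actual <- as.numeric(as.character(data$dependant))  # observed 0/1 outcome

# Hypothetical helper: working residuals (y - mu) / (d mu / d eta).
# For the logit link, d mu / d eta = mu * (1 - mu).
working_residuals <- function(p, actual) {
  (actual - p) / (p * (1 - p))
}

manual_working <- working_residuals(p, actual)
print(all.equal(unname(manual_working), unname(fit$residuals)))
```

Working residuals equal the Pearson residuals divided by sqrt(p * (1 - p)), which is why neither of the two functions above reproduces them.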
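Strangely, I found that the residuals in classifier_log_reg$model$residuals seem to correspond systematically to residuals I can calculate by hand, namely the simple difference between the actual and predicted values of the dependent variable. Note that I have adjusted both the manually calculated residuals and the residuals output by the model object to best illustrate the apparent sigmoid relationship: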

#get residuals directly from the model object
residuals <- classifier_log_reg$model$residuals

#####calculate residuals manually
#specify that predictions should be continuous
classifier_log_reg$predict_type <- 'prob'
#get the predictions
predictions <- classifier_log_reg$predict(task, row_ids = 1:100)
#isolate the vector containing the predictions
predictions <- predictions$data$prob %>% as.data.frame() %>% pull(1)
#subtract predictions from actual values of dependant variable
actual <- data$dependant %>% as.character() %>% as.numeric()
my_resid <- actual - predictions

#put the residuals from the model, the manually calculated residuals
#and the actual values into a dataframe.
#I have adjusted them a bit to illustrate the (apparent) sigmoid relationship that
#emerges after these adjustments.
df <- data.frame(
  x = residuals - (actual * 2) + 1,
  y = (my_resid + 1) / 2,
  actual = actual
)

#plot the relationship between the manually calculated residuals (with adjustment)
#and the residuals straight from the model (with adjustment).
#The curve is completely smooth, but I cannot find the function linking x to y
ggplot(df) + geom_point(aes(x = x, y = y))
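Assuming classifier_log_reg$model$residuals are glm's working residuals (as the answer below indicates), the two adjusted quantities can be worked out by hand. Write $a_i \in \{0, 1\}$ for the actual value, $p_i$ for the predicted probability of class 1 and $r_i = (a_i - p_i)/(p_i(1 - p_i))$ for the working residual, so that $x_i = r_i - 2a_i + 1$ and $y_i = (a_i - p_i + 1)/2$:

$$
\begin{aligned}
a_i = 1:&\quad r_i = \frac{1}{p_i},\quad x_i = \frac{1 - p_i}{p_i} \ge 0,\quad y_i = 1 - \frac{p_i}{2} = 1 - \frac{1}{2(1 + x_i)},\\
a_i = 0:&\quad r_i = -\frac{1}{1 - p_i},\quad x_i = -\frac{p_i}{1 - p_i} \le 0,\quad y_i = \frac{1 - p_i}{2} = \frac{1}{2(1 - x_i)}.
\end{aligned}
$$

Both branches combine into a single curve:

$$
y = \frac{1}{2} + \frac{x}{2(1 + |x|)}
$$

This is a shifted and rescaled softsign function, $x/(1 + |x|)$. It is S-shaped and passes through $(0, 1/2)$, but it approaches 0 and 1 like $1/|x|$ rather than exponentially, so no logistic curve matches it exactly.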

[Figure: plot output by ggplot]

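As you can see, there appears to be a sigmoid relationship here. However, I tried using the nls function to obtain the parameters of the best-fitting sigmoid curve linking x and y... and it doesn't fit at all! Here is what I tried. I haven't pasted the resulting plot, but suffice it to say that it does not show a straight line (which is what I would expect if the relationship between x and y really were sigmoid):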

sigmoid <- function(x, L, k, x0) {
  L / (1 + exp(-k * (x - x0)))
}

model <- nls(y ~ sigmoid(x, L, k, x0),
             data = df,
             start = list(L = 1, k = 1, x0 = 1),
             control = nls.control(maxiter = 100))

df$fitted <- predict(model, df)
ggplot(df) + geom_point(aes(x = fitted, y = y))
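A quick numerical check, as a sketch: if the model's $residuals are glm's working residuals, eliminating p from the two adjusted columns gives y = 1/2 + x / (2(1 + |x|)), a rescaled softsign rather than a logistic curve, which would explain why the logistic nls fit does not line up. The code rebuilds x and y from a plain glm fit (standing in for the mlr3 model, with an added seed) and compares them with that closed form. The name softsign_curve is hypothetical, introduced only for this illustration.

```r
# Simulated data as in the question; the seed is an addition for reproducibility
set.seed(1)
data <- data.frame(
  predictor = c(rnorm(50, mean = 0), rnorm(50, mean = 1)),
  dependant = as.factor(c(rep(0, 50), rep(1, 50)))
)
fit <- glm(dependant ~ predictor, family = binomial, data = data)

p      <- unname(fitted(fit))                       # predicted P(dependant == "1")
actual <- as.numeric(as.character(data$dependant))  # observed 0/1 outcome

# Same adjustments as in the question
x <- unname(fit$residuals) - (actual * 2) + 1
y <- ((actual - p) + 1) / 2

# Hypothetical helper: closed form derived from the working-residual definition
softsign_curve <- function(x) 0.5 + x / (2 * (1 + abs(x)))

# Largest deviation from the closed form; expected to be at floating-point level
max_dev <- max(abs(y - softsign_curve(x)))
print(max_dev)
```

Plotting softsign_curve(x) against y should then give the straight line that the logistic fit failed to produce.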
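So what is the relationship between x and y here? More importantly, how are the residuals of an mlr3 logistic regression model calculated behind the scenes?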

r logistic-regression mlr3
1 Answer
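Most likely the same way the glm command calculates them. The classif.log_reg learner (from the mlr3learners package) fits the model with stats::glm, so classifier_log_reg$model is an ordinary glm object. Its $residuals component holds glm's working residuals, which for the logit link are (y - p) / (p(1 - p)), not Pearson or deviance residuals: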

glm1 <- glm(dependant ~ predictor, family = "binomial", data = data)
identical(residuals, glm1$residuals)
# [1] TRUE
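A short sketch of how the standard residual types relate for this glm fit, using the question's simulated data with an added seed. Note that residuals() on a glm object defaults to deviance residuals, whereas the $residuals component holds the working residuals. The Pearson and deviance formulas from the question should match residuals(..., type = "pearson") and type = "deviance"; the working formula matches glm1$residuals.

```r
# Simulated data as in the question; the seed is an addition for reproducibility
set.seed(1)
data <- data.frame(
  predictor = c(rnorm(50, mean = 0), rnorm(50, mean = 1)),
  dependant = as.factor(c(rep(0, 50), rep(1, 50)))
)
glm1 <- glm(dependant ~ predictor, family = "binomial", data = data)

p      <- unname(fitted(glm1))                      # predicted P(dependant == "1")
actual <- as.numeric(as.character(data$dependant))  # observed 0/1 outcome

# Four of the residual types supported by residuals() for glm objects
res <- list(
  working  = unname(residuals(glm1, type = "working")),   # same as glm1$residuals
  response = unname(residuals(glm1, type = "response")),  # actual - p
  pearson  = unname(residuals(glm1, type = "pearson")),
  deviance = unname(residuals(glm1, type = "deviance"))   # the default type
)

# Manual formulas: the question's Pearson and deviance formulas, plus working and response
manual <- list(
  working  = (actual - p) / (p * (1 - p)),
  response = actual - p,
  pearson  = (actual - p) / sqrt(p * (1 - p)),
  deviance = sign(actual - p) * sqrt(-2 * (actual * log(p) + (1 - actual) * log(1 - p)))
)

# Element-wise comparison of each built-in type with its manual formula
matches <- mapply(function(a, b) isTRUE(all.equal(a, b)), res, manual)
print(matches)
```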
    
© www.soinside.com 2019 - 2024. All rights reserved.