Non-convergence error with a Firth model using the logistf package in R


Here is a link to the sample data (it is small, only 23 KB, but it can trigger the error):

https://drive.google.com/file/d/1TWkFIKhq9VZkFnhUrt6LxYmab54ouODd/view?usp=sharing

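This is the code I use to run the Firth model. I get different errors on different runs (after restarting R or the R session). Sometimes the program appears to hang (although Activity Monitor shows 99% CPU usage); other times I get errors such as non-convergence, with a suggestion to increase the number of iterations, which did not really help.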

library(caret)
library(logistf)
library(data.table)


# Define training control
train_control <- trainControl(method = "repeatedcv", 
                              number = 3, repeats = 3,
                              savePredictions = TRUE,
                              classProbs = TRUE)

# Define the custom model function
firth_model <- list(
  type = "Classification",
  library = "logistf",
  loop = NULL,
  parameters = data.frame(parameter = c("none"), class = c("character"), label = c("none")),
  grid = function(x, y, len = NULL, search = "grid") {
    data.frame(none = "none")
  },
  fit = function(x, y, wts, param, lev, last, classProbs, ...) {
    data <- as.data.frame(x)
    data$group <- y
    logistf(group ~ ., data = data, control = logistf.control(maxit = 100), ...)
  },
  predict = function(modelFit, newdata, submodels = NULL) {
    as.factor(ifelse(predict(modelFit, newdata, type = "response") > 0.5, "AD", "control"))
  },
  prob = function(modelFit, newdata, submodels = NULL) {
    preds <- predict(modelFit, newdata, type = "response")
    data.frame(control = 1 - preds, AD = preds)
  }
)


train_proc <- fread("train_proc.csv")

# Training the model
set.seed(123)
firth.logist.model <- train(train_proc[, .SD, .SDcols = !c("group")],
                            train_proc$group,
                            method = firth_model,
                            trControl = train_control)

print(firth.logist.model)

This is the most recent error:

Warning in logistf(group ~ ., data = data, control = logistf.control(maxit = 100),  :
  Nonconverged PL confidence limits: maximum number of iterations for variables: (Intercept), x1, x2, x3, x4, x5, x6, x7, x8, x9, x10, x11, x12, x13, x14, x15, x16, x17, x18, x19, x20, x21, x22, x23, x24 exceeded. Try to increase the number of iterations by passing 'logistpl.control(maxit=...)' to parameter plcontrol

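The same code seems to run on some datasets but not on others. That may also be because my function cannot be tailored to a particular dataset. I have run into many different kinds of errors, and I am starting to wonder whether the logistf package itself is unstable.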

For more information, here is my R version:

R.version
               _                           
platform       aarch64-apple-darwin20      
arch           aarch64                     
os             darwin20                    
system         aarch64, darwin20           
status                                     
major          4                           
minor          3.2                         
year           2023                        
month          10                          
day            31                          
svn rev        85441                       
language       R                           
version.string R version 4.3.2 (2023-10-31)
nickname       Eye Holes   

Here are my package versions:

> packageVersion("caret")
[1] ‘6.0.94’
> packageVersion("logistf")
[1] ‘1.26.0’
> packageVersion("data.table")
[1] ‘1.14.10’
r machine-learning logistic-regression sampling convergence
1 Answer

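If you increase the number of iterations as the warning message suggests, by passing plcontrol = logistpl.control(maxit = 1000) through train(), you should get the desired result. Note that the warning refers to the profile-likelihood (PL) confidence limits, which are controlled by plcontrol; the maxit in logistf.control() that your fit function already sets is a separate setting.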

train_proc <- data.table::fread("train_proc.csv")
names(train_proc)[25] <- "group"  # missing from your code

set.seed(123)
firth.logist.model <- train(train_proc[, .SD, .SDcols = !"group"],
                            train_proc$group,
                            method = firth_model,
                            trControl = train_control,
                            plcontrol=logistpl.control(maxit=1000))

print(firth.logist.model)

53 samples
24 predictors
 2 classes: 'AD', 'control' 

No pre-processing
Resampling: Cross-Validated (3 fold, repeated 3 times) 
Summary of sample sizes: 35, 36, 35, 36, 35, 35, ... 
Resampling results:

  Accuracy   Kappa     
  0.4270153  -0.1482292

Tuning parameter 'none' was held constant at a value of none
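
As a smaller illustration of where the setting goes, here is a minimal sketch that calls logistf() directly, without caret. It uses the sex2 dataset bundled with logistf rather than the data from the question, and the object names fit_pl and fit_wald are only illustrative. The second call assumes that confidence intervals are not needed (as is typically the case when the model is used only for cross-validated prediction) and skips the profile-likelihood step with pl = FALSE, so that Wald intervals are reported instead.

```r
library(logistf)

# Example data shipped with logistf (not the data from the question)
data(sex2)

# Raise the iteration limit for the profile-likelihood confidence limits.
# 'control' governs the coefficient fit; 'plcontrol' governs the PL step.
fit_pl <- logistf(case ~ age + oc + vic + vicl + vis + dia, data = sex2,
                  control   = logistf.control(maxit = 100),
                  plcontrol = logistpl.control(maxit = 1000))

# Alternative: skip the profile-likelihood computation and use Wald
# intervals instead; the point estimates come from the same penalized fit.
fit_wald <- logistf(case ~ age + oc + vic + vicl + vis + dia, data = sex2,
                    control = logistf.control(maxit = 100),
                    pl = FALSE)

print(fit_pl$coefficients)
print(fit_wald$coefficients)
```

In the caret setup above, either argument can be forwarded the same way as plcontrol, because the custom fit function passes ... on to logistf(). Whether pl = FALSE is appropriate depends on whether you need the confidence limits at all.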