Here is a link to the sample data (the file is small, only 23 KB, but it is enough to trigger the errors):
https://drive.google.com/file/d/1TWkFIKhq9VZkFnhUrt6LxYmab54ouODd/view?usp=sharing
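This is the code I use to run the Firth model. I get different errors on different runs (after restarting R or the R session). Sometimes the program appears to hang (although Activity Monitor shows 99% CPU usage). Other times I get errors such as non-convergence, along with a suggestion to increase the number of iterations, which did not really help.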
library(caret)
library(logistf)
library(data.table)
# Define training control
train_control <- trainControl(method = "repeatedcv",
                              number = 3, repeats = 3,
                              savePredictions = TRUE,
                              classProbs = TRUE)
# Define the custom model function
firth_model <- list(
  type = "Classification",
  library = "logistf",
  loop = NULL,
  parameters = data.frame(parameter = c("none"), class = c("character"), label = c("none")),
  grid = function(x, y, len = NULL, search = "grid") {
    data.frame(none = "none")
  },
  fit = function(x, y, wts, param, lev, last, classProbs, ...) {
    data <- as.data.frame(x)
    data$group <- y
    logistf(group ~ ., data = data, control = logistf.control(maxit = 100), ...)
  },
  predict = function(modelFit, newdata, submodels = NULL) {
    as.factor(ifelse(predict(modelFit, newdata, type = "response") > 0.5, "AD", "control"))
  },
  prob = function(modelFit, newdata, submodels = NULL) {
    preds <- predict(modelFit, newdata, type = "response")
    data.frame(control = 1 - preds, AD = preds)
  }
)
train_proc <- fread("train_proc.csv")
# Training the model
set.seed(123)
firth.logist.model <- train(train_proc[, .SD, .SDcols = !c("group")],
                            train_proc$group,
                            method = firth_model,
                            trControl = train_control)
print(firth.logist.model)
This is the most recent error message:
Warning in logistf(group ~ ., data = data, control = logistf.control(maxit = 100), :
Nonconverged PL confidence limits: maximum number of iterations for variables: (Intercept), x1, x2, x3, x4, x5, x6, x7, x8, x9, x10, x11, x12, x13, x14, x15, x16, x17, x18, x19, x20, x21, x22, x23, x24 exceeded. Try to increase the number of iterations by passing 'logistpl.control(maxit=...)' to parameter plcontrol
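The same code seems to run on some datasets but not on others. That could also be because my function is not tailored to the specific dataset. I have run into so many different kinds of errors that I am starting to wonder whether the logistf package itself is unstable.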
For more information, here is my R version:
R.version
               _
platform       aarch64-apple-darwin20
arch           aarch64
os             darwin20
system         aarch64, darwin20
status
major          4
minor          3.2
year           2023
month          10
day            31
svn rev        85441
language       R
version.string R version 4.3.2 (2023-10-31)
nickname       Eye Holes
Here are my package versions:
> packageVersion("caret")
[1] ‘6.0.94’
> packageVersion("logistf")
[1] ‘1.26.0’
> packageVersion("data.table")
[1] ‘1.14.10’
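If you increase the number of iterations the way the warning message suggests, by passing logistpl.control(maxit = ...) to the plcontrol parameter, you should get the desired result.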
train_proc <- data.table::fread("train_proc.csv")
names(train_proc)[25] <- "group" # missing from your code
set.seed(123)
firth.logist.model <- train(train_proc[, .SD, .SDcols = !"group"],
                            train_proc$group,
                            method = firth_model,
                            trControl = train_control,
                            plcontrol = logistpl.control(maxit = 1000))
print(firth.logist.model)
53 samples
24 predictors
 2 classes: 'AD', 'control'

No pre-processing
Resampling: Cross-Validated (3 fold, repeated 3 times)
Summary of sample sizes: 35, 36, 35, 36, 35, 35, ...
Resampling results:

  Accuracy   Kappa
  0.4270153  -0.1482292

Tuning parameter 'none' was held constant at a value of none
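The warning is about the profile-likelihood (PL) confidence limits, not about the coefficient fit itself. That is why raising maxit in logistf.control() alone does not silence it. In the train() call above, the extra plcontrol argument reaches logistf() through the ... in the custom fit function. The same thing can be tried directly on logistf(), outside caret. The sketch below is only an illustration: it uses small simulated data (an assumption, not your train_proc.csv), and it also shows pl = FALSE, which asks logistf for Wald intervals instead and may be worth trying if only predictions are needed.

library(logistf)

# Simulated stand-in data (an assumption, NOT the data from the question):
# 53 rows, 3 predictors, binary 0/1 outcome.
set.seed(123)
n <- 53
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
sim$group <- rbinom(n, size = 1, prob = plogis(0.5 * sim$x1 - 0.5 * sim$x2))

# Option 1: keep the profile-likelihood intervals, but give them more iterations.
fit_pl <- logistf(group ~ ., data = sim,
                  control   = logistf.control(maxit = 100),
                  plcontrol = logistpl.control(maxit = 1000))

# Option 2: skip the profile-likelihood step; Wald intervals are used instead.
fit_wald <- logistf(group ~ ., data = sim,
                    control = logistf.control(maxit = 100),
                    pl = FALSE)

# Compare the point estimates from the two fits.
all.equal(coef(fit_pl), coef(fit_wald))

If option 2 behaves well on your data, pl = FALSE could be passed to train() in the same way as plcontrol, since both are forwarded through the ... of the fit function.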