I want to plot the SHAP values of my RSF (random survival forest) model. Here is the code and the error:
xvars <- c("RIDRETH1", "RXDLIPID", "DRXTKCAL", "DRXTPROT", "DRXTCARB", "DRXTCHOL", "DRXTFIBE", "DRXTVARA", "DRXTATOC", "DRXTSODI", "DRXTPOTA", "DRXTM161", "DRXTM181", "DRXTM201", "DRXTM221", "DRXTP182", "DRXTP183", "DRXTP184", "DRXTP204", "DRXTP205", "DRXTP225", "DRXTP226", "DRXTRET", "DRXT_G_TOTAL", "DRXT_V_STARCHY_TOTAL", "DRXTS160", "DRXTS180", "DRXTsumSFA", "INDFMPIR", "LBXCOT", "GENDERRC")
X <- Data[sample(nrow(Data), 1000), xvars]
bg_X <- Data[sample(nrow(Data), 200), ]
system.time(
  ks <- kernelshap(rf_mort_nutrients_withoutage_1018_all, X, bg_X = bg_X, type = 'prob')
)
ks
ks <- shapviz(ks)
sv_importance(ks, kind = "bee")
Error in align_pred(pred_fun(object, bg_X, ...)): Predictions must be numeric!
Timing stopped at: 0.03 0.05 0.11
These are my predictions:
rf_mort_nutrients_withoutage_1018_all$predicted
[1] 81.31376 75.82491 99.35944 58.63055 67.65847 98.32906 75.33934 107.81604 62.22175 75.69875 69.99881 83.67161 81.39735 65.59381
I'm not sure why it isn't working. Does anyone have an idea?
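To analyze the continuous rank probability scores, you can pass kernelshap a prediction function that returns a numeric vector, like this: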
library(randomForestSRC)
library(survival)
library(kernelshap)
library(shapviz)
head(veteran)
#   trt celltype time status karno diagtime age prior
# 1   1 squamous   72      1    60        7  69     0
# 2   1 squamous  411      1    70        5  64    10
# 3   1 squamous  228      1    60        3  38     0
xvars <- setdiff(colnames(veteran), c("time", "status"))
fit <- rfsrc(
  reformulate(xvars, "Surv(time, status)"),
  data = veteran,
  ntree = 50,
  nodesize = 20,
  importance = TRUE
)
# Function that returns continuous rank probability scores
pred_fun <- function(model, data) {
  predict(model, data)$predicted
}
# Sample <=1000 rows from the training data. veteran is small enough to use all
X_explain <- veteran[xvars]
sv <- kernelshap(fit, X = X_explain, pred_fun = pred_fun) |>
  shapviz()
sv |> sv_importance(kind = "bee")
sv |> sv_dependence(xvars)
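As for why the original call fails: kernelshap's default pred_fun is stats::predict, and for an rfsrc model predict() returns a structured prediction object rather than a plain numeric vector, which is presumably what triggers "Predictions must be numeric!". The minimal sketch below checks this on the veteran data. It assumes the same kind of fit as above; the names raw and p are only illustrative.

```r
library(randomForestSRC)
library(survival)

xvars <- setdiff(colnames(veteran), c("time", "status"))
fit <- rfsrc(
  reformulate(xvars, "Surv(time, status)"),
  data = veteran,
  ntree = 50,
  nodesize = 20
)

# Default predict(): a list-like prediction object, so not numeric
raw <- predict(fit, veteran[xvars])
print(class(raw))
print(is.numeric(raw))

# Wrapper that extracts the numeric component, one value per row
pred_fun <- function(model, data) {
  predict(model, data)$predicted
}
p <- pred_fun(fit, veteran[xvars])
print(is.numeric(p))
print(length(p) == nrow(veteran))
```

With your own model, the same wrapper would be passed as pred_fun = pred_fun in the kernelshap() call from the question. It is probably also worth restricting bg_X to the xvars columns so that X and bg_X have matching columns.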