I'm comparing several ML models on a dataset using tidymodels and workflow sets, and I'd like to compare them against the heuristic rules commonly used in the field at the same time. I assumed it would be simple to specify such a rule, e.g. y_pred = (x1 > 3) | (x2 < 1), as a model on the same data, tune nothing (since nothing would change), and then compare it easily with all the other models using yardstick etc., since it's just a poorly fit model. I cannot for the life of me figure out the right way to specify it cleanly up front, in the same way as the models that actually get fit.
The community-contributed parsnip extension package bespoke lets you define exactly this kind of model. To install it:
pak::pak("macmillancontentscience/bespoke")
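The main function, bespoke(), takes a function (fn) that accepts a data frame as input and returns a vector (integer, character, or factor) indicating the predicted outcome, one value per input row. A simple example to show it in action: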
library(parsnip)
library(bespoke)
dat <- data.frame(
y = factor(sample(c("a", "b"), 10, replace = TRUE)),
x1 = rnorm(10),
x2 = rnorm(10, .5)
)
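# Toy rule: predict "b" when x1 > x2, otherwise "a".
# Explicit levels keep both classes even if the rule only fires one way
# on a given set of rows (factor(..., labels = ) would error in that case).
make_pred <- function(x) {
  y_pred <- x$x1 > x$x2
  factor(ifelse(y_pred, "b", "a"), levels = c("a", "b"))
}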
model_spec <- bespoke(fn = make_pred)
model_spec
#> bespoke Model Specification (classification)
#>
#> Main Arguments:
#> fn = make_pred
#>
#> Computational engine: bespoke
model_fit <- model_spec %>% fit(y ~ x1 + x2, dat)
predict(model_fit, dat)
#> # A tibble: 10 × 1
#> .pred_class
#> <fct>
#> 1 b
#> 2 b
#> 3 b
#> 4 a
#> 5 a
#> 6 b
#> 7 a
#> 8 b
#> 9 a
#> 10 b
Created on 2024-03-20 with reprex v2.1.0
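Applied to the rule from the question, a sketch might look like the following. The simulated data, the outcome levels "no"/"yes", and the name heuristic_rule are hypothetical; scoring uses yardstick's accuracy() on the predictions bound to the observed outcome, the same way you would score any other fitted parsnip model.

```r
library(parsnip)
library(bespoke)
library(yardstick)

set.seed(123)
# Hypothetical data: a two-class outcome and two numeric predictors
dat <- data.frame(
  y  = factor(sample(c("no", "yes"), 50, replace = TRUE), levels = c("no", "yes")),
  x1 = rnorm(50, mean = 3),
  x2 = rnorm(50, mean = 1)
)

# The domain heuristic from the question: "yes" when x1 > 3 or x2 < 1.
# Fixed levels keep the factor compatible with the outcome on any subset.
heuristic_rule <- function(x) {
  factor(ifelse(x$x1 > 3 | x$x2 < 1, "yes", "no"), levels = c("no", "yes"))
}

# "Fitting" just stores the rule; there is nothing to estimate or tune
rule_fit <- bespoke(fn = heuristic_rule) %>%
  fit(y ~ x1 + x2, data = dat)

# Bind predictions to the observed outcome and score with yardstick
rule_preds <- cbind(dat, predict(rule_fit, dat))
accuracy(rule_preds, truth = y, estimate = .pred_class)
```

Because the rule only produces hard classes, class metrics such as accuracy() apply, but probability metrics such as roc_auc() do not.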
:)
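To put the rule next to genuinely fitted models in a workflow set, one option is to pass the bespoke spec to workflow_set() like any other parsnip spec and resample everything with fit_resamples(). This is a sketch under assumptions: I haven't confirmed that bespoke models behave correctly inside workflows and resampling, the data and the name heuristic_rule are hypothetical, and only accuracy is requested because the rule returns hard classes rather than probabilities.

```r
library(tidymodels)   # attaches parsnip, rsample, workflows, yardstick, tune, workflowsets
library(bespoke)

set.seed(123)
# Hypothetical data: a two-class outcome and two numeric predictors
dat <- data.frame(
  y  = factor(sample(c("no", "yes"), 50, replace = TRUE), levels = c("no", "yes")),
  x1 = rnorm(50, mean = 3),
  x2 = rnorm(50, mean = 1)
)

# Hypothetical domain heuristic with fixed outcome levels
heuristic_rule <- function(x) {
  factor(ifelse(x$x1 > 3 | x$x2 < 1, "yes", "no"), levels = c("no", "yes"))
}

folds <- vfold_cv(dat, v = 5)

# One formula preprocessor crossed with two model specs:
# the fixed rule and an ordinary logistic regression
wf_set <- workflow_set(
  preproc = list(base = y ~ x1 + x2),
  models  = list(
    rule     = bespoke(fn = heuristic_rule),
    logistic = logistic_reg()
  )
)

# Nothing to tune, so resample each workflow with fit_resamples()
res <- workflow_map(
  wf_set,
  "fit_resamples",
  resamples = folds,
  metrics   = metric_set(accuracy)
)

collect_metrics(res)
```

If this works with your version of bespoke, the rule shows up as the workflow "base_rule" and can be ranked against the other workflows with rank_results() like any other model.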