First, the code snippet:
## Packages
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import fbeta_score
from imblearn.over_sampling import RandomOverSampler
from sklearn.datasets import make_classification
## Create dataset
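class_weight = [0.90]  # share of the majority class
# n_redundant = 0 is required here: the default of 2 redundant features would exceed n_features
X, Y = make_classification(n_samples = 1000, n_classes = 2, n_clusters_per_class = 2, weights = class_weight,
                           n_features = 10, n_informative = 10, n_redundant = 0, class_sep = 1, shuffle = True)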
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.2, stratify = Y)
## Define Function
def resampled_fit(X, y, model, model_params, sampling, sampling_params):
    sampling = sampling(**sampling_params) ## <= here is the issue
    X_train_balanced, Y_train_balanced = sampling.fit_resample(X, y)
    # Fit the model to the balanced training data
    model_obj = model(**model_params).fit(X_train_balanced, Y_train_balanced)
    # Compute performance metrics
    Y_pred = model_obj.predict(X_train_balanced)
    f2_val = fbeta_score(Y_train_balanced, Y_pred, beta = 2)
    return f2_val
## Define Inputs
ROS = RandomOverSampler(random_state = 42)
ROS_params = dict(sampling_strategy = 0.8)
SVM = SVC()
SVM_params = dict(kernel = 'rbf', probability = True)
output = resampled_fit(X, Y, SVM, SVM_params, ROS, ROS_params)
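Basically, I want to pass 'sampling_strategy' (for RandomOverSampler) into my custom function as a separate parameter, the same way I pass the classifier's parameters separately. However, it does not work like this.
I get the error message: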
TypeError: 'RandomOverSampler' object is not callable
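I checked the type of RandomOverSampler, and it is abc.ABCMeta, the same as for the classifier. What is a workaround for overriding the input parameters of RandomOverSampler inside the function?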
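A minimal sketch of what the error message and the abc.ABCMeta observation may be pointing at, using only the standard library. `DemoSampler` and `build` are hypothetical stand-ins (not part of imblearn); the assumption is that the snippet passes an instance (`RandomOverSampler(random_state = 42)`) where the function expects a class.

```python
from abc import ABCMeta


class DemoSampler(metaclass=ABCMeta):
    """Hypothetical stand-in for a sampler class such as RandomOverSampler."""

    def __init__(self, sampling_strategy="auto", random_state=None):
        self.sampling_strategy = sampling_strategy
        self.random_state = random_state


def build(sampling, sampling_params):
    # Mirrors the failing line: this only works if `sampling` is a class.
    return sampling(**sampling_params)


# The class itself is an instance of its metaclass, hence "abc.ABCMeta".
class_type = type(DemoSampler)

# An instance has the class as its type and defines no __call__.
instance = DemoSampler(random_state=42)
instance_type = type(instance)

# Passing the class works; all parameters go in through the dict.
built = build(DemoSampler, dict(sampling_strategy=0.8, random_state=42))

# Passing an instance reproduces the kind of TypeError seen in the question.
try:
    build(instance, dict(sampling_strategy=0.8))
    error_message = None
except TypeError as exc:
    error_message = str(exc)
```

Applied to the snippet, this would suggest passing `RandomOverSampler` and `SVC` themselves (without parentheses) and moving `random_state = 42` into `ROS_params`. Alternatively, the instances could be kept and updated with `set_params(**sampling_params)`, which scikit-learn-style estimators provide.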
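PS: Yes, I need to pass the parameters separately, because I want to use a grid search to optimize the function afterwards. Obviously you need to perform CV to use balanced sampling, but as already mentioned, this is just an excerpt.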
Thanks for any help with this problem.