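Problem
The goal is to tune parameters with GridSearchCV and then fit two models, one OneVsOneClassifier and one OneVsRestClassifier, and compare the accuracy of each model using the tuned parameters. Both models are fit to the MNIST digit recognition dataset for multiclass classification.
I set up a simple hyperparameter grid with GridSearchCV: 'estimator__C': [0.1, 1, 100, 200] for LogisticRegression. To check the setup, I printed the configured grid search objects. The fit call receives the scaled X_train, and then I run the fit.
The code runs on a Kaggle GPU P100. When I execute ovr_grid_search.fit() and ovo_grid_search.fit(), both run indefinitely.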
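For a sense of scale, the amount of work this grid implies can be counted with plain arithmetic. The sketch below assumes 10 digit classes, the 4 values of C and 5 folds from the setup above, and refit=True (the GridSearchCV default); the helper name total_binary_fits is hypothetical and used only for illustration.

```python
from math import comb

n_classes = 10                 # digits 0-9
c_values = [0.1, 1, 100, 200]  # grid from the question
cv_folds = 5


def total_binary_fits(models_per_fit, n_candidates, n_folds, refit=True):
    """Count the binary LogisticRegression fits behind one grid search.

    Each (candidate, fold) pair fits the multiclass wrapper once, and
    refit=True adds one final fit on the full training set.
    """
    wrapper_fits = n_candidates * n_folds + (1 if refit else 0)
    return models_per_fit * wrapper_fits


# One-vs-rest trains one binary model per class.
ovr_fits = total_binary_fits(n_classes, len(c_values), cv_folds)
# One-vs-one trains one binary model per pair of classes.
ovo_fits = total_binary_fits(comb(n_classes, 2), len(c_values), cv_folds)

print("OvR binary fits:", ovr_fits)
print("OvO binary fits:", ovo_fits)
```

Each one-vs-one model sees only the rows of its two classes, so its individual fits are smaller than the one-vs-rest fits, but there are many more of them.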
Code
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import StandardScaler

# Standardize the training features (X_train and y_train are loaded earlier)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train.astype(np.float64))

# Wrap LogisticRegression in the two multiclass strategies
ovr_model = OneVsRestClassifier(LogisticRegression())
ovo_model = OneVsOneClassifier(LogisticRegression())

# 'estimator__C' sets C on the wrapped LogisticRegression
param_grid = {
    'estimator__C': [0.1, 1, 100, 200]
}
ovr_grid_search = GridSearchCV(ovr_model, param_grid=param_grid, cv=5, n_jobs=-1)
ovo_grid_search = GridSearchCV(ovo_model, param_grid=param_grid, cv=5, n_jobs=-1)

# Prints the unfitted GridSearchCV configuration; best_params_ exists only after fit()
print("OneVsRestClassifier best params: ", ovr_grid_search)
print("OneVsOneClassifier best params: ", ovo_grid_search)

### below code is the problem area
ovr_grid_search.fit(X_train_scaled, y_train)
ovo_grid_search.fit(X_train_scaled, y_train)
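One way to tell a slow search from a stuck one is to run the same setup on a much smaller problem with progress output switched on. The following is a minimal sketch under stated assumptions, not the original experiment: it swaps in scikit-learn's bundled 8x8 load_digits data for MNIST, keeps only the first 600 rows, trims the grid to two values of C, uses 3 folds, and sets max_iter=200. All of those choices are illustrative.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.preprocessing import StandardScaler

# Assumption: small bundled digits data stands in for MNIST.
X, y = load_digits(return_X_y=True)
X_small, y_small = X[:600], y[:600]

X_small_scaled = StandardScaler().fit_transform(X_small.astype(np.float64))

# Assumption: reduced grid and fold count so the run finishes quickly.
small_grid = {'estimator__C': [0.1, 1]}


def run_search(wrapper):
    """Fit a grid search over C for the given multiclass wrapper."""
    search = GridSearchCV(
        wrapper,
        param_grid=small_grid,
        cv=3,
        n_jobs=1,    # single process keeps the progress output in order
        verbose=1,   # reports how many fits are scheduled
    )
    search.fit(X_small_scaled, y_small)
    return search


ovr_small = run_search(OneVsRestClassifier(LogisticRegression(max_iter=200)))
ovo_small = run_search(OneVsOneClassifier(LogisticRegression(max_iter=200)))

print("OvR best params:", ovr_small.best_params_, "CV accuracy:", ovr_small.best_score_)
print("OvO best params:", ovo_small.best_params_, "CV accuracy:", ovo_small.best_score_)
```

If this reduced run finishes, the same verbose setting on the full search should show whether fits are still being completed.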
Data: the MNIST digit recognition dataset.
GridSearch output
OneVsRestClassifier best params: GridSearchCV(cv=5, error_score='raise',
estimator=OneVsRestClassifier(estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
verbose=0, warm_start=False),
n_jobs=1),
fit_params=None, iid=True, n_jobs=-1,
param_grid={'estimator__C': [0.1, 1, 100, 200]},
pre_dispatch='2*n_jobs', refit=True, return_train_score='warn',
scoring=None, verbose=0)
OneVsOneClassifier best params: GridSearchCV(cv=5, error_score='raise',
estimator=OneVsOneClassifier(estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
verbose=0, warm_start=False),
n_jobs=1),
fit_params=None, iid=True, n_jobs=-1,
param_grid={'estimator__C': [0.1, 1, 100, 200]},
pre_dispatch='2*n_jobs', refit=True, return_train_score='warn',
scoring=None, verbose=0)