Problems setting up a SciKeras model


I have an existing setup using scikit-learn, but I am looking at extending it to deep learning with Keras. I am also using Dask, which recommends using SciKeras.

The SciKeras KerasClassifier as currently set up seems to run as expected (judging by the verbose training output), but the model does not appear to learn anything at all. I have followed the SciKeras documentation here, but I may be overlooking something.

With the scikit-learn RF classifier the kappa score is about 0.44, with plain Keras about 0.55, and with SciKeras it is 0.0 (which is clearly the problem). Is there an implementation error in "2. Following the SciKeras docs to use Keras" that prevents results similar to the implementation in "3. Exclusively using Keras" below?

Below I list the current scikit-learn implementation using RF (as the expected output), the output using SciKeras (as the actual output), and the output using only Keras (as the expected result).

1. Current output using a scikit-learn random forest:

import numpy as np
from sklearn.ensemble import RandomForestClassifier

def default_classifier():
    return RandomForestClassifier(oob_score=True, n_jobs=-1)

... ### Preprocessing stuff...

X_train, X_test, y_train, y_test = splits

# Define the Pipeline    
## Classification    
model = default_classifier()
model.fit(X_train, y_train)

## Evaluation Metrics
from sklearn.model_selection import cross_val_score
score = cross_val_score(model, X_test, y_test, scoring='accuracy', cv=5, n_jobs=-1, error_score='raise')
print('Mean: %.3f (Std: %.3f)' % (np.mean(score), np.std(score)))

# Verbose with results...
columns, report, true_matrix, pred_matrix = cl.classification_metrics(model, splits, score)

The corresponding scikit-learn output:

Test Size:  0.2
Split Shapes:   [(79997, 96), (20000, 96), (79997, 12), (20000, 12)]
Mean: 0.374 (Std: 0.006)
Overall: 0.510  Kappa: 0.441
Weighted F1-Score: 0.539
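
cl.classification_metrics is a custom helper not shown here. Roughly, the Overall / Kappa / Weighted F1 numbers correspond to standard scikit-learn metrics along the lines of the sketch below (a sketch only, not the actual helper; note the one-hot targets have to be collapsed back to integer class labels first):

# Rough sketch of metrics like those reported by cl.classification_metrics.
# The real helper is not shown; this only assumes it wraps standard sklearn metrics.
import numpy as np
from sklearn.metrics import accuracy_score, cohen_kappa_score, f1_score

def basic_classification_metrics(model, X_test, y_test):
    y_pred = model.predict(X_test)
    # Collapse one-hot rows (shape (n_samples, n_classes)) to integer class ids.
    if y_test.ndim == 2:
        y_test = np.argmax(y_test, axis=1)
    if y_pred.ndim == 2:
        y_pred = np.argmax(y_pred, axis=1)
    return {
        'overall': accuracy_score(y_test, y_pred),
        'kappa': cohen_kappa_score(y_test, y_pred),
        'weighted_f1': f1_score(y_test, y_pred, average='weighted'),
    }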

2. Following the SciKeras docs to use Keras:

from tensorflow import keras
from scikeras.wrappers import KerasClassifier
from sklearn.model_selection import train_test_split
import numpy as np

def fcn_model(hidden_layer_dim, meta):
    # note that meta is a special argument that will be
    # handed a dict containing input metadata
    n_features_in_ = meta["n_features_in_"]
    X_shape_ = meta["X_shape_"]
    n_classes_ = meta["n_classes_"]
    
    model = keras.models.Sequential()
    model.add(keras.layers.Dense(n_features_in_, input_shape=X_shape_[1:]))
    model.add(keras.layers.Activation("relu"))
    model.add(keras.layers.Dense(hidden_layer_dim))
    model.add(keras.layers.Activation("relu"))
    model.add(keras.layers.Dense(n_classes_))
    model.add(keras.layers.Activation("softmax"))
    return model

def get_model_fcn(modelargs={}):
    return KerasClassifier(fcn_model, 
                           hidden_layer_dim=128, 
                           epochs=10,
                           optimizer='adam',
                           loss='categorical_crossentropy',
                           metrics=['accuracy'],
                           fit__use_multiprocessing=True,
                           **modelargs)

... ### Preprocessing stuff...

X_train, X_test, y_train, y_test = splits

# Define the Pipeline    
## Classification    
model = get_model_fcn()
model.fit(X_train, y_train)

## Evaluation Metrics
from sklearn.model_selection import cross_val_score
score = cross_val_score(model, X_test, y_test, scoring='accuracy', cv=5, n_jobs=-1, error_score='raise')
print('Mean: %.3f (Std: %.3f)' % (np.mean(score), np.std(score)))

columns, report, true_matrix, pred_matrix = cl.classification_metrics(model, splits, score)

The corresponding SciKeras output (results are not good):

Test Size:  0.2
Split Shapes:   [(79997, 96), (20000, 96), (79997, 12), (20000, 12)]
Epoch 1/10
2500/2500 [==============================] - 4s 1ms/step - loss: 1.6750 - accuracy: 0.3762
Epoch 2/10
2500/2500 [==============================] - 3s 1ms/step - loss: 1.3132 - accuracy: 0.5021
Epoch 3/10
2500/2500 [==============================] - 3s 1ms/step - loss: 1.2295 - accuracy: 0.5371
Epoch 4/10
2500/2500 [==============================] - 3s 1ms/step - loss: 1.1651 - accuracy: 0.5599
Epoch 5/10
2500/2500 [==============================] - 3s 1ms/step - loss: 1.1178 - accuracy: 0.5806
Epoch 6/10
2500/2500 [==============================] - 3s 1ms/step - loss: 1.0889 - accuracy: 0.5935
Epoch 7/10
2500/2500 [==============================] - 3s 1ms/step - loss: 1.0845 - accuracy: 0.5922
Epoch 8/10
2500/2500 [==============================] - 3s 1ms/step - loss: 1.0548 - accuracy: 0.6043
Epoch 9/10
2500/2500 [==============================] - 3s 1ms/step - loss: 1.0415 - accuracy: 0.6117
Epoch 10/10
2500/2500 [==============================] - 3s 1ms/step - loss: 1.0316 - accuracy: 0.6172
Mean: 0.000 (Std: 0.000)
625/625 [==============================] - 0s 700us/step # Here it is running model.predict(X_test)
Overall: 0.130  Kappa: 0.000
Weighted F1-Score: 0.030
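
Note the symptom here: the Keras training log looks healthy (accuracy climbs to about 0.62), yet cross_val_score and the kappa/F1 metrics are essentially zero. The meta dict in section 3 also shows target_type_ as 'multilabel-indicator', i.e. the one-hot encoded y is interpreted as a multilabel problem rather than a single multiclass target. A quick way to confirm how the targets are being classified, using scikit-learn's type_of_target (which SciKeras's target_type_ appears to be based on), is the sketch below:

# Diagnostic sketch: how does scikit-learn classify the target array?
# One-hot encoded multiclass labels are reported as 'multilabel-indicator',
# matching the target_type_ shown in the meta dict of section 3.
import numpy as np
from sklearn.utils.multiclass import type_of_target

print(type_of_target(y_train))                     # 'multilabel-indicator' for one-hot rows
print(type_of_target(np.argmax(y_train, axis=1)))  # 'multiclass' for integer class labels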

3. Exclusively using Keras:

# meta copies what SciKeras passes to the Keras model
meta = {
    #'classes_': array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11]), 
    #'target_type_': 'multilabel-indicator', 
    'y_dtype_': np.dtype('uint8'), 
    'y_ndim_': 2, 
    'X_dtype_': np.dtype('float32'), 
    'X_shape_': (79997, 96), 
    'n_features_in_': 96, 
    #'target_encoder_': ClassifierLabelEncoder(loss='categorical_crossentropy'), 
    'n_classes_': 12, 
    'n_outputs_': 1, 
    'n_outputs_expected_': 1, 
    #'feature_encoder_': FunctionTransformer()
}

def fcn_model(hidden_layer_dim, meta):
    # note that meta is a special argument that will be
    # handed a dict containing input metadata
    n_features_in_ = meta["n_features_in_"]
    X_shape_ = meta["X_shape_"]
    n_classes_ = meta["n_classes_"]
    
    model = keras.models.Sequential()
    model.add(keras.layers.Dense(n_features_in_, input_shape=X_shape_[1:]))
    model.add(keras.layers.Activation("relu"))
    model.add(keras.layers.Dense(hidden_layer_dim))
    model.add(keras.layers.Activation("relu"))
    model.add(keras.layers.Dense(n_classes_))
    model.add(keras.layers.Activation("softmax"))
    return model

def get_model_fcn(modelargs={}):
    model = fcn_model(128, meta)
    model.compile(optimizer='adam', 
              loss='categorical_crossentropy', 
              metrics=['accuracy'])
    
    return model

... ### Preprocessing stuff...

X_train, X_test, y_train, y_test = splits

# Define the Pipeline    
## Classification    
model = get_model_fcn()
model.fit(X_train, y_train, epochs=10)

## Evaluation Metrics
#from sklearn.model_selection import cross_val_score
#score = cross_val_score(model, X_test, y_test, scoring='accuracy', cv=5, n_jobs=-1, error_score='raise')
#print('Mean: %.3f (Std: %.3f)' % (np.mean(score), np.std(score)))

# note: 'score' is not defined in this snippet (the cross_val_score call above is commented out)
columns, report, true_matrix, pred_matrix = cl.classification_metrics(model, splits, score)

Expected output using Keras:

Test Size:  0.2
Split Shapes:   [(79997, 96), (20000, 96), (79997, 12), (20000, 12)]
Epoch 1/10
2500/2500 [==============================] - 3s 1ms/step - loss: 1.6941 - accuracy: 0.3730
Epoch 2/10
2500/2500 [==============================] - 3s 1ms/step - loss: 1.3193 - accuracy: 0.5002
Epoch 3/10
2500/2500 [==============================] - 3s 1ms/step - loss: 1.2206 - accuracy: 0.5399
Epoch 4/10
2500/2500 [==============================] - 3s 1ms/step - loss: 1.1585 - accuracy: 0.5613
Epoch 5/10
2500/2500 [==============================] - 3s 1ms/step - loss: 1.1221 - accuracy: 0.5758
Epoch 6/10
2500/2500 [==============================] - 3s 1ms/step - loss: 1.0923 - accuracy: 0.5928
Epoch 7/10
2500/2500 [==============================] - 3s 1ms/step - loss: 1.0682 - accuracy: 0.5984
Epoch 8/10
2500/2500 [==============================] - 3s 1ms/step - loss: 1.0611 - accuracy: 0.6046
Epoch 9/10
2500/2500 [==============================] - 3s 1ms/step - loss: 1.0445 - accuracy: 0.6138
Epoch 10/10
2500/2500 [==============================] - 3s 1ms/step - loss: 1.0236 - accuracy: 0.6186
Overall: 0.601  Kappa: 0.548
Weighted F1-Score: 0.600
Tags: python machine-learning keras scikit-learn deep-learning

2 Answers

2 votes

Apparently this is a bug in how SciKeras handles multiclass one-hot encoded targets; the issue is being tracked here.
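
While that issue is open, one way to sidestep it (a sketch under the assumption that the targets can simply be collapsed back to integer class labels, not something taken from the linked issue) is to avoid handing the wrapper one-hot targets at all and switch to the sparse loss, so SciKeras sees an ordinary multiclass problem. Reusing the asker's fcn_model:

# Hedged workaround sketch: pass integer class labels instead of one-hot rows,
# and use the sparse variant of the loss so the Keras model trains the same way.
import numpy as np
from scikeras.wrappers import KerasClassifier

y_train_int = np.argmax(y_train, axis=1)   # collapse one-hot rows to class ids
y_test_int = np.argmax(y_test, axis=1)

model = KerasClassifier(fcn_model,
                        hidden_layer_dim=128,
                        epochs=10,
                        optimizer='adam',
                        loss='sparse_categorical_crossentropy',  # matches integer labels
                        metrics=['accuracy'])
model.fit(X_train, y_train_int)
print(model.score(X_test, y_test_int))     # plain accuracy via the sklearn API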


0 votes

Interesting discussion.

I was practicing with this a bit and ran into some problems. My code is:

https://gist.githubusercontent.com/robintux/c4baabf031f938c416e24c31357998e1/raw/affdb0881f1850a980e423e6e21b4312abebed6e/create_model.py

I am trying to run GridSearchCV:

https://gist.githubusercontent.com/robintux/c4baabf031f938c416e24c31357998e1/raw/affdb0881f1850a980e423e6e21b4312abebed6e/GridSearch.py

But I get an exception:

https://gist.githubusercontent.com/robintux/c4baabf031f938c416e24c31357998e1/raw/affdb0881f1850a980e423e6e21b4312abebed6e/Errores

I think the exception is caused by the way I define the create_model() function, but I cannot find a solution. Do you have any suggestions?
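
Without seeing the gists it is hard to diagnose the exception directly, but a frequent cause with GridSearchCV over a SciKeras KerasClassifier is that arguments of the model-building function are not routed with the model__ prefix. Below is a minimal sketch of a setup that does work (create_model and hidden_dim are illustrative names, not taken from the gists):

# Hedged sketch: GridSearchCV over a SciKeras KerasClassifier.
# 'create_model' and 'hidden_dim' are illustrative, not from the linked gists.
import numpy as np
from tensorflow import keras
from scikeras.wrappers import KerasClassifier
from sklearn.model_selection import GridSearchCV

def create_model(hidden_dim, meta):
    n_features_in_ = meta['n_features_in_']
    n_classes_ = meta['n_classes_']
    model = keras.models.Sequential()
    model.add(keras.layers.Dense(hidden_dim, activation='relu',
                                 input_shape=(n_features_in_,)))
    model.add(keras.layers.Dense(n_classes_, activation='softmax'))
    return model

clf = KerasClassifier(model=create_model,
                      loss='sparse_categorical_crossentropy',
                      optimizer='adam',
                      epochs=5,
                      verbose=0)

# Parameters of create_model are tuned through the model__ prefix;
# optimizer settings would use optimizer__ (e.g. optimizer__learning_rate).
param_grid = {
    'model__hidden_dim': [64, 128],
    'batch_size': [32, 64],
}
search = GridSearchCV(clf, param_grid, cv=3)
search.fit(X_train, np.argmax(y_train, axis=1))   # integer class labels
print(search.best_params_)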
