How to cross-validate a CNN model with multiple inputs using Keras

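My dataset consists of a time series (10080 values) and other descriptive statistics features (85), concatenated into a single row. The dataframe is 921 x 10166.

The data looks like this, with the last 2 columns being Y (the labels).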

id    x0  x1    x2   x3   x4   x5  ... x10079   mean var ... Y0     Y1
1    40  31.05 25.5 25.5 25.5 25   ...  33       24   1       1      0
2    35  35.75 36.5 26.5 36.5 36.5 ...  29       31   2       0      1 
3    35  35.70 36.5 36.5 36.5 36.5 ...  29       25   1       1      0 
4    40  31.50 23.5 24.5 26.5 25   ...  33       29   3       0      1
 ... 
921  40  31.05 25.5 25.5 25.5 25   ...  23       33   2       0      1

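I have gone through some blog tutorials, which were helpful, but I am not sure how to handle my input data, which I split into inputs_1 and inputs_2, as shown in the model below: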

inputs_1 = keras.Input(shape=(10081,1))

layer1 = Conv1D(64,14)(inputs_1)
layer2 = layers.MaxPool1D(5)(layer1)
layer3 = Conv1D(64, 14)(layer2)
layer4 = layers.GlobalMaxPooling1D()(layer3)

inputs_2 = keras.Input(shape=(85,))            
layer5 = layers.concatenate([layer4, inputs_2])
layer6 = Dense(128, activation='relu')(layer5)
layer7 = Dense(2, activation='softmax')(layer6)

model_2 = keras.models.Model(inputs=[inputs_1, inputs_2], outputs=[layer7])

X_train, X_test, y_train, y_test = train_test_split(df.iloc[:,0:10166], merge[['Result_cat','Result_cat1']].values, test_size=0.2) 
X_train = X_train.to_numpy()
X_train = X_train.reshape([X_train.shape[0], X_train.shape[1], 1]) 
X_train_1 = X_train[:,0:10081,:]
X_train_2 = X_train[:,10081:10166,:].reshape(736,85)  

X_test = X_test.to_numpy()
X_test = X_test.reshape([X_test.shape[0], X_test.shape[1], 1]) 
X_test_1 = X_test[:,0:10081,:]
X_test_2 = X_test[:,10081:10166,:].reshape(185,85)    

adam = keras.optimizers.Adam(learning_rate=0.0005)
model_2.compile(loss = 'categorical_crossentropy', optimizer = adam, metrics = ['acc'])
history = model_2.fit([X_train_1,X_train_2], y_train, epochs = 120, batch_size = 256, validation_split = 0.2, callbacks = [keras.callbacks.EarlyStopping(monitor='val_loss', patience=20)])

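The reason for splitting the features into two parts is that inputs_1 is mainly time-series data, while inputs_2 is descriptive statistics. Given the different nature of the data, I thought it best to keep them separate. Please correct me if I am wrong.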

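My question is: since my feature data is split and processed separately in the original model, should I do the same in cross-validation (handle inputs_1 and inputs_2 separately)? In particular, take Jason's model, for example: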

# MLP for Pima Indians Dataset with 10-fold cross validation
from keras.models import Sequential
from keras.layers import Dense
from sklearn.model_selection import StratifiedKFold
import numpy
# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)
# load pima indians dataset
dataset = numpy.loadtxt("pima-indians-diabetes.csv", delimiter=",")
# split into input (X) and output (Y) variables
X = dataset[:,0:8]
Y = dataset[:,8]
# define 10-fold cross validation test harness
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
cvscores = []
for train, test in kfold.split(X, Y):
    # create model
    model = Sequential()
    model.add(Dense(12, input_dim=8, activation='relu'))
    model.add(Dense(8, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    # Compile model
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    # Fit the model
    model.fit(X[train], Y[train], epochs=150, batch_size=10, verbose=0)
    # evaluate the model
    scores = model.evaluate(X[test], Y[test], verbose=0)
    print("%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))
    cvscores.append(scores[1] * 100)
print("%.2f%% (+/- %.2f%%)" % (numpy.mean(cvscores), numpy.std(cvscores)))

The evaluation is done with the line scores = model.evaluate(X[test], Y[test], verbose=0), which uses X[test], Y[test]. In my case, since I have inputs_1 and inputs_2 instead of X (as in the example model), should I use something like [inputs_1,inputs_2][test]?
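A note on the indexing, as a hedged sketch rather than a definitive answer: kfold.split only yields integer index arrays, so the same indices can be applied to each input array separately. Indexing a Python list such as [inputs_1,inputs_2] with test would index the list itself, not the arrays inside it. The arrays X_1, X_2 and Y below are small random stand-ins with made-up shapes, not the real data.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Small random stand-ins for the real arrays (shapes are illustrative only).
rng = np.random.default_rng(7)
n_samples = 40
X_1 = rng.normal(size=(n_samples, 50, 1))   # time-series input, 3D for Conv1D
X_2 = rng.normal(size=(n_samples, 5))       # descriptive-statistics input
Y = np.eye(2)[np.arange(n_samples) % 2]     # one-hot labels, shape (n_samples, 2)

# StratifiedKFold stratifies on 1D class labels, so convert the one-hot labels.
y_classes = Y.argmax(axis=1)

kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=7)
fold_shapes = []
# X is only used for its length here, so a placeholder of zeros is enough.
for train, test in kfold.split(np.zeros(n_samples), y_classes):
    # Apply the same index arrays to every input and to the labels.
    train_inputs = [X_1[train], X_2[train]]
    test_inputs = [X_1[test], X_2[test]]
    y_train, y_test = Y[train], Y[test]
    fold_shapes.append((train_inputs[0].shape, test_inputs[1].shape))
    # A freshly built model would then be used as:
    # model.fit(train_inputs, y_train, ...)
    # model.evaluate(test_inputs, y_test, verbose=0)
```

The lists train_inputs and test_inputs have the same layout as the [X_train_1, X_train_2] list passed to model_2.fit in the question, so they can be passed to fit and evaluate in the same way.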

Any suggestions would be appreciated. Thanks.


Update:

I tried concatenating inputs_1 and inputs_2:

con_x = np.concatenate((X_train_1,X_train_2), axis = 1)

and changed the first line of the loop to

for train, test in kfold.split(con_x, Y):

but it returned

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-17-d53a7058d157> in <module>()
     55 cvscores = []
---> 56 for train, test in kfold.split(con_x, Y):
     57 
     58     inputs_1 = keras.Input(shape=(10080,1))

1 frames
/usr/local/lib/python3.6/dist-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    537         if not allow_nd and array.ndim >= 3:
    538             raise ValueError("Found array with dim %d. %s expected <= 2."
--> 539                              % (array.ndim, estimator_name))
    540         if force_all_finite:
    541             _assert_all_finite(array,

ValueError: Found array with dim 3. Estimator expected <= 2.

However, I am still not sure whether concatenating inputs_1 and inputs_2 like this is valid.
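On the ValueError itself, a cautious reading: as far as I can tell from scikit-learn's source, the check_array call inside StratifiedKFold.split is applied to the labels rather than to the feature array, so the message may mean that Y had three dimensions at that point. That is an assumption, since Y is not shown in the update. The scikit-learn documentation notes that the labels alone are enough to generate the splits, so concatenating the inputs should not be necessary. The sketch below uses made-up shapes and the hypothetical names X_3d, Y_onehot and y_1d.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

n_samples = 20
# A 3D feature array, like the reshaped time-series input in the question.
X_3d = np.zeros((n_samples, 12, 1))
# One-hot labels with two columns, like Y0 and Y1 in the question.
Y_onehot = np.eye(2)[np.arange(n_samples) % 2]

# Collapse the one-hot labels to 1D class indices before stratifying.
y_1d = Y_onehot.argmax(axis=1)

kfold = StratifiedKFold(n_splits=4, shuffle=True, random_state=7)
splits = list(kfold.split(X_3d, y_1d))

train_idx, test_idx = splits[0]
print(len(splits), train_idx.shape, test_idx.shape)
```

Printing Y.ndim and Y.shape just before the loop is a quick way to check whether the labels are the array that triggers the error.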

python machine-learning keras conv-neural-network cross-validation
1 Answer

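This may be out of date, but the Keras Sequential API is not recommended for multi-input models. You need to either subclass keras.Model or build the model with the functional API (see: Keras Sequential API, scenarios to avoid).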
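As a rough illustration of the functional-API route, here is a minimal sketch that rebuilds a two-input model for every fold so that no weights carry over between folds. It assumes the Keras bundled with TensorFlow 2.x. The helper build_model and the names n_steps and n_stats are hypothetical, the layers simply mirror those in the question, and the tiny random data and single epoch are only there to keep the example fast.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from tensorflow import keras
from tensorflow.keras import layers


def build_model(n_steps, n_stats):
    """Two-input functional model, mirroring the layers in the question."""
    inputs_1 = keras.Input(shape=(n_steps, 1))
    x = layers.Conv1D(64, 14)(inputs_1)
    x = layers.MaxPooling1D(5)(x)
    x = layers.Conv1D(64, 14)(x)
    x = layers.GlobalMaxPooling1D()(x)

    inputs_2 = keras.Input(shape=(n_stats,))
    x = layers.concatenate([x, inputs_2])
    x = layers.Dense(128, activation="relu")(x)
    outputs = layers.Dense(2, activation="softmax")(x)

    model = keras.Model(inputs=[inputs_1, inputs_2], outputs=outputs)
    model.compile(loss="categorical_crossentropy", optimizer="adam",
                  metrics=["accuracy"])
    return model


# Tiny random stand-ins for the real data (assumed shapes, not the real ones).
rng = np.random.default_rng(7)
n_samples, n_steps, n_stats = 20, 200, 5
X_1 = rng.normal(size=(n_samples, n_steps, 1)).astype("float32")
X_2 = rng.normal(size=(n_samples, n_stats)).astype("float32")
Y = np.eye(2, dtype="float32")[np.arange(n_samples) % 2]

kfold = StratifiedKFold(n_splits=2, shuffle=True, random_state=7)
cvscores = []
# Stratify on 1D class indices; the one-hot Y is still used for training.
for train, test in kfold.split(X_2, Y.argmax(axis=1)):
    model = build_model(n_steps, n_stats)  # fresh weights for every fold
    model.fit([X_1[train], X_2[train]], Y[train],
              epochs=1, batch_size=8, verbose=0)
    loss, acc = model.evaluate([X_1[test], X_2[test]], Y[test], verbose=0)
    cvscores.append(acc * 100)

print("%.2f%% (+/- %.2f%%)" % (np.mean(cvscores), np.std(cvscores)))
```

With the real data, n_steps and n_stats would be set to the widths of the two input blocks, and the epochs, batch size and callbacks from the question would replace the placeholder values.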
