Setting up a Keras model with a pandas DataFrame

Question

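This is my first time using Python and Keras for machine learning; I am used to MATLAB. Basically, I have a Parquet file with the label in one column and the text in another. I embedded the text with GloVe and vectorized it, so after all that I am left with two columns: vectorized, which holds an ndarray of 4000 numbers in each row, and the label column. I then try to use this vectorized column as the input to my model, but this is where I run into problems.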

pd_df.head(1) #pd_df is my dataframe

Output:

    vectorized  label
0   [-0.10767000168561935, 0.11052999645471573, 0....   0

Then I split my data and convert it to ndarrays:

from sklearn.model_selection import train_test_split

train, test = train_test_split(pd_df, test_size=0.3)

trainLabels = train.as_matrix(columns=['label'])
train = train.as_matrix(columns=['vectorized'])

testLabels = test.as_matrix(columns=['label'])
test = test.as_matrix(columns=['vectorized'])

Then I check the shape of my data:

train.shape
(410750, 1)

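This is where my lack of NumPy knowledge shows, because this shape makes no sense to me. It seems like it should be (410750, 4000), since each element is an ndarray of 4000 items.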

After this I set up my model:

from keras.layers import Input, Dense
from keras.models import Model
from keras.optimizers import SGD
from keras.losses import binary_crossentropy
from keras.metrics import binary_accuracy

inputs = Input(shape=(4000,))

x = Dense(units=2000, activation='relu')(inputs)
x = Dense(units=500, activation='relu')(x)
output = Dense(units=2, activation='softmax')(x)

model = Model(inputs=inputs, outputs=output)
model.compile(optimizer=SGD(), loss=binary_crossentropy, metrics=['accuracy'])
model.fit(train, 
          trainLabels, 
          epochs=50,
          batch_size=50)

Then I keep getting this error:

ValueError: Error when checking input: expected input_13 to have shape (4000,) but got array with shape (1,)

Like I said, I am new to the world of machine learning, so any help would be great.

Thanks for any help.

Answer 1

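Each of your training samples has only one dimension, whereas you specified 4000 dimensions in the input. Also, if you are using pretrained word embeddings (such as GloVe), you should use an Embedding layer. Take a look at this Keras blog post: https://blog.keras.io/using-pre-trained-word-embeddings-in-a-keras-model.html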
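
To see where the (1,) in the error message comes from, here is a minimal sketch using a tiny made-up DataFrame (3 rows with 4-element vectors instead of 4000; the names `df` and `col` are illustrative and not from the question). Selecting a column whose cells are arrays gives an object-dtype array with one cell per row, so NumPy reports a single column and does not look inside the inner vectors.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's data: 3 rows, 4 numbers per vector.
df = pd.DataFrame({
    "vectorized": [np.arange(4, dtype=float) + i for i in range(3)],
    "label": [0, 1, 0],
})

# Selecting the column as a 2-D array gives one object cell per row.
col = df[["vectorized"]].to_numpy()

print(col.dtype)        # object
print(col.shape)        # (3, 1)
print(col[0, 0].shape)  # (4,)
```

Keras only sees the outer shape, which is why it reports (1,) per sample where (4000,) was expected.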

Answer 2

To solve this, I had to unpack my array of arrays. The way I chose to do it was:

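import numpy as np

# train has shape (n_samples, 1); each cell holds a 4000-element ndarray
xTrain = np.zeros((train.shape[0], 4000))

for i, vector in enumerate(train):  # train is my numpy array of arrays
    xTrain[i] = vector[0]  # copy the inner vector into row i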
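
A vectorized alternative to the loop above could look like the following sketch. It assumes every cell holds a 1-D array of the same length. `np.stack` joins the per-row arrays along a new first axis, which yields the 2-D float matrix that `Input(shape=(4000,))` expects. The small `train` array here is made up for illustration (3 rows of 4 numbers).

```python
import numpy as np

# Hypothetical stand-in for the question's `train`: shape (3, 1), object dtype,
# each cell holding a 4-element vector (4000 in the question).
train = np.empty((3, 1), dtype=object)
for i in range(3):
    train[i, 0] = np.full(4, float(i))

# Take the single column, then stack the inner arrays into one 2-D array.
xTrain = np.stack(list(train[:, 0]))

print(xTrain.shape)  # (3, 4)
print(xTrain.dtype)  # float64
```

With the question's data the result would have shape (410750, 4000), and the same call applies to `test`.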
