I am trying to create a PyTorch network as shown in the figure below (for the arXiv paper, see: this link). The network is intended to learn features of source code. Basically, it consists of an embedding lookup layer, a convolution, max pooling, and dense layers.
My attempt at building this network looks like this:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
class Net(nn.Module):
    """
    Class Net.

    This network is used to learn source code features in a
    supervised manner.
    """

    def __init__(self, n_vocab, n=512, k=13, m=10):
        """
        Constructor.

        :param n_vocab: size of the vocabulary
        :param n: number of convolution filters
        :param k: embedding size
        :param m: kernel size
        """
        super(Net, self).__init__()
        # embedding layer
        self.emb = nn.Embedding(n_vocab, k)
        # convolution and pooling
        self.conv1 = nn.Conv2d(1, n, (m, k))
        self.pool = nn.AdaptiveMaxPool2d(1)
        # fully connected layers
        self.fc1 = nn.Linear(n, 100)
        self.fc2 = nn.Linear(100, 5)

    def forward(self, input):
        """
        Performs a forward pass through the network.

        :param input: input to network
        :return: network output
        """
        x = self.emb(torch.LongTensor(input))
        x = x.view(1, 500, 13)
        x = self.pool(F.relu(self.conv1(x)))
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x
I cannot get the convolution to work. I keep getting this error:

*** RuntimeError: Expected 4-dimensional input for 4-dimensional weight 512 1 10 13, but got 3-dimensional input of size [1, 500, 13] instead

The input to my network consists of vocabulary indices, which are fed into the embedding layer. An example input looks like this:
[55, 28, 14, 56, 20, 55, 70, 14, 56, 20, 55, ..., 31, 31, 31, 31, 31, 31, 31]
After feeding this example input into the network, I get the corresponding embeddings:
tensor([[[-0.5966, -1.4197,  0.9875,  ..., -0.0211, -2.3168,  0.3744],
         [-0.1759, -1.1841, -0.0564,  ..., -0.0804, -1.1820, -0.1344],
         [ 1.4525,  0.1342, -0.3820,  ..., -0.2679,  0.5997,  1.1058],
         ...,
         [ 1.2074,  0.4087, -0.3353,  ..., -0.1959,  0.5806, -1.4581],
         [ 1.2074,  0.4087, -0.3353,  ..., -0.1959,  0.5806, -1.4581],
         [ 1.2074,  0.4087, -0.3353,  ..., -0.1959,  0.5806, -1.4581]]],
       grad_fn=<ViewBackward>)
This output looks fine to me. Apparently, PyTorch's convolution expects a 4-dimensional input, but I only have 3 dimensions. What is the missing dimension?
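For reference, printing the shapes side by side shows the mismatch (a minimal check with the same layer sizes as above; the random tensor just stands in for my embedded input):

import torch
import torch.nn as nn

conv1 = nn.Conv2d(1, 512, (10, 13))
print(conv1.weight.shape)    # torch.Size([512, 1, 10, 13]) -- 4 dimensions
x = torch.randn(1, 500, 13)  # stand-in for the embedded input after view() in forward()
print(x.shape)               # torch.Size([1, 500, 13]) -- only 3 dimensions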
My train function looks like this:
def train(X, y, n_vocab, epochs=5):
    """
    Trains the network.

    :param X: network input (indices into vocabulary)
    :param y: gold labels
    :param n_vocab: size of the vocabulary
    :param epochs: number of epochs to train the network
                   (default = 5)
    :return: trained network
    """
    # instantiate network model
    net = Net(n_vocab)
    # define training loss and optimizer
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
    # train several epochs
    for epoch in range(epochs):
        running_loss = 0.0
        for i in range(len(X)):
            X_b, y_b = X[i], y[i]
            # zero the parameter gradients
            optimizer.zero_grad()
            # perform forward pass
            y_pred = net(X_b)
            # compute loss
            loss = criterion(y_pred, y_b)
            # perform backpropagation
            loss.backward()
            # optimize model parameters
            optimizer.step()
            # print statistics
            running_loss += loss.item()
            if i % 2000 == 0:
                print("[%d, %5d] loss: %.3f" %
                      (epoch + 1, i + 1, running_loss / 2000))
                running_loss = 0.0
    print("Finished training")
    return net
Any help would be greatly appreciated!
Thank you.
The first dimension of your data should be the batch dimension, as mentioned in the documentation:

    Applies a 2D convolution over an input signal composed of several input planes. ... the output value of the layer with input size (N, C, H, W) ... where N is a batch size, C denotes a number of channels, H is a height of input planes in pixels, and W is width in pixels.

So you should batch your data before passing it to the network, or at least reshape it to (1, 1, 500, 13) to use a batch size of 1.
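A minimal sketch of that reshape, using the same layer sizes as in the question (the vocabulary size of 100 and the random indices below are just placeholders):

import torch
import torch.nn as nn

emb = nn.Embedding(100, 13)               # placeholder vocabulary size, embedding size k = 13
conv1 = nn.Conv2d(1, 512, (10, 13))       # n = 512 filters, kernel size (m, k) = (10, 13)

indices = torch.randint(0, 100, (500,))   # stand-in for a sequence of 500 vocabulary indices
x = emb(indices)                          # shape: (500, 13)
x = x.view(1, 1, 500, 13)                 # add batch and channel dimensions -> (N, C, H, W)
out = conv1(x)
print(out.shape)                          # torch.Size([1, 512, 491, 1])

With the extra batch and channel dimensions in place, the convolution receives the 4-dimensional input it expects and the RuntimeError goes away.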