Adding a dense layer on top of a Huggingface BERT model

Question

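I want to add a dense layer on top of the bare BERT model transformer that outputs raw hidden states, and then fine-tune the resulting model. Specifically, I am using this base model (dbmdz/bert-base-italian-xxl-cased). This is what the model should do:

Encode the sentence (a vector with 768 elements for each token of the sentence)
Keep only the first vector (the one related to the first token)
Add a dense layer on top of this vector to get the desired transformation

So far, I have successfully encoded the sentences: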

from sklearn.neural_network import MLPRegressor

import torch

from transformers import AutoModel, AutoTokenizer

# List of strings
sentences = [...]
# List of numbers
labels = [...]

tokenizer = AutoTokenizer.from_pretrained("dbmdz/bert-base-italian-xxl-cased")
model = AutoModel.from_pretrained("dbmdz/bert-base-italian-xxl-cased")

# 2D array, one line per sentence containing the embedding of the first token
encoded_sentences = torch.stack([model(**tokenizer(s, return_tensors='pt'))[0][0][0]
                                 for s in sentences]).detach().numpy()

regr = MLPRegressor()
regr.fit(encoded_sentences, labels)

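This way I can train a neural network by feeding it the encoded sentences. However, this approach clearly does not fine-tune the base BERT model. Can anybody help me? How can I build a model that can be fine-tuned end to end (possibly in PyTorch, or using the Huggingface library)?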

Answer 1

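There are two ways to do it. Since you want to fine-tune the model for a downstream task similar to classification, you can directly use the

BertForSequenceClassification

class. It adds a logistic-regression-style linear layer on top of the 768-dimensional output and fine-tunes it together with BERT.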
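As a rough illustration of this first option, the sketch below builds BertForSequenceClassification from a tiny, randomly initialised BertConfig so that it runs without downloading any weights; the config sizes and the dummy inputs are assumptions made for the example only. In practice you would load the real checkpoint with from_pretrained. Because the labels in the question are numbers, the sketch sets num_labels=1, in which case the library computes a mean-squared-error (regression) loss; a larger num_labels gives a classification head instead.

```python
import torch
from transformers import BertConfig, BertForSequenceClassification

# Tiny random config (assumed sizes) so the sketch runs offline.
# For real use: BertForSequenceClassification.from_pretrained(
#     "dbmdz/bert-base-italian-xxl-cased", num_labels=1)
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=1,
    num_attention_heads=2,
    intermediate_size=64,
    num_labels=1,  # a single output makes the head a regressor
)
model = BertForSequenceClassification(config)

# Dummy batch: 2 sequences of 8 token ids, with float regression targets.
input_ids = torch.randint(0, 100, (2, 8))
attention_mask = torch.ones_like(input_ids)
labels = torch.tensor([0.5, 1.5])

outputs = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
loss, logits = outputs[0], outputs[1]  # loss comes first when labels are passed
print(logits.shape, loss.item())
```

Calling loss.backward() on this loss propagates gradients into both the new head and the BERT encoder, which is what fine-tuning the whole model requires.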

Alternatively, you can define a custom module that creates a BERT model from the pre-trained weights and adds layers on top of it.

import torch
import torch.nn as nn
from transformers import AutoTokenizer, BertModel

class CustomBERTModel(nn.Module):
    def __init__(self):
        super(CustomBERTModel, self).__init__()
        self.bert = BertModel.from_pretrained("dbmdz/bert-base-italian-xxl-cased")
        # New layers:
        self.linear1 = nn.Linear(768, 256)
        self.linear2 = nn.Linear(256, 3)  # 3 is the number of classes in this example

    def forward(self, ids, mask):
        # Index the output instead of unpacking it: recent transformers versions
        # return an output object, and element 0 is the last hidden state.
        sequence_output = self.bert(ids, attention_mask=mask)[0]

        # sequence_output has the following shape: (batch_size, sequence_length, 768)
        linear1_output = self.linear1(sequence_output[:, 0, :])  # extract the 1st token's embeddings

        linear2_output = self.linear2(linear1_output)

        return linear2_output

tokenizer = AutoTokenizer.from_pretrained("dbmdz/bert-base-italian-xxl-cased")
model = CustomBERTModel()  # You can pass parameters here if you need a more flexible model
model.to(torch.device("cpu"))  # can be a GPU
criterion = nn.CrossEntropyLoss()  # expects raw logits; define your own criterion if required
optimizer = torch.optim.Adam(filter(lambda p: p.requires_grad, model.parameters()))

num_epochs = 3  # assumed value; choose what suits your task
for epoch in range(num_epochs):
    for batch in data_loader:  # assumes you have a DataLoader() object that yields the data

        data = batch[0]
        targets = batch[1]  # assuming the data loader returns a tuple of data and its targets

        optimizer.zero_grad()
        encoding = tokenizer(list(data), return_tensors='pt', padding=True, truncation=True, max_length=50, add_special_tokens=True)
        input_ids = encoding['input_ids']
        attention_mask = encoding['attention_mask']
        # forward() takes (ids, mask); the outputs are raw logits, and
        # CrossEntropyLoss applies log-softmax internally, so none is applied here.
        outputs = model(input_ids, attention_mask)
        loss = criterion(outputs, targets)
        loss.backward()
        optimizer.step()
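The optimizer above is built from the parameters whose requires_grad flag is set, so it can also be used to train only the new layers while BERT stays frozen. The following is a minimal sketch of that idea, not part of the original answer: TinyCustomBERT is a hypothetical stand-in for CustomBERTModel that uses a small random BertConfig (assumed sizes) so it runs without downloading weights, and its forward method is omitted because only the parameter bookkeeping is shown.

```python
import torch
import torch.nn as nn
from transformers import BertConfig, BertModel

class TinyCustomBERT(nn.Module):
    """Hypothetical, shrunken stand-in for CustomBERTModel (forward omitted)."""
    def __init__(self, config):
        super().__init__()
        self.bert = BertModel(config)  # randomly initialised, no download
        self.linear1 = nn.Linear(config.hidden_size, 16)
        self.linear2 = nn.Linear(16, 3)

tiny_config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=1,
                         num_attention_heads=2, intermediate_size=64)
model = TinyCustomBERT(tiny_config)

# Freeze every BERT parameter; the new layers keep requires_grad=True.
for param in model.bert.parameters():
    param.requires_grad = False

# The same filter as in the answer now selects only the new layers.
optimizer = torch.optim.Adam(filter(lambda p: p.requires_grad, model.parameters()))
trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)
```

Leaving the loop over model.bert.parameters() out keeps every parameter trainable, which is the full fine-tuning the question asks for.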

Answer 2

For anyone using Tensorflow/Keras, the equivalent of Ashwin's answer would be:

from tensorflow import keras
from transformers import AutoTokenizer, TFAutoModel


class CustomBERTModel(keras.Model):
    def __init__(self):
          super(CustomBERTModel, self).__init__()
          self.bert = TFAutoModel.from_pretrained("dbmdz/bert-base-italian-xxl-cased")
          ### New layers:
          self.linear1 = keras.layers.Dense(256)
          self.linear2 = keras.layers.Dense(3) ## 3 is the number of classes in this example

    def call(self, inputs, training=False):
          # call expects only one positional argument, so you have to pass in a tuple and unpack. The next parameter is a special reserved training parameter.
          ids, mask = inputs
          sequence_output = self.bert(ids, mask, training=training).last_hidden_state

          # sequence_output has the following shape: (batch_size, sequence_length, 768)
          linear1_output = self.linear1(sequence_output[:,0,:]) ## extract the 1st token's embeddings

          linear2_output = self.linear2(linear1_output)

          return linear2_output


model = CustomBERTModel()
tokenizer = AutoTokenizer.from_pretrained("dbmdz/bert-base-italian-xxl-cased")

ipts = tokenizer("Some input sequence", return_tensors="tf")
test = model((ipts["input_ids"], ipts["attention_mask"]))

Then, to train the model, you can write a custom training loop using GradientTape.
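A single GradientTape training step could look roughly like the sketch below. To keep it runnable without downloading BERT, a small Keras Sequential model stands in for CustomBERTModel, and the features, labels, loss function, and learning rate are all assumptions chosen for illustration; with the real model you would pass the (input_ids, attention_mask) tuple instead of features.

```python
import tensorflow as tf
from tensorflow import keras

# Stand-in for CustomBERTModel (assumption): any keras.Model works the same way.
model = keras.Sequential([
    keras.layers.Dense(8, activation="relu"),
    keras.layers.Dense(3),  # 3 classes, raw logits
])
loss_fn = keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = keras.optimizers.Adam(learning_rate=1e-3)

# Dummy batch: 4 examples with 16 features each, integer class labels.
features = tf.random.normal((4, 16))
labels = tf.constant([0, 1, 2, 1])

with tf.GradientTape() as tape:
    logits = model(features, training=True)  # forward pass is recorded on the tape
    loss = loss_fn(labels, logits)

# Gradients with respect to every trainable weight, then one optimizer update.
grads = tape.gradient(loss, model.trainable_weights)
optimizer.apply_gradients(zip(grads, model.trainable_weights))
print(float(loss))
```

Wrapping the step in a loop over a tf.data.Dataset turns it into a full training loop.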

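You can use

model.trainable_weights

to verify that the additional layers are trainable as well. You can access the weights of an individual layer with, for example,

model.trainable_weights[-1].numpy()

which gives you the bias vector of the last layer. [Note that the weights of the Dense layers only appear after the call method has been executed for the first time.]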

Answer 3

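If you want to tune the BERT model itself, you will need to modify the model's parameters. To do that you will most likely want to do the work in PyTorch. Here is some rough pseudocode to illustrate: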

from torch.optim import SGD

model = ...  # whatever model you are using
parameters = model.parameters()  # or some more specific set of parameters
optimizer = SGD(parameters, lr=0.01)  # or whatever optimizer you want
optimizer.zero_grad()  # clears gradients left over from a previous step

inputs = ...  # whatever the appropriate input for your task is (a dict of tensors)
label = ...  # whatever the appropriate label for your task is
outputs = model(**inputs, labels=label)  # Huggingface task models accept a labels argument
loss = outputs[0]  # usually the loss is the first item returned
loss.backward()  # calculates the gradients
optimizer.step()  # runs the optimization algorithm

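I have left out all the relevant details because they are quite tedious and specific to whatever your task is. Huggingface has a nice article that goes into more detail here, and you will definitely want to refer to the PyTorch documentation whenever you use any PyTorch functionality. I highly recommend the PyTorch blitz before trying to do anything serious with it.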
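For a concrete version of the pseudocode above, the sketch below runs one SGD step and checks that a weight inside the BERT encoder actually changes, which is what "tuning the BERT model itself" means in practice. It uses a tiny, randomly initialised BertForSequenceClassification (the config sizes, dummy inputs, and labels are assumptions) so that nothing has to be downloaded; the choice of model class is also an assumption, since the answer leaves the model unspecified.

```python
import torch
from torch.optim import SGD
from transformers import BertConfig, BertForSequenceClassification

torch.manual_seed(0)

# Tiny random model (assumed sizes); real use would call from_pretrained.
config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=1,
                    num_attention_heads=2, intermediate_size=64, num_labels=3)
model = BertForSequenceClassification(config)
model.train()
optimizer = SGD(model.parameters(), lr=0.01)

# Dummy batch: 2 sequences of 8 token ids with class labels.
inputs = {
    "input_ids": torch.randint(0, 100, (2, 8)),
    "attention_mask": torch.ones(2, 8, dtype=torch.long),
}
labels = torch.tensor([0, 2])

# Snapshot one encoder weight to compare after the update.
encoder_weight = model.bert.encoder.layer[0].output.dense.weight
before = encoder_weight.detach().clone()

optimizer.zero_grad()
loss = model(**inputs, labels=labels)[0]  # loss is the first item returned
loss.backward()
optimizer.step()

encoder_changed = not torch.equal(before, encoder_weight.detach())
print(encoder_changed)
```

If the encoder parameters were excluded from the optimizer or frozen, the same check would report no change.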

Answer 4

Can I add a 1d-CNN layer instead of the linear1 layer in this code?

class CustomBERTModel(nn.Module):
    def __init__(self):
        super(CustomBERTModel, self).__init__()
        self.bert = BertModel.from_pretrained("dbmdz/bert-base-italian-xxl-cased")
        # New layers:
        self.linear1 = nn.Linear(768, 256)
        self.linear2 = nn.Linear(256, 3)  # 3 is the number of classes in this example

    def forward(self, ids, mask):
        # Element 0 of the BERT output is the last hidden state
        sequence_output = self.bert(ids, attention_mask=mask)[0]

        # sequence_output has the following shape: (batch_size, sequence_length, 768)
        linear1_output = self.linear1(sequence_output[:, 0, :])  # extract the 1st token's embeddings

        linear2_output = self.linear2(linear1_output)

        return linear2_output
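One possible way to do this, offered as a sketch rather than something from the original thread, is to run nn.Conv1d over the whole sequence_output instead of only the first token. Conv1d expects its input as (batch_size, channels, length), so the hidden dimension has to be moved to the channel axis first. ConvHead, its kernel size, the ReLU, and the max-pooling step are all assumptions for illustration, and a random tensor stands in for the BERT output; padded positions are not masked here, which a real model would probably want to handle.

```python
import torch
import torch.nn as nn

class ConvHead(nn.Module):
    """Hypothetical head: Conv1d over token embeddings, then a linear classifier."""
    def __init__(self, hidden_size=768, channels=256, num_classes=3, kernel_size=3):
        super().__init__()
        # padding keeps the sequence length unchanged for odd kernel sizes
        self.conv1 = nn.Conv1d(hidden_size, channels, kernel_size,
                               padding=kernel_size // 2)
        self.linear2 = nn.Linear(channels, num_classes)

    def forward(self, sequence_output):
        # (batch, seq_len, hidden) -> (batch, hidden, seq_len) for Conv1d
        x = sequence_output.permute(0, 2, 1)
        x = torch.relu(self.conv1(x))   # (batch, channels, seq_len)
        x = x.max(dim=2).values         # max-pool over the sequence: (batch, channels)
        return self.linear2(x)          # (batch, num_classes)

head = ConvHead()
fake_sequence_output = torch.randn(2, 10, 768)  # stands in for the BERT output
logits = head(fake_sequence_output)
print(logits.shape)
```

Inside CustomBERTModel, the convolution would replace self.linear1, and forward would pass the full sequence_output to it rather than sequence_output[:, 0, :].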
