TransformerEncoder performs poorly when the sequence length at inference differs from the sequence length used during training

Question

Given (this will be used for inference):

`X_infer`: tensor of shape `(num_window, window_len) -> (1, 600)`

`y_infer`: tensor of shape `(num_window, window_len) -> (1, 600)`

Vocabulary sizes:

`X`: 128 (amplitude)

`y`: 5 (color labels)

Applying `sliding_window_view` to the original arrays returns (this will be used for training):

`X_train`: tensor of shape `(num_window, window_len) -> (471, 180)`

`y_train`: tensor of shape `(num_window, window_len) -> (471, 180)`
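
To make the windowing step concrete, here is a minimal sketch using NumPy's `sliding_window_view`. The length of the training signal is not stated above, so the sketch assumes a hypothetical 1-D signal of length 650 with a stride of 1, which happens to give 471 windows of length 180. The random data and the names `X_full` and `y_full` are placeholders, not the real dataset.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

WINDOW_LEN = 180
SIGNAL_LEN = 650  # hypothetical: chosen so that 650 - 180 + 1 = 471 windows

rng = np.random.default_rng(0)
# Placeholder data: amplitudes in [0, 128) and color labels in [0, 5)
X_full = rng.integers(0, 128, size=SIGNAL_LEN, dtype=np.uint8)
y_full = rng.integers(0, 5, size=SIGNAL_LEN, dtype=np.uint8)

# Each row is one window; consecutive rows are shifted by one step
X_train = sliding_window_view(X_full, WINDOW_LEN)
y_train = sliding_window_view(y_full, WINDOW_LEN)

print(X_train.shape, y_train.shape)  # (471, 180) (471, 180)
```

Note that `sliding_window_view` returns a read-only view, so every training window starts at position 0 from the model's point of view, whatever its offset in the full signal.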

Train:

model = instantiate_untrained_model(seq_len=180)
model.fit(X_train, y_train)

Save (weights only):

model.save_weights('trained.weights.h5')

Inference with different values of `seq_len`:

# 600
model = instantiate_untrained_model(seq_len=600) # entire array
model.load_weights('trained.weights.h5') # No error
y_pred_600 = model.predict(X_infer)

# 180
model = instantiate_untrained_model(seq_len=180) # windowed array
model.load_weights('trained.weights.h5')
y_pred_180 = model.predict(X_train)

[Plot: prediction `y_pred_180` and its ground truth]

[Plot: prediction `y_pred_600` and its ground truth]
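
Since the plots are a visual comparison, a single number per run can make the gap easier to report. The following is a minimal sketch of per-token accuracy, assuming the model output is an array of logits with shape `(num_window, window_len, 5)` as produced by the final `Dense` layer. The helper name `token_accuracy` and the synthetic arrays are illustrative only.

```python
import numpy as np

def token_accuracy(logits, labels):
    """Fraction of positions whose arg-max class equals the label."""
    predicted = np.argmax(logits, axis=-1)
    return float(np.mean(predicted == labels))

# Synthetic stand-ins for ground truth and model output
labels = np.array([[0, 1, 2, 3, 4, 0]])
perfect_logits = np.eye(5)[labels]  # one-hot logits, shape (1, 6, 5)

# Shift the class of the first three positions so they become wrong
noisy_logits = perfect_logits.copy()
noisy_logits[0, :3] = np.roll(noisy_logits[0, :3], 1, axis=-1)

print(token_accuracy(perfect_logits, labels))  # 1.0
print(token_accuracy(noisy_logits, labels))    # 0.5
```

With the real arrays this would be called as `token_accuracy(y_pred_180, y_train)` and `token_accuracy(y_pred_600, y_infer)`.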

Although the data is a signal, you can think of this problem as named entity recognition.

This is the Keras model:

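import numpy as np
from keras.layers import Dense, Embedding, Input
from keras.models import Model
# Assumption: these two layers are imported from keras_nlp (imports are not shown in the question)
from keras_nlp.layers import SinePositionEncoding, TransformerEncoder

def instantiate_untrained_model(seq_len):
    SEQ_LEN = seq_len
    VOCAB_SIZE = 128
    EMBEDD_DIM = 128

    encoder_inputs = Input(shape=(SEQ_LEN,), name="encoder_inputs", dtype=np.uint8)
    token_embeddings = Embedding(input_dim=VOCAB_SIZE, output_dim=EMBEDD_DIM)(encoder_inputs)
    position_encodings = SinePositionEncoding()(token_embeddings)

    # this line adds up the embeddings and fixes the problem
    embeddings = token_embeddings + position_encodings

    encoder_outputs = TransformerEncoder(intermediate_dim=EMBEDD_DIM*4, num_heads=2, dropout=0.05)(inputs=embeddings)

    # Output layer for vocabulary size of 5 (raw logits, no activation)
    output_predictions = Dense(units=5, activation=None)(encoder_outputs)

    # Final model
    model = Model(encoder_inputs, output_predictions, name="transformer_encoder")
    return model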
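
For context on what the position encoding contributes, here is a minimal NumPy sketch of the standard sinusoidal formula from the original Transformer paper. It is not the Keras layer itself, and `SinePositionEncoding` may differ in details; the function name and the `max_wavelength` default are assumptions for illustration. The sketch shows that the encoding for positions 0 to 179 does not depend on the total sequence length, while a length-600 input also contains positions 180 to 599, which a model trained on length-180 windows has never seen.

```python
import numpy as np

def sine_position_encoding(seq_len, dim, max_wavelength=10000.0):
    """Standard sinusoidal encoding: sin on even channels, cos on odd channels."""
    positions = np.arange(seq_len)[:, None]    # (seq_len, 1)
    pair_index = np.arange(dim)[None, :] // 2  # 0, 0, 1, 1, 2, 2, ...
    angles = positions / max_wavelength ** (2 * pair_index / dim)
    encoding = np.empty((seq_len, dim))
    encoding[:, 0::2] = np.sin(angles[:, 0::2])
    encoding[:, 1::2] = np.cos(angles[:, 1::2])
    return encoding

pe_180 = sine_position_encoding(180, 128)
pe_600 = sine_position_encoding(600, 128)

# The first 180 rows are identical; rows 180..599 are new to the model
same_prefix = np.allclose(pe_600[:180], pe_180)
print(same_prefix)  # True
```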

I expected that, at inference time, a Transformer model could accept an arbitrary sequence length and still perform well.

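Why does this happen? Is it related to `SinePositionEncoding`? Is the model inflexible and not robust, performing well only when the sequence length is `180` rather than an arbitrary length? How can I fix this?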
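
One workaround that can be tried while the cause is unclear is to keep the model at its training length and run inference window by window, then stitch the outputs together. The sketch below is an assumption-laden illustration, not a confirmed fix: `predict_in_windows` is a hypothetical helper, windows do not overlap except for a final window aligned to the end of the sequence, and `fake_predict` only stands in for `model.predict` so that the example runs without Keras.

```python
import numpy as np

def predict_in_windows(predict_fn, x, window_len):
    """Apply predict_fn to window_len-sized chunks of x and stitch the outputs."""
    num_rows, total_len = x.shape
    if total_len < window_len:
        raise ValueError("sequence is shorter than the window length")
    starts = list(range(0, total_len - window_len + 1, window_len))
    if starts[-1] + window_len < total_len:
        starts.append(total_len - window_len)  # last window aligned to the end
    stitched = None
    for start in starts:
        chunk = predict_fn(x[:, start:start + window_len])  # (rows, window_len, classes)
        if stitched is None:
            stitched = np.zeros((num_rows, total_len, chunk.shape[-1]), dtype=chunk.dtype)
        stitched[:, start:start + window_len] = chunk  # later windows overwrite overlaps
    return stitched

def fake_predict(window):
    # Hypothetical stand-in for model.predict: one-hot "logits" of (value mod 5)
    return np.eye(5)[window % 5]

rng = np.random.default_rng(0)
X_demo = rng.integers(0, 128, size=(1, 600), dtype=np.uint8)
y_demo = predict_in_windows(fake_predict, X_demo, window_len=180)
print(y_demo.shape)  # (1, 600, 5)
```

With the real model this would be `predict_in_windows(model.predict, X_infer, 180)`, using the model instantiated with `seq_len=180`. Each window then sees the same position range as during training, at the cost of losing context across window boundaries.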
