Keras - GRU layer with recurrent dropout - loss: 'nan', accuracy: 0


Problem description

I am working through "Deep Learning with Python" by François Chollet (publisher webpage, notebooks on github). While reproducing the examples from Chapter 6, I ran into problems with (what I believe is) the GRU layer with recurrent dropout.

The code in which I originally observed these errors was long, so I decided to stick to the simplest problem that reproduces the error: classifying IMDB reviews into "positive" and "negative" categories.

When I use a GRU layer with recurrent dropout, the training loss (after a few batches of the first epoch) takes the "value" of nan, while the training accuracy (from the second epoch onwards) goes to 0.
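For reference, here is the failing configuration distilled into a minimal sketch (hyperparameters copied from the full script further down; standalone Keras 2.3.1 on the TensorFlow backend is assumed, as listed in the versions section):

from keras import models, layers

# GRU with recurrent dropout only (model no. 3 in the comparison below):
model = models.Sequential()
model.add(layers.Embedding(10000, 32))
model.add(layers.GRU(32, recurrent_dropout=0.3))  # recurrent dropout is the suspected trigger
model.add(layers.Dense(1, activation='sigmoid'))
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])
# model.fit(x_train, y_train, epochs=20, batch_size=64, validation_split=0.2)

The training log below comes from a run of this kind: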

   64/12000 [..............................] - ETA: 3:05 - loss: 0.6930 - accuracy: 0.4844
  128/12000 [..............................] - ETA: 2:09 - loss: 0.6926 - accuracy: 0.4766
  192/12000 [..............................] - ETA: 1:50 - loss: 0.6910 - accuracy: 0.5573
(...) 
 3136/12000 [======>.......................] - ETA: 59s - loss: 0.6870 - accuracy: 0.5635
 3200/12000 [=======>......................] - ETA: 58s - loss: 0.6862 - accuracy: 0.5650
 3264/12000 [=======>......................] - ETA: 58s - loss: 0.6860 - accuracy: 0.5650
 3328/12000 [=======>......................] - ETA: 57s - loss: nan - accuracy: 0.5667   
 3392/12000 [=======>......................] - ETA: 57s - loss: nan - accuracy: 0.5560
 3456/12000 [=======>......................] - ETA: 56s - loss: nan - accuracy: 0.5457
(...)
11840/12000 [============================>.] - ETA: 1s - loss: nan - accuracy: 0.1593
11904/12000 [============================>.] - ETA: 0s - loss: nan - accuracy: 0.1584
11968/12000 [============================>.] - ETA: 0s - loss: nan - accuracy: 0.1576
12000/12000 [==============================] - 83s 7ms/step - loss: nan - accuracy: 0.1572 - val_loss: nan - val_accuracy: 0.0000e+00
Epoch 2/20

   64/12000 [..............................] - ETA: 1:16 - loss: nan - accuracy: 0.0000e+00
  128/12000 [..............................] - ETA: 1:15 - loss: nan - accuracy: 0.0000e+00
  192/12000 [..............................] - ETA: 1:16 - loss: nan - accuracy: 0.0000e+00
(...)
11840/12000 [============================>.] - ETA: 1s - loss: nan - accuracy: 0.0000e+00
11904/12000 [============================>.] - ETA: 0s - loss: nan - accuracy: 0.0000e+00
11968/12000 [============================>.] - ETA: 0s - loss: nan - accuracy: 0.0000e+00
12000/12000 [==============================] - 82s 7ms/step - loss: nan - accuracy: 0.0000e+00 - val_loss: nan - val_accuracy: 0.0000e+00
Epoch 3/20

   64/12000 [..............................] - ETA: 1:18 - loss: nan - accuracy: 0.0000e+00
  128/12000 [..............................] - ETA: 1:18 - loss: nan - accuracy: 0.0000e+00
  192/12000 [..............................] - ETA: 1:16 - loss: nan - accuracy: 0.0000e+00
(...)

Localizing the problem

To find a solution, I wrote the code below, which goes through several models (GRU/LSTM; {no dropout, only "normal" dropout, only recurrent dropout, both "normal" and recurrent dropout}; rmsprop/adam) and plots the loss and accuracy of all of them. (It also creates a smaller, separate figure for each model.)

# Based on examples from "Deep Learning with Python" by François Chollet:
## Constants, modules:
VERSION = 2

import os
from keras import models
from keras import layers
import matplotlib.pyplot as plt
import pylab

## Loading data:
from keras.datasets import imdb
(x_train, y_train), (x_test, y_test) = \
    imdb.load_data(num_words=10000)

from keras.preprocessing import sequence
x_train = sequence.pad_sequences(x_train, maxlen=500)
x_test = sequence.pad_sequences(x_test, maxlen=500)
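# Note: pad_sequences truncates/pads every review to maxlen=500 tokens, so all
# samples share one fixed length and can be stacked into a single 2D array.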


## Dictionary with models' hyperparameters:
MODELS = [
    # GRU:
    {"no": 1,
     "layer_type": "GRU",
     "optimizer": "rmsprop",
     "dropout": None,
     "recurrent_dropout": None},

    {"no": 2,
     "layer_type": "GRU",
     "optimizer": "rmsprop",
     "dropout": 0.3,
     "recurrent_dropout": None},

    {"no": 3,
     "layer_type": "GRU",
     "optimizer": "rmsprop",
     "dropout": None,
     "recurrent_dropout": 0.3},

    {"no": 4,
     "layer_type": "GRU",
     "optimizer": "rmsprop",
     "dropout": 0.3,
     "recurrent_dropout": 0.3},

    {"no": 5,
     "layer_type": "GRU",
     "optimizer": "adam",
     "dropout": None,
     "recurrent_dropout": None},

    {"no": 6,
     "layer_type": "GRU",
     "optimizer": "adam",
     "dropout": 0.3,
     "recurrent_dropout": None},

    {"no": 7,
     "layer_type": "GRU",
     "optimizer": "adam",
     "dropout": None,
     "recurrent_dropout": 0.3},

    {"no": 8,
     "layer_type": "GRU",
     "optimizer": "adam",
     "dropout": 0.3,
     "recurrent_dropout": 0.3},

    # LSTM:
    {"no": 9,
     "layer_type": "LSTM",
     "optimizer": "rmsprop",
     "dropout": None,
     "recurrent_dropout": None},

    {"no": 10,
     "layer_type": "LSTM",
     "optimizer": "rmsprop",
     "dropout": 0.3,
     "recurrent_dropout": None},

    {"no": 11,
     "layer_type": "LSTM",
     "optimizer": "rmsprop",
     "dropout": None,
     "recurrent_dropout": 0.3},

    {"no": 12,
     "layer_type": "LSTM",
     "optimizer": "rmsprop",
     "dropout": 0.3,
     "recurrent_dropout": 0.3},

    {"no": 13,
     "layer_type": "LSTM",
     "optimizer": "adam",
     "dropout": None,
     "recurrent_dropout": None},

    {"no": 14,
     "layer_type": "LSTM",
     "optimizer": "adam",
     "dropout": 0.3,
     "recurrent_dropout": None},

    {"no": 15,
     "layer_type": "LSTM",
     "optimizer": "adam",
     "dropout": None,
     "recurrent_dropout": 0.3},

    {"no": 16,
     "layer_type": "LSTM",
     "optimizer": "adam",
     "dropout": 0.3,
     "recurrent_dropout": 0.3},
]

## Adding name:
for model_dict in MODELS:
    model_dict["name"] = f"{model_dict['layer_type']}"
    model_dict["name"] += f"_d{model_dict['dropout']}" if model_dict['dropout'] is not None else f"_dN"
    model_dict["name"] += f"_rd{model_dict['recurrent_dropout']}" if model_dict['recurrent_dropout'] is not None else f"_rdN"
    model_dict["name"] += f"_{model_dict['optimizer']}"

## Function - defining and training a model:
def train_model(model_dict):
    """Defines and trains a model, outputs history."""

    ## Defining:
    model = models.Sequential()
    model.add(layers.Embedding(10000, 32))

    recurrent_layer_kwargs = dict()
    if model_dict["dropout"] is not None:
        recurrent_layer_kwargs["dropout"] = model_dict["dropout"]
    if model_dict["recurrent_dropout"] is not None:
        recurrent_layer_kwargs["recurrent_dropout"] = model_dict["recurrent_dropout"]

    if model_dict["layer_type"] == 'GRU':
        model.add(layers.GRU(32, **recurrent_layer_kwargs))
    elif model_dict["layer_type"] == 'LSTM':
        model.add(layers.LSTM(32, **recurrent_layer_kwargs))
    else:
        raise ValueError("Wrong model_dict['layer_type'] value...")
    model.add(layers.Dense(1, activation='sigmoid'))

    ## Compiling:
    model.compile(
        optimizer=model_dict["optimizer"],
        loss='binary_crossentropy',
        metrics=['accuracy'])

    ## Training:
    history = model.fit(x_train, y_train,
                        epochs=20,
                        batch_size=64,
                        validation_split=0.2)

    return history

## Multi-model graphs' parameters:
graph_all_nrow = 4
graph_all_ncol = 4
graph_all_figsize = (20, 20)

assert graph_all_nrow * graph_all_ncol >= len(MODELS)

## Figs and axes of multi-model graphs:
graph_all_loss_fig, graph_all_loss_axs = plt.subplots(graph_all_nrow, graph_all_ncol, figsize=graph_all_figsize)
graph_all_acc_fig, graph_all_acc_axs = plt.subplots(graph_all_nrow, graph_all_ncol, figsize=graph_all_figsize)

## Loop through all models:
for i, model_dict in enumerate(MODELS):
    history = train_model(model_dict)

    ## Metrics extraction:
    loss = history.history['loss']
    val_loss = history.history['val_loss']
    acc = history.history['accuracy']
    val_acc = history.history['val_accuracy']

    epochs = range(1, len(loss) + 1)

## Single-model graph - loss:
    graph_loss_fname = fr"{os.path.basename(__file__).replace('.py', '')}"
    graph_loss_fname += fr"_v{VERSION}_{model_dict['no']}_{model_dict['name']}_loss_graph.png"

    graph_loss_fig, graph_loss_ax = plt.subplots()
    graph_loss_ax.plot(epochs, loss, 'bo', label='Training loss')
    graph_loss_ax.plot(epochs, val_loss, 'b', label='Validation loss')
    graph_loss_ax.legend()
    graph_loss_fig.suptitle("Training and validation loss")
    graph_loss_fig.savefig(graph_loss_fname)
    pylab.close(graph_loss_fig)


## Single-model graph - accuracy:
    graph_acc_fname = fr"{os.path.basename(__file__).replace('.py', '')}"
    graph_acc_fname += fr"_v{VERSION}_{model_dict['no']}_{model_dict['name']}_acc_graph.png"

    graph_acc_fig, graph_acc_ax = plt.subplots()
    graph_acc_ax.plot(epochs, acc, 'bo', label='Training accuracy')
    graph_acc_ax.plot(epochs, val_acc, 'b', label='Validation accuracy')
    graph_acc_ax.legend()
    graph_acc_fig.suptitle("Training and validation acc")
    graph_acc_fig.savefig(graph_acc_fname)
    pylab.close(graph_acc_fig)

    ## Position of axes on multi-model graph:
    i_row = i // graph_all_ncol
    i_col = i % graph_all_ncol

    ## Adding model metrics to multi-model graph - loss:
    graph_all_loss_axs[i_row, i_col].plot(epochs, loss, 'bo', label='Training loss')
    graph_all_loss_axs[i_row, i_col].plot(epochs, val_loss, 'b', label='Validation loss')
    graph_all_loss_axs[i_row, i_col].set_title(fr"{model_dict['no']}. {model_dict['name']}")

    ## Adding model metrics to multi-model graph - accuracy:
    graph_all_acc_axs[i_row, i_col].plot(epochs, acc, 'bo', label='Training acc')
    graph_all_acc_axs[i_row, i_col].plot(epochs, val_acc, 'b', label='Validation acc')
    graph_all_acc_axs[i_row, i_col].set_title(fr"{model_dict['no']}. {model_dict['name']}")


## Saving multi-model graphs:
# Output files are quite big (8000x8000 PNG), you may want to decrease DPI.
graph_all_loss_fig.savefig(fr"{os.path.basename(__file__).replace('.py', '')}_ALL_loss_graph.png", dpi=400)
graph_all_acc_fig.savefig(fr"{os.path.basename(__file__).replace('.py', '')}_ALL_acc_graph.png", dpi=400)
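
As a debugging aid (my addition, not part of the original runs): Keras ships a TerminateOnNaN callback that aborts training at the first non-finite loss, which makes it easier to pin down the exact batch where the loss blows up:

from keras.callbacks import TerminateOnNaN

# Same fit() call as in train_model(), but training stops at the first NaN/inf loss:
history = model.fit(x_train, y_train,
                    epochs=20,
                    batch_size=64,
                    validation_split=0.2,
                    callbacks=[TerminateOnNaN()])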

Please find the two main figures below: "Loss - binary crossentropy" and "Accuracy" (I am not allowed to embed images in the post due to low reputation).

I also ran into a similarly strange problem with a regression model: the MAE ended up in the thousands, on a problem where the range of $y$ itself may be in the thousands. (I decided not to include that model here, as it would make this question even longer.)

Versions of modules and libraries, hardware

Modules:

    Keras                     2.3.1
    Keras-Applications        1.0.8
    Keras-Preprocessing       1.1.0
    matplotlib                3.1.3
    tensorflow-estimator      1.14.0
    tensorflow-gpu            2.1.0
    tensorflow-gpu-estimator  2.1.0
  • keras.json file:
    { "floatx": "float32", "epsilon": 1e-07, "backend": "tensorflow", "image_data_format": "channels_last" }
  • CUDA - I have CUDA 10.0 and CUDA 10.1 installed on my system.
  • CUDnn - I have three versions: cudnn-10.0 v7.4.2.24, cudnn-10.0 v7.6.4.38, cudnn-9.0 v7.4.2.24
  • GPU: Nvidia GTX 1050Ti 4GB
  • Windows 10 Home
Questions

  1. Do you know possible reasons for this behaviour?
     Could it be caused by the multiple CUDA and CUDnn installations? Before I observed the problem, I had trained several models (both from the book and my own) that seemed to behave more or less as expected, while having 2 CUDA and 2 CUDnn versions installed (the list above without cudnn-10.0 v7.6.4.38). A quick sanity check of the setup is sketched after this list.
  2. Is there any official/good source on proper combinations of Keras, TensorFlow, CUDA and CUDnn (and other relevant things, e.g. Visual Studio)? I really cannot find any authoritative, up-to-date source.

I hope I have described everything clearly enough. If you have any questions, please ask.
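
Regarding the CUDA/CUDnn suspicion, here is a minimal sanity check (my addition; these are standard TensorFlow 2.x calls, not taken from the original script) to confirm which build and which devices TensorFlow actually uses:

import tensorflow as tf

print(tf.version.VERSION)                      # e.g. "2.1.0"
print(tf.test.is_built_with_cuda())            # True for the tensorflow-gpu build
print(tf.config.list_physical_devices('GPU'))  # GPUs TensorFlow can actually see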
machine-learning keras lstm recurrent-neural-network gated-recurrent-unit

1 Answer

0 votes
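
(Note: the reprex below runs on a TensorFlow 2.0.0 build without CUDA, i.e. CPU-only, and still ends with loss: NaN, which suggests the problem is not tied to the CUDA/CUDnn setup.)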
# remotes::install_github("rstudio/keras#1032")
library(keras)
reticulate::py_config()
#> python:         /home/clanera/anaconda3/envs/r-tensorflow/bin/python
#> libpython:      /home/clanera/anaconda3/envs/r-tensorflow/lib/libpython3.6m.so
#> pythonhome:     /home/clanera/anaconda3/envs/r-tensorflow:/home/clanera/anaconda3/envs/r-tensorflow
#> version:        3.6.10 |Anaconda, Inc.| (default, Jan  7 2020, 21:14:29)  [GCC 7.3.0]
#> numpy:          /home/clanera/anaconda3/envs/r-tensorflow/lib/python3.6/site-packages/numpy
#> numpy_version:  1.18.1
#> tensorflow:     /home/clanera/anaconda3/envs/r-tensorflow/lib/python3.6/site-packages/tensorflow
#>
#> NOTE: Python version was forced by RETICULATE_PYTHON

tensorflow::tf_config()
#> TensorFlow v2.0.0 (~/anaconda3/envs/r-tensorflow/lib/python3.6/site-packages/tensorflow)
#> Python v3.6 (~/anaconda3/envs/r-tensorflow/bin/python)

tensorflow::tf_gpu_configured()
#> TensorFlow built with CUDA:  FALSE
#> GPU device name:
#> [1] FALSE

n <- 100
t <- 80 # with 72 there seem to be no problems
q <- 10
x <- array(sample(n*t*q), c(n, t, q))
y <- sample(0:1, n, replace = TRUE)

input <- layer_input(c(t, q))
output <- input %>%
  # ## no problem using LSTM
  # layer_lstm(units = 2, recurrent_dropout = 0.5) %>%
  layer_gru(units = 2, recurrent_dropout = 0.5) %>%
  layer_dense(units = 1, activation = "sigmoid")
model <- keras_model(input, output)
summary(model)
#> Model: "model"
#> ________________________________________________________________________________
#> Layer (type)                        Output Shape                    Param #
#> ================================================================================
#> input_1 (InputLayer)                [(None, 80, 10)]                0
#> ________________________________________________________________________________
#> gru (GRU)                           (None, 2)                       78
#> ________________________________________________________________________________
#> dense (Dense)                       (None, 1)                       3
#> ================================================================================
#> Total params: 81
#> Trainable params: 81
#> Non-trainable params: 0
#> ________________________________________________________________________________

history <- model %>%
  compile(optimizer = "adam", loss = "binary_crossentropy") %>%
  fit(x, y, 2, 3)
history
#> Trained on 100 samples (batch_size=2, epochs=3)
#> Final epoch (plot to see history):
#> loss: NaN