Keras error: "Optimization loop failed: Cancelled: Operation was cancelled" when calling predict_on_batch

Question (votes: 0, answers: 3)

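I have some older working code that uses keras. I recently dusted it off and tried to run it with the current versions of keras/tensorflow. When I call predict_on_batch, I get this warning/error: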

W tensorflow/core/data/root_dataset.cc:167] Optimization loop failed: Cancelled: Operation was cancelled

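I tried googling the problem and, to my surprise, there does not seem to be a good explanation anywhere on the web of what causes it or how to fix it. Here is what I found: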

https://github.com/tensorflow/tensorflow/issues/48689

https://discuss.tensorflow.org/t/optimization-loop-failed-cancelled-operation-was-cancelled/1524

One answer listed there is to make sure the batch size is no larger than the entire dataset. That is not the case here.
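
For reference, the batch-size check suggested in those threads could look roughly like the sketch below. The helper name `effective_batch_size` is hypothetical and not part of keras or tensorflow; it only illustrates capping a requested batch size at the number of available samples.

```python
def effective_batch_size(requested, dataset_size):
    """Return a batch size that never exceeds the number of available samples.

    Hypothetical helper for illustration only; not a keras/tensorflow API.
    """
    if requested <= 0:
        raise ValueError("requested batch size must be positive")
    if dataset_size <= 0:
        raise ValueError("dataset is empty")
    # Cap the batch size at the dataset size.
    return min(requested, dataset_size)
```

With a replay history of 32 samples and a requested batch of 100, this would return 32; with 500 samples it would return 100 unchanged.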

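The code is somewhat long, so I can't easily show all of it. It is a deep reinforcement learning application, so the deep learning code is split into two main functions, which I show here: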

class DQN(QContract):
    def __init__(self, states, actions, lr, DDQN=False):
        self.history = []
        act_relu = activations.relu
        act_linear = activations.linear
        top_layer = 150
        middle_layer = 120
        # Create Network: Default Parameters from https://towardsdatascience.com/solving-lunar-lander-openaigym-reinforcement-learning-785675066197
        model = Sequential()
        layer = layers.Dense(top_layer, input_dim=states, activation=act_relu)
        model.add(layer)
        layer = layers.Dense(middle_layer, activation=act_relu)
        model.add(layer)
        layer = layers.Dense(actions, activation=act_linear)
        model.add(layer)
        opt = optimizers.Adam(learning_rate=lr)
        model.compile(loss='mse', optimizer=opt)
        # Create DDQN-like networks
        self.modelA = model
        #self.modelB = copy.deepcopy(model)
        self.batch_size = 100
        self.current = "A"
        self.count = 0

    def Update(self, state, action, reward, new_state, gamma, alpha=None):
        # Perform replay
        row_count = self.batch_size
        if len(self.history) < row_count: return

        # Column names
        state = 0
        action = 1
        reward = 2
        next_state = 3
        done = 4
        # Get samples in mini-batches
        samples = random.sample(self.history, row_count)
        # Separate into separate arrays
        states_array = [sample[state] for sample in samples]
        actions_array = [sample[action] for sample in samples]
        rewards_array = [sample[reward] for sample in samples]
        next_states_array = [sample[next_state] for sample in samples]
        done_array = [sample[done] for sample in samples]
        # Turn into arrays
        states_array = np.array(states_array)
        actions_array = np.array(actions_array)
        rewards_array = np.array(rewards_array)
        next_states_array = np.array(next_states_array)
        done_array = (1.0 - np.array(done_array))

        # train on states_array
        X = states_array

        # Create y (i.e. labels for supervised learning)
        if self.current == "A":
            model1 = self.modelA
            model2 = self.modelA
        else:
            model1 = self.modelA
            model2 = self.modelA

        predicted_values = self.modelA.predict_on_batch(states_array)
        next_predicted_values = self.modelA.predict_on_batch(next_states_array)
        actual_values = rewards_array + gamma * np.amax(next_predicted_values, axis=1) * done_array

        predicted_values[list(range(row_count)), actions_array] = actual_values
        y = predicted_values

        # Update network
        self.current = "A"
        if self.current == "A":
            print('Do fit'+str(self.count))
            self.count += 1
            self.modelA.fit(X, y, epochs=1, verbose=0)
            self.current = "B"
        else:
            self.modelA.fit(X, y, epochs=1, verbose=0)
            self.current = "A"

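At one point I was trying to do a DDQN, but that is not working yet, so please ignore the attempt to have two models. It is currently disabled.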
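
To make the target computation in Update easier to follow, here is a minimal sketch of the same arithmetic on plain Python lists, so it runs without tensorflow or NumPy. The function name `td_targets` and its argument names are assumptions made for illustration; the two Q-value arguments stand in for the outputs of the two predict_on_batch calls.

```python
def td_targets(q_values, next_q_values, actions, rewards, dones, gamma):
    """Build training labels for one replay batch.

    Starts from the current predictions and overwrites only the entry of
    the action actually taken with reward + gamma * max(next Q-values),
    dropping the bootstrap term when the episode ended (done is true).
    Hypothetical helper for illustration only.
    """
    # Copy each row so the caller's predictions are left untouched.
    targets = [list(row) for row in q_values]
    for i, (action, reward, done) in enumerate(zip(actions, rewards, dones)):
        bootstrap = 0.0 if done else gamma * max(next_q_values[i])
        targets[i][action] = reward + bootstrap
    return targets
```

Only one entry per row changes, so the mse loss pushes the network toward the new estimate for the taken action and leaves the other actions' predictions as they were.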

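This seems like it should be a fairly simple problem, but I can't figure it out. I even tried stepping through the code, and found that the message does not appear when stepping through in the debugger.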

tensorflow keras tensorflow2.0 tf.keras
3 Answers
1
votes

This thread (the one the OP mentioned) now has several replies suggesting that adding the following lines removes the error message:

import tensorflow as tf
gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:  # enable memory growth on each visible GPU
    tf.config.experimental.set_memory_growth(gpu, True)

0
votes

Have you checked your RAM usage? If it goes above a certain point, around 95%, the operation gets cancelled.
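
One way to check this on Linux is to read `/proc/meminfo`. The sketch below is only an illustration: the function names are hypothetical, it assumes a Linux kernel that reports both `MemTotal` and `MemAvailable`, and the 95% threshold comes from this answer rather than from anything verified here.

```python
def memory_used_fraction(meminfo_text):
    """Return the fraction of RAM in use, given /proc/meminfo-style text.

    Assumes the text contains MemTotal and MemAvailable lines (Linux only).
    """
    fields = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            # The first token after the colon is the value (in kB for memory fields).
            fields[name.strip()] = int(parts[0])
    return 1.0 - fields["MemAvailable"] / fields["MemTotal"]


def read_memory_used_fraction(path="/proc/meminfo"):
    """Read the file at `path` and return the used-RAM fraction."""
    with open(path) as handle:
        return memory_used_fraction(handle.read())
```

Printing `read_memory_used_fraction()` just before the predict_on_batch call would show whether usage is anywhere near that level when the warning appears.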


-1
votes

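I think the problem here is that the model is not trainable: if the model's weights cannot be updated, the optimization loop fails. I ran into the same issue, and all I had to do was set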

model.trainable = True