RLlib PPO training does not use the GPU

Problem description

I am using the PPO algorithm in RLlib to train my deep reinforcement learning model. Training runs on an AWS p2.xlarge instance with 4 vCPUs and 1 GPU (Tesla K80). I found that PPO does not use the GPU.

The training log shows:

Trial status: 1 RUNNING
Current time: 2023-10-07 05:08:00. Total running time: 13s
Logical resource usage: 3.0/4 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:K80)
╭────────────────────────────────────────╮
│ Trial name                    status   │
├────────────────────────────────────────┤
│ PPO_CartPole-v1_74ca6_00000   RUNNING  │
╰────────────────────────────────────────╯
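The GPU itself is detected by Ray (the log reports 1 logical GPU, just 0 in use). For reference, a quick way to double-check what Ray and PyTorch can see on the instance (a minimal diagnostic sketch, run separately from the training script):

import ray
import torch

ray.init()

# resources Ray has registered for this node; on p2.xlarge this should include 'GPU': 1.0
print(ray.cluster_resources())

# whether the installed PyTorch build can reach the CUDA device
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # e.g. "Tesla K80"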

Here is my code:

from ray import tune
from ray.rllib.algorithms.ppo import PPO
from ray.rllib.algorithms.a2c import A2C  
from ray.rllib.algorithms.appo import APPO


def train() -> None:
    # placeholders for names used below (set these to suit your run)
    training_iteration = 1000    # stop after this many training iterations
    model_restore_dir = None     # checkpoint path to resume from, or None
    # config training parameters
    train_config = {
        "env": "CartPole-v1", # MyCustomEnv_v0,
        "framework": "torch",
        "num_workers": 2,
        "num_gpus": 1,  # Add this line to specify using one GPU
        "num_envs_per_worker": 1,
        "model": {
            "fcnet_hiddens": [512, 512, 256],
            "fcnet_activation": "relu",
        },
        "lr": 3e-4,  
        "optimization": {
            "optimizer": "adam",
            "adam_epsilon": 1e-8,
            "adam_beta1": 0.9,
            "adam_beta2": 0.999,
        },  
        "gamma": 0.99,
        "num_sgd_iter": 10,  
        "sgd_minibatch_size": 500, 
        "rollout_fragment_length": 500,
        "train_batch_size": 4000,
        "prioritized_replay": True,
        "prioritized_replay_alpha": 0.6,
        "prioritized_replay_beta": 0.4, 
        "buffer_size": 500000,
        "stop": {"episodes_total": 5000000},
        "exploration_config": {},
    }
    stop_criteria = {"training_iteration": training_iteration}

    # start to train
    try:
        results = tune.run(
            PPO,
            config=train_config,
            stop=stop_criteria,
            checkpoint_at_end=True,
            checkpoint_freq=50, 
            restore=model_restore_dir,
            verbose=1,
        )
    except BaseException as e:
        print(f"training error: {e}")
    
if __name__ == "__main__":
    train()

First, I trained against my custom environment "MyCustomEnv_v0", and PPO did not use the GPU. I then tried "CartPole-v1", and it still did not use the GPU. When I switched the algorithm from PPO to APPO, it started using the GPU, and A2C worked as well (I changed nothing else; the swapped call is sketched after the log below). The log looked like this:

Current time: 2023-10-07 05:07:01. Total running time: 0s
Logical resource usage: 3.0/4 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:K80)
╭─────────────────────────────────────────╮
│ Trial name                     status   │
├─────────────────────────────────────────┤
│ APPO_CartPole-v1_59115_00000   PENDING  │
╰─────────────────────────────────────────╯
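The only code change for these runs was the algorithm class passed to tune.run; the config dict above was reused as-is, roughly:

    # same train_config and stop_criteria as in the PPO run
    results = tune.run(
        APPO,  # or A2C; both runs occupied the GPU
        config=train_config,
        stop=stop_criteria,
        checkpoint_at_end=True,
        checkpoint_freq=50,
        verbose=1,
    )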

I checked the official RLlib documentation and confirmed that PPO supports GPU training.

Why does this happen, and how can I get RLlib PPO training to use the GPU?

Tags: deep-learning, pytorch, gpu, rllib
1 Answer

The problem is solved. I added "num_gpus_per_worker": 1 to train_config, and the GPU was used during training. Like this:
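A sketch of the adjusted resource keys, based on the train_config from the question (all other entries unchanged):

    train_config = {
        "env": "CartPole-v1",
        "framework": "torch",
        "num_workers": 2,
        "num_gpus": 1,             # GPU capacity reserved for the learner/driver process
        "num_gpus_per_worker": 1,  # the added line: gives each rollout worker GPU access
        "num_envs_per_worker": 1,
        # ... remaining keys as in the question ...
    }

Note that "num_gpus" reserves GPU capacity for the learner process, while "num_gpus_per_worker" assigns a GPU (or a fraction of one, e.g. 0.25) to each rollout worker; on a machine with a single physical GPU and several workers, fractional values are the usual way to avoid over-subscribing the device.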
