I am using the PPO algorithm in RLlib to train my deep reinforcement learning model. Training runs on an AWS p2.xlarge instance with 4 vCPUs and 1 GPU (Tesla K80), but I found that PPO does not use the GPU.
The training log shows:
Trial status: 1 RUNNING
Current time: 2023-10-07 05:08:00. Total running time: 13s
Logical resource usage: 3.0/4 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:K80)
╭────────────────────────────────────────╮
│ Trial name status │
├────────────────────────────────────────┤
│ PPO_CartPole-v1_74ca6_00000 RUNNING │
╰────────────────────────────────────────╯
Here is my code:
from ray import tune
from ray.rllib.algorithms.ppo import PPO
from ray.rllib.algorithms.a2c import A2C    # also tried: used the GPU
from ray.rllib.algorithms.appo import APPO  # also tried: used the GPU

# placeholders; the original snippet left these two names undefined
training_iteration = 1000  # number of training iterations before stopping
model_restore_dir = None   # set to a checkpoint path to resume training


def train() -> None:
    # config training parameters
    train_config = {
        "env": "CartPole-v1",  # MyCustomEnv_v0,
        "framework": "torch",
        "num_workers": 2,
        "num_gpus": 1,  # Add this line to specify using one GPU
        "num_envs_per_worker": 1,
        "model": {
            "fcnet_hiddens": [512, 512, 256],
            "fcnet_activation": "relu",
        },
        "lr": 3e-4,
        "optimization": {
            "optimizer": "adam",
            "adam_epsilon": 1e-8,
            "adam_beta1": 0.9,
            "adam_beta2": 0.999,
        },
        "gamma": 0.99,
        "num_sgd_iter": 10,
        "sgd_minibatch_size": 500,
        "rollout_fragment_length": 500,
        "train_batch_size": 4000,
        # note: the replay settings below belong to off-policy algorithms
        # (e.g. DQN); on-policy PPO does not use a prioritized replay buffer
        "prioritized_replay": True,
        "prioritized_replay_alpha": 0.6,
        "prioritized_replay_beta": 0.4,
        "buffer_size": 500000,
        # note: stopping is actually handled by stop_criteria below
        "stop": {"episodes_total": 5000000},
        "exploration_config": {},
    }
    stop_criteria = {"training_iteration": training_iteration}
    # start to train
    try:
        results = tune.run(
            PPO,
            config=train_config,
            stop=stop_criteria,
            checkpoint_at_end=True,
            checkpoint_freq=50,
            restore=model_restore_dir,
            verbose=1,
        )
    except Exception as e:
        print(f"training error: {e}")


if __name__ == "__main__":
    train()
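For reference, the same setup can also be expressed with RLlib's typed PPOConfig builder (a sketch assuming Ray 2.x; method and argument names may differ in other releases), which makes the resource request explicit:

from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("CartPole-v1")
    .framework("torch")
    .rollouts(
        num_rollout_workers=2,
        num_envs_per_worker=1,
        rollout_fragment_length=500,
    )
    .resources(num_gpus=1)  # reserve the GPU for the learner/driver process
    .training(
        lr=3e-4,
        gamma=0.99,
        num_sgd_iter=10,
        sgd_minibatch_size=500,
        train_batch_size=4000,
        model={"fcnet_hiddens": [512, 512, 256], "fcnet_activation": "relu"},
    )
)
algo = config.build()  # or pass config.to_dict() as the config for tune.run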
First, I trained on my custom environment "MyCustomEnv_v0", and PPO did not use the GPU. I then tried "CartPole-v1", and it still did not use the GPU. But when I switched the algorithm from PPO to APPO, it started using the GPU, and A2C worked as well (I changed nothing else), like this:
Current time: 2023-10-07 05:07:01. Total running time: 0s
Logical resource usage: 3.0/4 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:K80)
╭─────────────────────────────────────────╮
│ Trial name status │
├─────────────────────────────────────────┤
│ APPO_CartPole-v1_59115_00000 PENDING │
╰─────────────────────────────────────────╯
I checked the official RLlib documentation, which confirms that PPO supports GPU training.
Why does this happen, and how can I get RLlib PPO training to use the GPU?
This problem has been solved: I added "num_gpus_per_worker": 1 to train_config, and the GPU is now used during training. Like this:
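A minimal sketch of the updated config (everything not shown stays the same as in the code above):

train_config = {
    "env": "CartPole-v1",
    "framework": "torch",
    "num_workers": 2,
    "num_envs_per_worker": 1,
    "num_gpus": 1,              # GPU for the learner/driver process
    "num_gpus_per_worker": 1,   # the added line: gives each rollout worker GPU access
    # ... all remaining keys unchanged from the original train_config ...
}

Note that num_gpus reserves GPU capacity for the learner (driver) process, while num_gpus_per_worker assigns GPU capacity to each rollout worker. These are logical Ray resources, so if the combined request exceeds the physical GPUs available, fractional values (e.g. "num_gpus": 0.5, "num_gpus_per_worker": 0.25) keep the total within a single device.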