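I want to record the observation returned by each reset during training with SB3. Based on this issue comment, I decided to use a Monitor wrapper instead of a callback. However, the Monitor wrapper gives me an error. Here is my code: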
    import gym
    from stable_baselines3 import PPO
    from stable_baselines3.common.callbacks import BaseCallback
    from stable_baselines3.common.monitor import Monitor

    class CustomMonitor(Monitor):
        def __init__(self, env, filename=None, allow_early_resets=True, reset_keywords=(), info_keywords=()):
            super(CustomMonitor, self).__init__(env)
            self.reset_observations = []

        def reset(self, **kwargs):
            observation = super(CustomMonitor, self).reset(**kwargs)
            self.reset_observations.append(observation)
            return observation

    env = gym.make('LunarLander-v2')
    env = CustomMonitor(env)

    model = PPO('MlpPolicy', env, verbose=1)
    # Train the model
    model.learn(total_timesteps=1000000)
    # Save the model
    model.save("ppo_lunarlander_mutant")
However, when I run it, I get the following error:
    Traceback (most recent call last):
      File "minimal_example.py", line 21, in <module>
        model = PPO('MlpPolicy', env, verbose=1)
      File "/home/thoma/anaconda3/envs/wp/lib/python3.8/site-packages/stable_baselines3/ppo/ppo.py", line 109, in __init__
        super().__init__(
      File "/home/thoma/anaconda3/envs/wp/lib/python3.8/site-packages/stable_baselines3/common/on_policy_algorithm.py", line 85, in __init__
        super().__init__(
      File "/home/thoma/anaconda3/envs/wp/lib/python3.8/site-packages/stable_baselines3/common/base_class.py", line 180, in __init__
        assert isinstance(self.action_space, supported_action_spaces), (
    AssertionError: The algorithm only supports (<class 'gymnasium.spaces.box.Box'>, <class 'gymnasium.spaces.discrete.Discrete'>, <class 'gymnasium.spaces.multi_discrete.MultiDiscrete'>, <class 'gymnasium.spaces.multi_binary.MultiBinary'>) as action spaces but Discrete(4) was provided
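I should be using gymnasium instead of gym. The supported action spaces that SB3 lists are all gymnasium.spaces classes, while gym.make returns an environment whose action space is a gym.spaces.Discrete, so the isinstance check fails even though the space prints as Discrete(4). This is evident from the following error: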
    AssertionError: The algorithm only supports (<class 'gymnasium.spaces.box.Box'>, <class 'gymnasium.spaces.discrete.Discrete'>, <class 'gymnasium.spaces.multi_discrete.MultiDiscrete'>, <class 'gymnasium.spaces.multi_binary.MultiBinary'>) as action spaces but Discrete(4) was provided
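It can look odd that a space printed as Discrete(4) is rejected when Discrete appears in the supported list. The sketch below reproduces the situation with the standard library only. The two classes are hypothetical stand-ins for gym.spaces.Discrete and gymnasium.spaces.Discrete, not the real ones; the point is just that isinstance compares class identity, not class names.

```python
# Stand-in base class; NOT the real gym/gymnasium Space.
class _Space:
    def __init__(self, n):
        self.n = n

    def __repr__(self):
        return f"{type(self).__name__}({self.n})"


def make_discrete(module_name):
    """Create a distinct class named 'Discrete' that claims to live in module_name."""
    return type("Discrete", (_Space,), {"__module__": module_name})


# Hypothetical stand-ins for the two libraries' Discrete classes.
GymDiscrete = make_discrete("gym.spaces.discrete")
GymnasiumDiscrete = make_discrete("gymnasium.spaces.discrete")

# Mirrors the shape of the check in the traceback: a tuple of accepted classes.
supported_action_spaces = (GymnasiumDiscrete,)
action_space = GymDiscrete(4)

print(action_space)                                               # Discrete(4)
print(isinstance(action_space, supported_action_spaces))          # False
print(isinstance(GymnasiumDiscrete(4), supported_action_spaces))  # True
```

Both objects print identically, which is why the error message alone does not make the mismatch obvious; the module path in the listed class names is the giveaway.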
Perhaps older versions of stable_baselines3 work with gym; this needs further investigation.