Gymnasium custom environment "too many values to unpack" error

I am trying to use a custom boid flocking environment with Gymnasium and Stable Baselines3. I have a custom policy and training loop.
My action and observation spaces are as follows:

min_action = np.array([-5, -5] * len(self.agents), dtype=np.float32)
max_action = np.array([5, 5] * len(self.agents), dtype=np.float32)
min_obs = np.array([-np.inf, -np.inf, -2.5, -2.5] * len(self.agents), dtype=np.float32)
max_obs = np.array([np.inf, np.inf, 2.5, 2.5] * len(self.agents), dtype=np.float32)
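
(For reference, a minimal sketch of how bounds like these are typically wrapped into gymnasium.spaces.Box spaces; the agent count below is a hypothetical placeholder, not the environment's actual value:)

import numpy as np
from gymnasium import spaces

n_agents = 3  # hypothetical agent count, for illustration only
min_action = np.array([-5, -5] * n_agents, dtype=np.float32)
max_action = np.array([5, 5] * n_agents, dtype=np.float32)
min_obs = np.array([-np.inf, -np.inf, -2.5, -2.5] * n_agents, dtype=np.float32)
max_obs = np.array([np.inf, np.inf, 2.5, 2.5] * n_agents, dtype=np.float32)

# Continuous Box spaces spanning all agents' action and observation components
action_space = spaces.Box(low=min_action, high=max_action, dtype=np.float32)
observation_space = spaces.Box(low=min_obs, high=max_obs, dtype=np.float32)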

Training code:

import numpy as np
import torch as th
from Parameters import *
from stable_baselines3 import PPO
from main import FlockingEnv, CustomMultiAgentPolicy
from Callbacks import TQDMProgressCallback, LossCallback
import os
from stable_baselines3.common.vec_env import DummyVecEnv


if os.path.exists(Results["Rewards"]):
    os.remove(Results["Rewards"])
    print(f"File {Results['Rewards']} has been deleted.")

if os.path.exists("training_rewards.json"):
    os.remove("training_rewards.json")
    print(f"File training_rewards has been deleted.")    

def seed_everything(seed):
    np.random.seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)
    th.manual_seed(seed)
    th.cuda.manual_seed(seed)
    th.backends.cudnn.deterministic = True
    env.seed(seed)
    env.action_space.seed(seed)


loss_callback = LossCallback()
env = DummyVecEnv([lambda: FlockingEnv()])

seed_everything(SimulationVariables["Seed"])

# Model Training
model = PPO(CustomMultiAgentPolicy, env, tensorboard_log="./ppo_Agents_tensorboard/", verbose=1)
model.set_random_seed(SimulationVariables["ModelSeed"])
progress_callback = TQDMProgressCallback(total_timesteps=SimulationVariables["LearningTimeSteps"])
# Train the model
model.learn(total_timesteps=SimulationVariables["LearningTimeSteps"], callback=[progress_callback, loss_callback])

Error:

Using cuda device
Traceback (most recent call last):
File "D:\Thesis_\FlockingFinal\MultiAgentFlocking\Training.py", line 45, in <module>
model.learn(total_timesteps=SimulationVariables["LearningTimeSteps"], callback=[progress_callback, loss_callback])      
File "C:\Python312\Lib\site-packages\stable_baselines3\ppo\ppo.py", line 315, in learn
return super().learn(
            ^^^^^^^^^^^^^^
File "C:\Python312\Lib\site-packages\stable_baselines3\common\on_policy_algorithm.py", line 287, in learn
total_timesteps, callback = self._setup_learn(
                                 ^^^^^^^^^^^^^^^^^^
File "C:\Python312\Lib\site-packages\stable_baselines3\common\base_class.py", line 423, in _setup_learn
self._last_obs = self.env.reset()  # type: ignore[assignment]
                      ^^^^^^^^^^^^^^^^
File "C:\Python312\Lib\site-packages\stable_baselines3\common\vec_env\dummy_vec_env.py", line 77, in reset
obs, self.reset_infos[env_idx] = self.envs[env_idx].reset(seed=self._seeds[env_idx], **maybe_options)
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: too many values to unpack (expected 2)

I also used a similar seeding function with gym and had no errors. I thought it might be what is causing this error, but even when I do not use it, the error does not go away.

Tags: python, machine-learning, reinforcement-learning, stable-baselines, gymnasium
1 Answer

The Stable Baselines3 environment interface expects the reset function to return a tuple: (observation, info).
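
For illustration, this is roughly what DummyVecEnv does on reset (see the traceback line above): it unpacks the environment's return value into two names. If reset() returns only a flat observation array, Python tries to unpack that array itself, hence the error. A minimal sketch:

import numpy as np

def old_style_reset():
    # gym-style reset: returns only the observation
    return np.zeros(12, dtype=np.float32)

def gymnasium_style_reset():
    # Gymnasium-style reset: returns (observation, info)
    return np.zeros(12, dtype=np.float32), {}

obs, info = gymnasium_style_reset()  # works
obs, info = old_style_reset()        # ValueError: too many values to unpack (expected 2)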

Original code:

def reset(self, seed=None):
    # If a seed is provided, set it here
    if seed is not None:
        self.seed(seed)

    self.agents = [Agent(position) for position in self.read_agent_locations()]

    for agent in self.agents:
        agent.acceleration = np.zeros(2)
        agent.velocity = np.round(np.random.uniform(-SimulationVariables["VelocityUpperLimit"], SimulationVariables["VelocityUpperLimit"], size=2), 2)

    observation = self.get_observation().flatten()

    ################################
    self.current_timestep = 0  # Reset time step count
    ################################

    return observation

Error: only the observation is returned, which causes the framework to throw the unpacking error.

Debugged: modified the custom environment's reset method to return the observation and an empty info dictionary.

Fixed reset function:

def reset(self, seed=None, options=None):
    # If a seed is provided, set it here
    if seed is not None:
        self.seed(seed)

    self.agents = [Agent(position) for position in self.read_agent_locations()]

    for agent in self.agents:
        agent.acceleration = np.zeros(2)
        agent.velocity = np.round(np.random.uniform(-SimulationVariables["VelocityUpperLimit"], SimulationVariables["VelocityUpperLimit"], size=2), 2)

    observation = self.get_observation().flatten()

    ################################
    self.current_timestep = 0  # Reset time step count
    ################################

    super().reset(seed=seed)
    info = {}  # This is the extra information dictionary, you can populate it with useful info if needed
    return observation, info
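
As an optional follow-up check (a sketch, assuming FlockingEnv is importable as in the training script), Stable Baselines3's environment checker flags reset/step return values that do not match the Gymnasium API before training starts:

from stable_baselines3.common.env_checker import check_env
from main import FlockingEnv  # assumed import path, as in the training script

# Warns/raises if reset() does not return (observation, info) or step() does not
# return (observation, reward, terminated, truncated, info)
check_env(FlockingEnv(), warn=True)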