TypeError: unsupported operand type(s) for >>: 'list' and 'int' in pokerenv

Question (votes: 0, answers: 1)

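I am trying to use the pokerenv library for a reinforcement learning project, but even the example code from its own documentation produces the following error: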

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[7], line 13
     11 while True:
     12     action = agents[acting_player].get_action(obs)
---> 13     print(f"acion{table.step(action)} tipo{table.step(action)}")
     14     obs, reward, done, _ = table.step(action)
     15     print(f"obs={obs} tipo{type(obs)}")

File /opt/conda/lib/python3.10/site-packages/pokerenv/table.py:199, in Table.step(self, action)
    196                 self.next_player_i = min(active_players_before)
    198 if self.street_finished and not self.hand_is_over:
--> 199     self._street_transition()
    201 obs = np.zeros(self.observation_space.shape[0]) if self.hand_is_over else self._get_observation(self.players[self.next_player_i])
    202 rewards = np.asarray([player.get_reward() for player in sorted(self.players)])

File /opt/conda/lib/python3.10/site-packages/pokerenv/table.py:219, in Table._street_transition(self, transition_to_end)
    215 new = self.deck.draw(1)
    216 self.cards.append(new)
    217 self._write_event("*** TURN *** [%s %s %s] [%s]" %
    218                   (Card.int_to_str(self.cards[0]), Card.int_to_str(self.cards[1]),
--> 219                    Card.int_to_str(self.cards[2]), Card.int_to_str(self.cards[3])))
    220 self.street = GameState.TURN
    221 transitioned = True

File /opt/conda/lib/python3.10/site-packages/treys/card.py:81, in Card.int_to_str(card_int)
     79 @staticmethod
     80 def int_to_str(card_int: int) -> str:
---> 81     rank_int = Card.get_rank_int(card_int)
     82     suit_int = Card.get_suit_int(card_int)
     83     return Card.STR_RANKS[rank_int] + Card.INT_SUIT_TO_CHAR_SUIT[suit_int]

File /opt/conda/lib/python3.10/site-packages/treys/card.py:87, in Card.get_rank_int(card_int)
     85 @staticmethod
     86 def get_rank_int(card_int: int) -> int:
---> 87     return (card_int >> 8) & 0xF

TypeError: unsupported operand type(s) for >>: 'list' and 'int'

This is the minimal code needed to reproduce it. All dependencies are at their latest versions.

import numpy as np
import pokerenv.obs_indices as indices
from pokerenv.table import Table
from pokerenv.common import PlayerAction, Action, action_list


class ExampleRandomAgent:
    def __init__(self):
        self.actions = []
        self.observations = []
        self.rewards = []

    def get_action(self, observation):
        self.observations.append(observation)
        valid_actions = np.argwhere(observation[indices.VALID_ACTIONS] == 1).flatten()
        valid_bet_low = observation[indices.VALID_BET_LOW]
        valid_bet_high = observation[indices.VALID_BET_HIGH]
        chosen_action = PlayerAction(np.random.choice(valid_actions))
        bet_size = 0
        if chosen_action is PlayerAction.BET:
            bet_size = np.random.uniform(valid_bet_low, valid_bet_high)
        table_action = Action(chosen_action, bet_size)
        self.actions.append(table_action)
        return table_action

    def reset(self):
        self.actions = []
        self.observations = []
        self.rewards = []

active_players = 6
agents = [ExampleRandomAgent() for _ in range(6)]
player_names = {0: 'TrackedAgent1', 1: 'Agent2'} # Rest are defaulted to player3, player4...
# Should we only log the 0th players (here TrackedAgent1) private cards to hand history files
track_single_player = True 
# Bounds for randomizing player stack sizes in reset()
low_stack_bbs = 50
high_stack_bbs = 200
hand_history_location = 'hands/'
invalid_action_penalty = 0
table = Table(active_players, 
              player_names=player_names,
              track_single_player=track_single_player,
              stack_low=low_stack_bbs,
              stack_high=high_stack_bbs,
              hand_history_location=hand_history_location,
              invalid_action_penalty=invalid_action_penalty
)
table.seed(1)

iteration = 1
while True:
    if iteration % 50 == 0:
        table.hand_history_enabled = True
    active_players = np.random.randint(2, 7)
    table.n_players = active_players
    obs = table.reset()
    for agent in agents:
        agent.reset()
    acting_player = int(obs[indices.ACTING_PLAYER])
    while True:
        action = agents[acting_player].get_action(obs)
        obs, reward, done, _ = table.step(action)
        #print(f"obs={obs} tipo{type(obs)}")
        #print(f"reward={reward} tipo{type(reward)}")
        #print(f"done={done} tipo{type(done)}")
        if  done:
            # Distribute final rewards
            for i in range(active_players):
                agents[i].rewards.append(reward[i])
            break
        else:
            # This step can be skipped unless invalid action penalty is enabled, 
            # since we only get a reward when the pot is distributed, and the done flag is set
            agents[acting_player].rewards.append(reward[acting_player])
            acting_player = int(obs[indices.ACTING_PLAYER])
    iteration += 1
    table.hand_history_enabled = False

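The code is an exact copy of the example given in the documentation (https://pypi.org/project/pokerenv/). The error does not occur immediately. It usually appears after the first 13 steps of the environment, so the process seems to fail at step 14 or later, although I have seen it fail as early as step 10 and run as late as step 20 before the error. Since the failure depends on the actions, and those are random, could the problem be with the agents?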

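Also, looking at the library's code, the cards variable is initialized as a list, specifically as self.cards = [] in the function that initializes the environment. As far as I can tell, this variable is only overwritten or modified as each function needs it.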
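The failure mode in the traceback can be reproduced without pokerenv or treys. The sketch below is only an illustration: draw_as_list is a hypothetical stand-in for a draw call that always returns a list, the card values are arbitrary placeholders, and it assumes the flop cards were stored as plain ints. get_rank_int copies the expression shown at treys/card.py:87 in the traceback.

```python
def get_rank_int(card_int):
    # Same expression as the failing line in the traceback
    return (card_int >> 8) & 0xF


def draw_as_list(n):
    # Hypothetical stand-in: always returns a list, even for n == 1
    return [0x0100 * (i + 1) for i in range(n)]


cards = []
cards.extend(draw_as_list(3))  # flop: three ints end up in the list
cards.append(draw_as_list(1))  # turn: a one-element list is appended as one item

card_types = [type(c).__name__ for c in cards]
print(card_types)

try:
    get_rank_int(cards[3])
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```

If this is what happens inside the library, the error would only show up once a hand reaches the turn, which would explain why the failing step number varies with the random actions.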

python openai-gym gymnasium
1 Answer
0 votes

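To resolve this error, modify the Card.get_rank_int method so that it operates on an integer rather than bit-shifting a list. This may be caused by the version of the library you are running.

Deck.draw is expected to return an integer, not a list.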
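One possible workaround, sketched under assumptions, is to normalize the value returned by the draw call before it is appended to the board, for example around the new = self.deck.draw(1) line shown in the traceback. The helper as_card_int and the StubDeck class below are hypothetical names used only for illustration. They are not part of pokerenv or treys, and the card values are arbitrary placeholders.

```python
def as_card_int(drawn):
    """Return one card int whether `drawn` is an int or a one-element list."""
    if isinstance(drawn, (list, tuple)):
        if len(drawn) != 1:
            raise ValueError(f"expected exactly one card, got {len(drawn)}")
        return drawn[0]
    return drawn


class StubDeck:
    """Hypothetical stand-in for a deck whose draw() always returns a list."""

    def __init__(self, cards):
        self._cards = list(cards)

    def draw(self, n=1):
        drawn, self._cards = self._cards[:n], self._cards[n:]
        return drawn


deck = StubDeck([0x0100, 0x0200, 0x0300, 0x0C00])
board = []
board.extend(deck.draw(3))               # flop
board.append(as_card_int(deck.draw(1)))  # turn: unwrap before appending

# The bit shift from the traceback now works on every board entry
ranks = [(card >> 8) & 0xF for card in board]
print(ranks)
```

Unwrapping at the point where the card is drawn keeps every entry of the board an int, so the card-formatting code does not need to change.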
