I am trying to use the pokerenv library for a reinforcement learning project, but even the example code provided by the documentation itself produces the following error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[7], line 13
11 while True:
12 action = agents[acting_player].get_action(obs)
---> 13 print(f"acion{table.step(action)} tipo{table.step(action)}")
14 obs, reward, done, _ = table.step(action)
15 print(f"obs={obs} tipo{type(obs)}")
File /opt/conda/lib/python3.10/site-packages/pokerenv/table.py:199, in Table.step(self, action)
196 self.next_player_i = min(active_players_before)
198 if self.street_finished and not self.hand_is_over:
--> 199 self._street_transition()
201 obs = np.zeros(self.observation_space.shape[0]) if self.hand_is_over else self._get_observation(self.players[self.next_player_i])
202 rewards = np.asarray([player.get_reward() for player in sorted(self.players)])
File /opt/conda/lib/python3.10/site-packages/pokerenv/table.py:219, in Table._street_transition(self, transition_to_end)
215 new = self.deck.draw(1)
216 self.cards.append(new)
217 self._write_event("*** TURN *** [%s %s %s] [%s]" %
218 (Card.int_to_str(self.cards[0]), Card.int_to_str(self.cards[1]),
--> 219 Card.int_to_str(self.cards[2]), Card.int_to_str(self.cards[3])))
220 self.street = GameState.TURN
221 transitioned = True
File /opt/conda/lib/python3.10/site-packages/treys/card.py:81, in Card.int_to_str(card_int)
79 @staticmethod
80 def int_to_str(card_int: int) -> str:
---> 81 rank_int = Card.get_rank_int(card_int)
82 suit_int = Card.get_suit_int(card_int)
83 return Card.STR_RANKS[rank_int] + Card.INT_SUIT_TO_CHAR_SUIT[suit_int]
File /opt/conda/lib/python3.10/site-packages/treys/card.py:87, in Card.get_rank_int(card_int)
85 @staticmethod
86 def get_rank_int(card_int: int) -> int:
---> 87 return (card_int >> 8) & 0xF
TypeError: unsupported operand type(s) for >>: 'list' and 'int'
Here is the minimal code needed to reproduce it; all dependencies are on their latest versions.
import numpy as np
import pokerenv.obs_indices as indices
from pokerenv.table import Table
from pokerenv.common import PlayerAction, Action, action_list


class ExampleRandomAgent:
    def __init__(self):
        self.actions = []
        self.observations = []
        self.rewards = []

    def get_action(self, observation):
        self.observations.append(observation)
        valid_actions = np.argwhere(observation[indices.VALID_ACTIONS] == 1).flatten()
        valid_bet_low = observation[indices.VALID_BET_LOW]
        valid_bet_high = observation[indices.VALID_BET_HIGH]
        chosen_action = PlayerAction(np.random.choice(valid_actions))
        bet_size = 0
        if chosen_action is PlayerAction.BET:
            bet_size = np.random.uniform(valid_bet_low, valid_bet_high)
        table_action = Action(chosen_action, bet_size)
        self.actions.append(table_action)
        return table_action

    def reset(self):
        self.actions = []
        self.observations = []
        self.rewards = []


active_players = 6
agents = [ExampleRandomAgent() for _ in range(6)]
player_names = {0: 'TrackedAgent1', 1: 'Agent2'}  # Rest are defaulted to player3, player4...
# Should we only log the 0th player's (here TrackedAgent1) private cards to hand history files
track_single_player = True
# Bounds for randomizing player stack sizes in reset()
low_stack_bbs = 50
high_stack_bbs = 200
hand_history_location = 'hands/'
invalid_action_penalty = 0

table = Table(active_players,
              player_names=player_names,
              track_single_player=track_single_player,
              stack_low=low_stack_bbs,
              stack_high=high_stack_bbs,
              hand_history_location=hand_history_location,
              invalid_action_penalty=invalid_action_penalty
              )
table.seed(1)

iteration = 1
while True:
    if iteration % 50 == 0:
        table.hand_history_enabled = True
    active_players = np.random.randint(2, 7)
    table.n_players = active_players
    obs = table.reset()
    for agent in agents:
        agent.reset()
    acting_player = int(obs[indices.ACTING_PLAYER])
    while True:
        action = agents[acting_player].get_action(obs)
        obs, reward, done, _ = table.step(action)
        # print(f"obs={obs} type={type(obs)}")
        # print(f"reward={reward} type={type(reward)}")
        # print(f"done={done} type={type(done)}")
        if done:
            # Distribute final rewards
            for i in range(active_players):
                agents[i].rewards.append(reward[i])
            break
        else:
            # This step can be skipped unless the invalid action penalty is enabled,
            # since we only get a reward when the pot is distributed and the done flag is set
            agents[acting_player].rewards.append(reward[acting_player])
            acting_player = int(obs[indices.ACTING_PLAYER])
    iteration += 1
    table.hand_history_enabled = False
The code is an exact copy of the example given in the documentation (https://pypi.org/project/pokerenv/). Moreover, the error does not occur immediately but only after the first 13 or so environment steps, so the process seems to fail at step 14 or later. That is where it usually fails, but I have seen it happen as early as step 10 and as late as step 20. Since it is tied to the actions, and those are random, could the agents be the problem?
Also, looking at the library's code, the cards variable is initialized as a list: specifically, it is set to self.cards = [] in the function that initializes the environment. From there it is apparently overwritten or modified as each function requires.
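To check what actually ends up in that list, I isolated the draw call (a minimal sketch, assuming only treys is installed; the variable names are mine, not pokerenv's):

from treys import Card, Deck

deck = Deck()
new = deck.draw(1)
print(type(new), new)       # on my install: <class 'list'> with one card int inside

cards = []
cards.append(new)           # mirrors self.cards.append(new) in table.py
Card.int_to_str(cards[-1])  # TypeError: unsupported operand type(s) for >>: 'list' and 'int'

So whatever draw(1) returns is appended wholesale, list or not.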
To fix this error, you need to make sure Card.get_rank_int receives an integer: the bit shift it performs, (card_int >> 8) & 0xF, is only defined for ints, not lists. The cause is the version of treys your code is running against. In older treys releases, Deck.draw(1) returned a single card as a bare integer; in current releases, draw() always returns a list, even when only one card is drawn. pokerenv's table.py was written against the old behaviour, so the one-element list returned for the turn card is appended to self.cards as-is and later passed to Card.int_to_str. This also explains why the crash only appears a dozen or so steps in: the flop is drawn three cards at a time (a list works fine there), and the first single-card draw is the turn transition. Either pin treys to an older release where Deck.draw(1) returns an int, or unwrap the list before it reaches self.cards.
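If downgrading is not an option, one workaround is a small shim applied before the Table is constructed. This is a minimal sketch, not part of either library's API, and it assumes pokerenv only ever needs the bare-int behaviour for single-card draws (as the turn and river transitions in table.py do):

import treys

_list_draw = treys.Deck.draw  # current treys: always returns a list

def _legacy_draw(self, n=1):
    # Unwrap single-card draws to a bare int, the behaviour pokerenv expects.
    drawn = _list_draw(self, n)
    if n == 1 and isinstance(drawn, list):
        return drawn[0]
    return drawn

treys.Deck.draw = _legacy_draw  # apply before `table = Table(...)` runs

Equivalently, you can edit table.py to use self.deck.draw(1)[0] at the street-transition draws; both approaches just restore the integer that Card.int_to_str expects.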