我的 alpha beta 搜索算法对于终极 tic tac toe AI 机器人来说很慢

Question

我正在做一个学校项目，我试图编写一个 alpha beta 搜索算法来解决终极井字游戏。（终极井字棋只是普通井字棋的 3x3 网格，您放置的每个动作，下一个动作（因此对手）都会被发送到该网格位置。

它的问题不是“无限”循环（基本上分支因子太大），我必须限制搜索深度，所以这不是一个很好的解决方案。目前，如果我让它的搜索深度超过 4，它基本上就没用了。谁能告诉我更好的算法方法，或者任何改进，我真的很感激，因为我是人工智能算法领域的新手。

仅供参考，它可以赢得比赛，但只能对抗随机机器人（随机放置的机器人）。与其他人工智能机器人相比，它基本上总是失败。我目前正在实现一个启发式函数来排序动作，但我不太确定这是否能解决我的问题。

下面是我的代码中重要函数的片段。

基本上每次轮到我们运行时都会调用 play，它会运行 alpha beta 搜索并返回具有最大结果的移动。

# choose a move to play - this is called everytime its the bots move
def play(max_rec_depth=4):

    n = execute_alp_bta(max_rec_depth)
    
    place(curr, n, 1)
    return n
    
    
# Call to execute
def execute_alp_bta(max_rec_depth) -> int:
    global curr
    
    max_pos = -1 
    max_val = float('-inf') 
    
    moves = get_heuristic_ordered_moves(curr)
    # Iterate over possible moves on the current board
    for i in range(1, 10):
        if boards[curr][i] == 0:
            boards[curr][i] = 1  # Simulate place Maximiser (us)
            
            # Recurse starting from opponents view
            score = minimax(max_rec_depth, float('-inf'), float('inf'), is_maximising_player=False, curr_val=i)
            boards[curr][i] = 0  # Undo the simulated move
            
            # Update the best score and position if the current score is better
            if score > max_val:
                max_val = score
                max_pos = i

    # Return the position that gives the maximum value as per minimax
    if max_pos == -1:
        raise ValueError("Error with Minimax recursion")
    
    # print(f"found max val of {max_val} for pos {max_pos}")
    return max_pos


# MAXIMISING PLAYER - should be the US the computer
def minimax(depth, alpha, beta, is_maximising_player, curr_val) -> int:
    eval = evaluate() # returns win loss draw or none
    if depth == 0 or abs(eval) == 1:  # Terminal condition
        return eval
    
    if is_maximising_player:
        max_val = float('-inf')
        for i in range(1, 10):
            if boards[curr_val][i] == 0:
                boards[curr_val][i] = 1 # Maximisers move
                # print(f"placed at board: {curr_val} with index {i}")
                score = minimax(depth-1, alpha, beta, False, curr_val=i)
                # print(f"exited back out of recursion. Undid board: {curr_val} with index {i}")
                boards[curr_val][i] = 0 # Undo move
                max_val = max(max_val, score)
                alpha = max(alpha, score)
                if beta <= alpha:
                    break
            
        return max_val
    else: 
        min_val = float('inf')
        for i in range(1, 10):
            if boards[curr_val][i] == 0:
                boards[curr_val][i] = -1 # minimizers move
                # print(f"placed at board: {curr_val} with index {i}")
                score = minimax(depth-1, alpha, beta, True, curr_val=i)
                # print(f"exited back out of recursion. Undid board: {curr_val} with index {i}")
                boards[curr_val][i] = 0 # undo move
                min_val = min(min_val, score)
                beta = min(beta, score)
                if beta <= alpha:
                    break
        
            
        return min_val

Answer 1

1- 在井字棋中，有些情况下开始游戏的玩家会获胜。例如，在 3x3 棋盘中，如果 X 开始游戏并填满两个角，则他获胜；或者至少会有平局。所以对手必须想办法打平，他永远不可能赢。如果你的代码在这种情况下失败了，那没关系，但如果你在更简单的情况下失败，则意味着你的代码有错误。

2- 在这个游戏中，你的代码必须检查每个动作的所有未来动作，所以当你的棋盘是 10x10 时，这需要时间是正常的

3-当棋盘为空时，你还没有设置任何起点，你必须开始游戏。您的代码会检查所有可能的移动，但这不是必需的。您可以指定代码在棋盘为空时选择的点（例如 (0,0)）。

4-在minimax()函数的“else”部分，计算分数后添加一个条件：

score = minimax(depth-1, alpha, beta, True, curr_val=i)
if score == -1:
    return -1

因为当你的对手找到一种获胜的方法（分数==-1）时，他绝对会使用它。所以你不需要检查更多的动作。

5-如果您的代码有可能是开始游戏的玩家或第二个玩家，则您的 eval() 函数需要知道您是第一个玩家还是第二个玩家。您需要一个在每次移动时返回当前玩家的函数，并且您应该在第一次调用 minimax 时调用它，以便它返回您的一方（第一个玩家或第二个玩家（X 或 O））。（玩家为 X 和 O，棋盘为 3x3）

def player(board):
    # Returns player who has the next turn on a board.
    if sum(row.count(None) for row in board) == 9:
        return X
    elif terminal(board) == True:
        return "end"
    else:
        if sum(row.count(X) for row in board) > sum(row.count(O) for row in board):
            return O
        else:
            return X

你需要一个全局变量AI-side，并在第一次调用minimax时初始化它。

AI_side = player(board)

在 eval() 中使用 AI 端来识别应该获胜的玩家。

希望能帮到你。

我的 alpha beta 搜索算法对于终极 tic tac toe AI 机器人来说很慢

问题描述投票：0回答：1

1个回答

最新问题

我的 alpha beta 搜索算法对于终极 tic tac toe AI 机器人来说很慢

问题描述 投票：0回答：1

1个回答

最新问题

问题描述投票：0回答：1