IMDB pagination container for a series in Python


I am writing this code:

import requests
from bs4 import BeautifulSoup

url = 'https://www.imdb.com/title/tt5189554/episodes/'
headers = {
    "Connection": "keep-alive",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36"
}
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.text, 'html.parser')

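I want to get data such as the episode ID and title for all 80 episodes, but when I run this code it only returns 50 episodes. The other 30 sit behind the "30 more" pagination button.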

I have tried many things, such as inspecting the site's HTML, where I found this class:

<div class="sc-f09bd1f5-1 hoKmdt pagination-container">
        <span class="ipc-see-more sc-33e570c-0 cMGrFN single-page-see-more-button">
            <button class="ipc-btn ipc-btn--single-padding ipc-btn--center-align-content ipc-btn--default-height ipc-btn--core-base ipc-btn--theme-base ipc-btn--button-radius ipc-btn--on-accent2 ipc-text-button ipc-see-more__button" tabindex="151" aria-disabled="false">
                <span class="ipc-btn__text">
                    <span class="ipc-see-more__text">
                        30 more
                    </span>
                </span>
                <svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" class="ipc-icon ipc-icon--expand-more ipc-btn__icon ipc-btn__icon--post" viewBox="0 0 24 24" fill="currentColor" role="presentation">
                    <path opacity=".87" fill="none" d="M24 24H0V0h24v24z"></path>
                    <path d="M15.88 9.29L12 13.17 8.12 9.29a.996.996 0 1 0-1.41 1.41l4.59 4.59c.39.39 1.02.39 1.41 0l4.59-4.59a.996.996 0 0 0 0-1.41c-.39-.38-1.03-.39-1.42 0z"></path>
                </svg>
            </button>
        </span>
    </div>

But I can't find a way to get all of the data.

python pagination request
1 Answer

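For this case it is best to use Selenium, because it can scroll down the page or click the "30 more" button. The remaining episodes are only added to the page after that button is clicked in a browser, so a plain requests.get never sees them.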
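
A minimal sketch of that approach, not a tested solution for the current IMDb markup. The CSS selector is built from the class names in the question's HTML and may change at any time. The helper names (`collect_episode_ids`, `load_full_page`), the click limit, the one-second pause, and the assumption that episode links look like /title/tt<digits>/ are all mine. It assumes Selenium 4 with a local Chrome install.

```python
import re
import time

# Assumption: episode links embed the IMDb title ID as /title/tt<digits>/.
TITLE_ID = re.compile(r"/title/(tt\d+)/")

# Assumption: selector derived from the class names in the question's HTML.
SEE_MORE = ".pagination-container button.ipc-see-more__button"


def collect_episode_ids(hrefs, series_id):
    """Return unique title IDs found in hrefs, in order, skipping the series itself."""
    seen = []
    for href in hrefs:
        match = TITLE_ID.search(href or "")
        if match and match.group(1) != series_id and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


def load_full_page(url, max_clicks=20, timeout=5):
    """Open url in Chrome, click the 'see more' button until it is gone, return the HTML."""
    # Imported here so collect_episode_ids works even without Selenium installed.
    from selenium import webdriver
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        for _ in range(max_clicks):  # hard limit so a stuck button cannot loop forever
            try:
                button = WebDriverWait(driver, timeout).until(
                    EC.element_to_be_clickable((By.CSS_SELECTOR, SEE_MORE))
                )
            except TimeoutException:
                break  # no clickable button left (or the selector no longer matches)
            # A JavaScript click sidesteps overlays that can intercept a normal click.
            driver.execute_script("arguments[0].click();", button)
            time.sleep(1)  # crude pause so the next batch of episodes can render
        return driver.page_source
    finally:
        driver.quit()
```

The returned HTML can then go through the same BeautifulSoup call as in the question, for example html = load_full_page(url), then soup = BeautifulSoup(html, 'html.parser'), then collect_episode_ids((a.get('href') for a in soup.find_all('a')), 'tt5189554'). The page may also link to unrelated titles, so the result may need narrowing to the episode list container.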
