I am working on this code:
import requests
from bs4 import BeautifulSoup
url = 'https://www.imdb.com/title/tt5189554/episodes/'
headers = {
    "Connection": "keep-alive",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36"
}
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.text, 'html.parser')
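I want to get data such as the episode ID and name for all 80 episodes, but when I run this code it only gives me 50 episodes. The remaining ones are hidden behind the "30 more" pagination button.
I have tried many things, such as inspecting the site's HTML, where I found this element: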
<div class="sc-f09bd1f5-1 hoKmdt pagination-container">
<span class="ipc-see-more sc-33e570c-0 cMGrFN single-page-see-more-button">
<button class="ipc-btn ipc-btn--single-padding ipc-btn--center-align-content ipc-btn--default-height ipc-btn--core-base ipc-btn--theme-base ipc-btn--button-radius ipc-btn--on-accent2 ipc-text-button ipc-see-more__button" tabindex="151" aria-disabled="false">
<span class="ipc-btn__text">
<span class="ipc-see-more__text">
30 more
</span>
</span>
<svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" class="ipc-icon ipc-icon--expand-more ipc-btn__icon ipc-btn__icon--post" viewBox="0 0 24 24" fill="currentColor" role="presentation">
<path opacity=".87" fill="none" d="M24 24H0V0h24v24z"></path>
<path d="M15.88 9.29L12 13.17 8.12 9.29a.996.996 0 1 0-1.41 1.41l4.59 4.59c.39.39 1.02.39 1.41 0l4.59-4.59a.996.996 0 0 0 0-1.41c-.39-.38-1.03-.39-1.42 0z"></path>
</svg>
</button>
</span>
</div>
But I can't find a way to get all of the data.
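For this case it is best to use Selenium. requests only downloads the initial HTML and does not run the page's JavaScript, so the episodes loaded by the "30 more" button never appear in response.text. Selenium drives a real browser, so it can scroll down or click the "30 more" button before you read the page.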
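Here is a minimal sketch of that approach, not a tested scraper for this exact page. It assumes Selenium 4 with Chrome installed. The button selector is built from the class names in the HTML quoted in the question. The function names `pick_episodes` and `fetch_episodes`, the `a[href*='/title/tt']` link selector, the click limit, and the 2 second pause are my own assumptions and may need adjusting if IMDb changes its markup.

```python
import re
import time

# Matches IMDb title IDs such as "tt5189554" inside a link target.
TITLE_ID = re.compile(r"/title/(tt\d+)")


def pick_episodes(links, series_id):
    """Turn (href, text) pairs into unique (episode_id, name) pairs.

    Links without text, without a title ID, or pointing at the series
    itself are skipped. Treating every remaining title link as an episode
    is an assumption about the page, not something IMDb documents.
    """
    episodes = {}
    for href, text in links:
        match = TITLE_ID.search(href or "")
        name = (text or "").strip()
        if match and name and match.group(1) != series_id:
            # Keep the first name seen for each ID.
            episodes.setdefault(match.group(1), name)
    return list(episodes.items())


def fetch_episodes(url, series_id, max_clicks=10):
    """Open the page in Chrome, expand the list, and collect title links."""
    # Imported here so pick_episodes() stays usable without Selenium installed.
    from selenium import webdriver
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # Selector built from the class names in the HTML quoted in the question.
    more_button = (
        By.CSS_SELECTOR,
        "span.single-page-see-more-button button.ipc-see-more__button",
    )
    driver = webdriver.Chrome()
    try:
        driver.get(url)
        for _ in range(max_clicks):
            try:
                button = WebDriverWait(driver, 5).until(
                    EC.element_to_be_clickable(more_button)
                )
            except TimeoutException:
                break  # no clickable button found within 5 seconds
            # A JavaScript click sidesteps overlays such as cookie banners.
            driver.execute_script("arguments[0].click();", button)
            time.sleep(2)  # crude pause while the next batch loads
        # Assumed selector: any link whose target contains a title ID.
        anchors = driver.find_elements(By.CSS_SELECTOR, "a[href*='/title/tt']")
        links = [(a.get_attribute("href"), a.text) for a in anchors]
    finally:
        driver.quit()
    return pick_episodes(links, series_id)
```

Calling fetch_episodes(url, "tt5189554") with the url from the question should return a list of (episode ID, name) pairs. Because the link selector is broad, the result may include titles that are not episodes (for example recommendations), so check the output and narrow the selector if needed.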