Unable to scrape all the data from a lazy-loading table using Selenium


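I am trying to scrape three fields (player, logo, dkprice) from a table located in the middle of a webpage. To see all the data in the table, you have to scroll down to its bottom.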

I have written a Selenium script that scrolls the table to the bottom, but it scrapes only the last 16 results, even though the table contains 240 items.

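My goal is to scrape the entire table with Selenium, since I have already scraped the same content successfully with the requests module. I would like to know why Selenium fails to parse all the rows even after scrolling to the bottom.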

This is the approach that works with the requests module:

import requests

link = 'https://fantasyteamadvice.com/api/user/get-ownership'

res = requests.post(link,json={"sport":"mlb"})
for item in res.json()['ownership']:
    print(item['fullname'],item['team'],item['dkPrice'])

The script built with Selenium parses only the last 16 items:

import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

link = 'https://fantasyteamadvice.com/dfs/mlb/ownership'

def get_content(driver,link):
    driver.get(link)
    scroll_to_get_more(driver)
    for elem in WebDriverWait(driver,20).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR,".ownership-table-container [class$='player-row']"))):
        player = elem.find_element(By.CSS_SELECTOR,"[data-testid='ownershipPlayer']").text
        logo = elem.find_element(By.CSS_SELECTOR,"[data-testid='ownershipPlayerTeam'] > img").get_attribute("alt")
        dkprice = elem.find_element(By.CSS_SELECTOR,"[data-testid='ownershipPlayerDkPrice']").text
        yield player,logo,dkprice


def scroll_to_get_more(driver):
    last_elem = ''
    while True:
        current_elem = WebDriverWait(driver,20).until(EC.visibility_of_element_located((By.CSS_SELECTOR,".ownership-table-container [class$='player-row']:last-child")))
        driver.execute_script("arguments[0].scrollIntoView();", current_elem)
        time.sleep(3) # wait for page to load new content
        if last_elem == current_elem:
            break
        else:
            last_elem = current_elem


if __name__ == '__main__':
    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
    try:
        for item in get_content(driver,link):
            print(item)
    finally:
        driver.quit()

How can I scrape all the data from this lazy-loading table using Selenium?

python python-3.x selenium-webdriver web-scraping
1 Answer

You can get all of the table's data from the response of the following request (shown in a screenshot that is not preserved here).
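Assuming the request referred to above is the same POST to `https://fantasyteamadvice.com/api/user/get-ownership` that the question's requests snippet already uses, it can also be replayed from inside the Selenium session, which keeps everything in one browser. The sketch below shows that as option 1. Option 2 addresses the scrolling script itself: a likely explanation for getting only 16 rows, though not one confirmed for this site, is that the table is virtualized and keeps only the currently visible rows in the DOM, so reading once after the final scroll sees just the last batch. Collecting rows after every scroll step avoids that. The names `fetch_ownership`, `collect_while_scrolling`, and `scrape_by_scrolling` are hypothetical helpers; the selectors and JSON keys are taken from the question.

```python
import time

# JavaScript executed in the page: POST the payload as JSON and pass the
# parsed response (or an error marker) to Selenium's async callback,
# which is always the last entry of `arguments`.
FETCH_JS = """
const [url, payload, done] = arguments;
fetch(url, {
  method: 'POST',
  headers: {'Content-Type': 'application/json'},
  body: JSON.stringify(payload),
}).then(r => r.json()).then(done).catch(e => done({error: String(e)}));
"""


def fetch_ownership(driver, api_url, payload):
    """Option 1: replay the API request from inside the browser session.

    Call this after driver.get() on the site so the fetch is same-origin.
    """
    driver.set_script_timeout(30)
    data = driver.execute_async_script(FETCH_JS, api_url, payload)
    if not isinstance(data, dict) or "error" in data:
        raise RuntimeError(f"API request failed: {data!r}")
    return [(i["fullname"], i["team"], i["dkPrice"]) for i in data["ownership"]]


def collect_while_scrolling(read_rows, scroll_once, max_idle_rounds=3, max_rounds=1000):
    """Option 2: gather unique rows, in first-seen order, while scrolling.

    Stops after `max_idle_rounds` consecutive reads that add nothing new.
    """
    seen = {}  # a dict preserves insertion order and deduplicates keys
    idle = 0
    for _ in range(max_rounds):
        before = len(seen)
        for row in read_rows():
            seen.setdefault(row, None)
        idle = idle + 1 if len(seen) == before else 0
        if idle >= max_idle_rounds:
            break
        scroll_once()
    return list(seen)


def scrape_by_scrolling(driver, pause=1.0):
    """Wire collect_while_scrolling to the selectors from the question."""
    # Imported lazily so the helpers above work without Selenium installed.
    from selenium.common.exceptions import (
        NoSuchElementException,
        StaleElementReferenceException,
    )
    from selenium.webdriver.common.by import By

    row_css = ".ownership-table-container [class$='player-row']"

    def cell(elem, css):
        return elem.find_element(By.CSS_SELECTOR, css)

    def read_rows():
        rows = []
        for elem in driver.find_elements(By.CSS_SELECTOR, row_css):
            try:
                rows.append((
                    cell(elem, "[data-testid='ownershipPlayer']").text,
                    cell(elem, "[data-testid='ownershipPlayerTeam'] > img").get_attribute("alt"),
                    cell(elem, "[data-testid='ownershipPlayerDkPrice']").text,
                ))
            except (NoSuchElementException, StaleElementReferenceException):
                continue  # row was recycled or not fully rendered; skip it
        return rows

    def scroll_once():
        rows = driver.find_elements(By.CSS_SELECTOR, row_css)
        if rows:
            # Bring the last rendered row to the top so the next batch renders.
            driver.execute_script("arguments[0].scrollIntoView(true);", rows[-1])
        time.sleep(pause)  # give the table time to render the next rows

    return collect_while_scrolling(read_rows, scroll_once)
```

After `driver.get(link)` and the question's explicit wait for the first row, either `fetch_ownership(driver, 'https://fantasyteamadvice.com/api/user/get-ownership', {"sport": "mlb"})` or `scrape_by_scrolling(driver)` should return `(player, team, dkprice)` tuples. Option 2 deduplicates on the whole tuple, so two rows with identical text would collapse into one, and a row skipped as stale is only picked up if it is still rendered on the next read.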
