How to implement continuous scrolling with Selenium + Python

Question  Votes: 0  Answers: 1

Using Selenium in Python, I want to load the entire JS-generated list from this page: https://partechpartners.com/companies. There is a "Load More" button at the bottom.

The code I wrote to press the button (for now it runs only once; I know I need to extend it with a while loop to press the button repeatedly):

from selenium import webdriver #The Selenium webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.options import Options
from selenium.common.exceptions import NoSuchElementException, StaleElementReferenceException, WebDriverException
from time import sleep

chrome_options = Options()
chrome_options.add_argument("--headless")

driver = webdriver.Chrome(options=chrome_options)

url = 'https://partechpartners.com/companies'

driver.get(url)

sleep(2)

load_more = driver.find_element('xpath','//*[ text() = "LOAD MORE"]')

sleep(2)

try:
    ActionChains(driver).move_to_element(load_more).click(load_more).perform()
    print("Element was clicked")
except Exception as e:
    print("Element wasn't clicked")

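The code prints "Element was clicked". However, when I add the following line to the bottom of the script above, I get only 30 items back, which is the count before the button is clicked. The relative XPath is the same for elements loaded before and after the click, so I know that is not the cause: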

len(driver.find_elements('xpath','//h2'))

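I also tried commenting out chrome_options.add_argument("--headless") to check whether the problem is specific to headless mode and to watch the click happen. An "accept cookies" button appears that I cannot dismiss, but it does not seem to matter, because the script above still returns elements when I run it. What can I do to make sure the webdriver browser actually loads the rest of the page?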

python selenium-webdriver
1 Answer
0 votes

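As long as you do not wait for anything right after the click, you will keep getting the same result.

In your case, notice what happens after the page scrolls: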

  • The Load More button changes its position
  • The number of items increases
  • Once all items have loaded, the button disappears

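So after each click you can wait for the number of items to increase, or for the button to change its position (or both), and repeat this in a while loop until the Load More button disappears.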
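The item-count variant of this idea could be sketched roughly as follows. This is only an illustration, not the answerer's code: wait_until and load_all_items are hypothetical helper names, the locators are passed in as plain XPath strings, and the sketch assumes the consent banner has already been dismissed so that the button can receive an ordinary click.

```python
import time


def wait_until(predicate, timeout=20.0, poll=0.5):
    """Poll predicate() until it returns a truthy value or the timeout expires.

    Returns the truthy value, or None if the timeout was reached.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll)


def load_all_items(driver, button_xpath, item_xpath, timeout=20.0, poll=0.5):
    """Click the button until it is gone; return the final number of items.

    Hypothetical helper: `driver` only needs a find_elements(by, value) method,
    which a Selenium WebDriver provides.
    """
    while True:
        # Re-locate the button on every pass so a stale reference is never reused
        buttons = driver.find_elements("xpath", button_xpath)
        if not buttons:
            break  # the button is gone, so everything has been loaded
        before = len(driver.find_elements("xpath", item_xpath))
        buttons[0].click()
        # Wait until more items are present, or the button has disappeared
        changed = wait_until(
            lambda: len(driver.find_elements("xpath", item_xpath)) > before
            or not driver.find_elements("xpath", button_xpath),
            timeout,
            poll,
        )
        if not changed:
            break  # nothing changed within the timeout; stop instead of looping forever
    return len(driver.find_elements("xpath", item_xpath))
```

With a real driver this might be called as load_all_items(driver, "//*[text()='LOAD MORE']", "//h2") once the page has loaded and the consent dialog has been accepted.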

Example using the button position:

from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
import time

chrome_options = Options()
driver = webdriver.Chrome(options=chrome_options)

def wait_for_element_location_to_be_stable(element):
    """Block until the element's location has stayed the same for about 1 second."""
    previous_location = element.location
    start_time = time.time()
    while time.time() - start_time < 1:
        current_location = element.location
        if current_location != previous_location:
            # The element moved: remember the new position and restart the timer
            previous_location = current_location
            start_time = time.time()
        time.sleep(0.4)

def get_shadow_root(element):
    return driver.execute_script('return arguments[0].shadowRoot', element)

url = 'https://partechpartners.com/companies'

driver.get(url)
timeout = 20
wait = WebDriverWait(driver, timeout)

#accept consent
shadow_host = wait.until(EC.presence_of_element_located((By.ID, 'usercentrics-root')))
shadow_container = get_shadow_root(shadow_host).find_element(By.CSS_SELECTOR, '[data-testid=uc-app-container]')
WebDriverWait(shadow_container, timeout).until(EC.presence_of_element_located((By.CSS_SELECTOR, '[data-testid=uc-accept-all-button]'))).click()
wait.until(EC.invisibility_of_element_located((By.ID, 'usercentrics-root')))

# scroll logic: keep clicking "LOAD MORE" while the button is still in the DOM
load_more_xpath = "//*[text()='LOAD MORE']"
load_more = wait.until(EC.visibility_of_element_located((By.XPATH, load_more_xpath)))

while driver.find_elements(By.XPATH, load_more_xpath):
    # Wait for the button to stop moving, i.e. the previous batch has rendered
    wait_for_element_location_to_be_stable(load_more)
    ActionChains(driver).move_to_element(load_more).click(load_more).perform()

titles = wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, '[id*=id-] h2')))
for title in titles:
    print(title.text)
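The consent step in the code above reaches into the shadow DOM through execute_script. Recent Selenium 4 releases also expose a shadow_root property on elements, so the same step could be sketched as below. accept_consent is a hypothetical helper name, the selectors are the ones used in the code above, and the sketch assumes the host element already has a shadow root attached when it is found (otherwise Selenium raises an exception).

```python
import time


def accept_consent(driver,
                   host_css="#usercentrics-root",
                   button_css="[data-testid=uc-accept-all-button]",
                   timeout=20.0,
                   poll=0.5):
    """Click the consent button inside the shadow DOM.

    Returns True if the button was clicked, False if it never appeared
    within the timeout. Hypothetical helper, not part of Selenium.
    """
    deadline = time.monotonic() + timeout
    while True:
        for host in driver.find_elements("css selector", host_css):
            # shadow_root returns an object that supports find_elements;
            # assumption: the shadow root is already attached to the host
            root = host.shadow_root
            buttons = root.find_elements("css selector", button_css)
            if buttons:
                buttons[0].click()
                return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
```

Only CSS selectors are used inside the shadow root here, since Chromium-based drivers do not accept XPath for lookups within a shadow root.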
