Why doesn't my code scroll the web page down?

from selenium import webdriver
import time

# create webdriver instance
driver = webdriver.Chrome()
# navigate to page
driver.get('https://www.youtube.com/@freecodecamp/videos')

# scroll down to the bottom of the page
while True:
    # scroll down 1000 pixels
    driver.execute_script('window.scrollBy(0, 1000)')

    # wait for page to load
    time.sleep(2)

    # check if at bottom of page
    if driver.execute_script('return window.innerHeight + window.pageYOffset >= document.body.offsetHeight'):
        break

driver.quit()

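I tried using the condition above to check whether it is true. If it is, the page should be at the bottom or end; otherwise the code should keep scrolling the page down. I even tried referencing the height of the page itself.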

python selenium-webdriver web-scraping dom
1 Answer

Some websites block native JS scrolling, so calling a scroll function through execute_script does not work in every case.

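To scroll down, you should implement continuous scrolling: scroll to the last video thumbnail item, then scroll a little further with the keyboard so that the next items get rendered.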

The solution is similar to this answer:

from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver import ActionChains, Keys
from selenium.webdriver.common.by import By
from selenium import webdriver
import time

driver = webdriver.Chrome()
actionChains = ActionChains(driver)
wait = WebDriverWait(driver, 20)

def wait_for_element_location_to_be_stable(element):
    # Return once the element's position has not changed for about one second.
    initial_location = element.location
    previous_location = initial_location
    start_time = time.time()
    while time.time() - start_time < 1:
        current_location = element.location
        if current_location != previous_location:
            # The element moved: restart the one-second stability window.
            previous_location = current_location
            start_time = time.time()
        time.sleep(0.4)

def continuous_scroll_with_elements(by, locator):
    while True:
        results = wait.until(EC.presence_of_all_elements_located((by, locator)))
        # Remember the current last item so we can tell whether new ones load.
        temp = results[-1]
        actionChains.scroll_to_element(temp).perform()
        # Nudge the page further with the keyboard to trigger rendering of the next batch.
        for i in range(3):
            actionChains.send_keys(Keys.ARROW_DOWN).perform()
            time.sleep(0.5)
        wait_for_element_location_to_be_stable(temp)

        # If the last item is unchanged, nothing new was loaded: the end was reached.
        results = wait.until(EC.presence_of_all_elements_located((by, locator)))
        if results[-1] == temp:
            break

driver.get("https://www.youtube.com/@freecodecamp/videos")

continuous_scroll_with_elements(By.CSS_SELECTOR, 'ytd-rich-grid-media[class]')
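
The loop above runs until the last item stops changing, so on a feed that never ends it would never return. Below is a minimal sketch of the same stopping rule with a safety cap, written against plain callables so it can be tried without a browser. The names scroll_until_no_new_items, get_last_item, scroll_step, max_rounds and the FakePage class are hypothetical helpers for illustration, not part of Selenium.

```python
import time
from typing import Callable


def scroll_until_no_new_items(
    get_last_item: Callable[[], object],
    scroll_step: Callable[[], None],
    max_rounds: int = 200,
    pause: float = 0.0,
) -> int:
    """Scroll until the last item stops changing or max_rounds is reached.

    Returns the number of scroll steps that were performed.
    """
    rounds = 0
    previous = get_last_item()
    while rounds < max_rounds:
        scroll_step()
        rounds += 1
        if pause:
            time.sleep(pause)  # give the page time to render new items
        current = get_last_item()
        if current == previous:
            break  # nothing new appeared after this step
        previous = current
    return rounds


class FakePage:
    """Hypothetical stand-in for a lazily loading page (assumption, not Selenium)."""

    def __init__(self, total: int, batch: int) -> None:
        self.total = total
        self.batch = batch
        self.loaded = batch  # the first batch is visible right away

    def last_item(self) -> int:
        return self.loaded

    def scroll(self) -> None:
        # Each scroll reveals one more batch until everything is loaded.
        self.loaded = min(self.total, self.loaded + self.batch)
```

With a real driver, get_last_item could return the last element found by the locator, and scroll_step could wrap the scroll_to_element call and the ARROW_DOWN key presses from the answer; that wiring is left out here because it depends on the page being scraped.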