I'm trying to scroll down to the bottom of the page with this code:
from selenium import webdriver
from selenium.webdriver.common.by import By

url = 'http://www.tradingview.com/screener'
driver = webdriver.Firefox()
driver.get(url)

# Get scroll height
last_height = driver.execute_script("return document.body.scrollHeight")

while True:
    # Scroll down to bottom
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

# will give a list of all tickers
tickers = driver.find_elements(By.CSS_SELECTOR, 'a.tv-screener__symbol')
for index in range(len(tickers)):
    print("Row " + tickers[index].text + " ")
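But the while loop never ends: even after reaching the bottom of the page, Selenium keeps trying to scroll down, so the program never moves on. How can I detect that the bottom of the page has been reached so the code can continue?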
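The code in the question already reads last_height but never compares it with anything. A common pattern is to scroll, wait, re-read document.body.scrollHeight, and stop once the height no longer grows. The sketch below is a hypothetical helper (scroll_to_bottom, pause and max_scrolls are illustrative names, not Selenium API); it assumes the page grows its body as rows load, which the question's own code also assumes. If the screener scrolls an inner container instead of the body, the body height will not change and this check will stop early.

```python
import time


def scroll_to_bottom(driver, pause: float = 2.0, max_scrolls: int = 100) -> int:
    """Scroll until document.body.scrollHeight stops growing.

    `driver` is any object with Selenium's execute_script() method.
    Returns the number of scrolls performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for scrolls in range(1, max_scrolls + 1):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give lazily loaded rows time to arrive
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            return scrolls  # height unchanged: assume the bottom was reached
        last_height = new_height
    return max_scrolls  # safety cap so the loop can never run forever
```

Calling scroll_to_bottom(driver) in place of the while True loop lets the ticker code run afterwards. If the page loads slowly, a larger pause reduces the chance of stopping before the next batch arrives.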
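The screener page displays the total number of rows (matches) in the table. So one option is to compare the number of visible rows with that total and exit the loop once the visible count reaches it.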
from selenium import webdriver
from selenium.webdriver.common.by import By

url = 'http://www.tradingview.com/screener'
driver = webdriver.Firefox()
driver.get(url)

# Read the total number of matches shown on the page
selector = '.js-field-total.tv-screener-table__field-value--total'
matches = driver.find_element(By.CSS_SELECTOR, selector)
matches = int(matches.text.split()[0])

visible_rows = 0
scrolls = 0
while visible_rows < matches:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    # Wait 10 scrolls before updating row information
    if scrolls == 10:
        table = driver.find_elements(By.CLASS_NAME, 'tv-data-table__tbody')
        visible_rows = len(table[1].find_elements(By.TAG_NAME, 'tr'))
        scrolls = 0
    scrolls += 1

# will give a list of all tickers
tickers = driver.find_elements(By.CSS_SELECTOR, 'a.tv-screener__symbol')
for index in range(len(tickers)):
    print("Row " + tickers[index].text + " ")
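The row-counting loop above never exits if the visible count stops short of matches (for example, if the selector picks up a different table). A minimal sketch of the same idea with a stall guard is below. It is written against two callables so it does not depend on the page; scroll_until_rows, count_rows, scroll and max_stalls are hypothetical names, not Selenium API.

```python
import time
from typing import Callable


def scroll_until_rows(count_rows: Callable[[], int],
                      scroll: Callable[[], None],
                      expected: int,
                      pause: float = 1.0,
                      max_stalls: int = 3) -> int:
    """Scroll until `expected` rows are visible or the count stops growing.

    Gives up after `max_stalls` consecutive scrolls that add no rows,
    so the loop cannot spin forever. Returns the last row count seen.
    """
    visible = count_rows()
    stalls = 0
    while visible < expected and stalls < max_stalls:
        scroll()
        time.sleep(pause)  # let the next batch of rows load
        new_visible = count_rows()
        # Reset the stall counter whenever new rows appear
        stalls = stalls + 1 if new_visible <= visible else 0
        visible = new_visible
    return visible
```

With Selenium, count_rows could be lambda: len(driver.find_elements(By.CSS_SELECTOR, 'a.tv-screener__symbol')) and scroll could be lambda: driver.execute_script("window.scrollTo(0, document.body.scrollHeight);"), with expected=matches. Comparing the return value with matches tells you whether every row actually loaded.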
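Edit: since your setup doesn't seem to allow the previous solution, here is a different approach you can try. The page loads 150 rows at a time, so instead of counting visible rows, we can take the expected total number of matches (rows), e.g. 4894, and divide it by 150 to get the number of scrolls required. If we scroll at least that many times, all rows should in theory be visible and the code can continue.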
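The scroll-count arithmetic in the paragraph above can be written as a small helper. This is a hypothetical sketch: scrolls_needed is not part of Selenium, it assumes each scroll loads 150 rows as stated, and it rounds up (ceiling division) so a partial final batch still gets its own scroll, whereas the answer's int(matches / 150 + 5) truncates.

```python
def scrolls_needed(matches: int, rows_per_load: int = 150, extra: int = 5) -> int:
    """Scrolls required to load `matches` rows, plus a safety margin.

    Hypothetical helper: assumes each scroll loads `rows_per_load` more rows.
    """
    if rows_per_load <= 0:
        raise ValueError("rows_per_load must be positive")
    batches = -(-matches // rows_per_load)  # ceiling division without floats
    return batches + extra


print(scrolls_needed(4894))           # 38 (33 batches + 5 extra)
print(scrolls_needed(4894, extra=0))  # 33
```

For 4894 matches the truncating formula gives int(4894 / 150 + 5) = 37, one fewer than the ceiling version; with a 5-scroll margin either should be enough.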
from time import sleep

from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

url = 'http://www.tradingview.com/screener'
driver = webdriver.Chrome(service=Service('./chromedriver'))
driver.get(url)

try:
    # Wait up to 10 s for the total-matches field, then parse its leading number
    selector = '.js-field-total.tv-screener-table__field-value--total'
    condition = EC.visibility_of_element_located((By.CSS_SELECTOR, selector))
    matches = WebDriverWait(driver, 10).until(condition)
    matches = int(matches.text.split()[0])
except (TimeoutException, ValueError, IndexError):
    # Field not found in time, or its text does not start with a number
    print('Problem finding matches, setting default...')
    matches = 4895  # Set default

# The page loads 150 rows at a time; divide matches by
# 150 to determine the number of times we need to scroll;
# add 5 extra scrolls just to be sure
num_loops = matches // 150 + 5
for _ in range(num_loops):
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    sleep(2)  # Pause briefly to allow loading time

# will give a list of all tickers
tickers = driver.find_elements(By.CSS_SELECTOR, 'a.tv-screener__symbol')
n_tickers = len(tickers)
msg = 'Correct ' if n_tickers == matches else 'Incorrect '
msg += 'number of tickers ({}) found'
print(msg.format(n_tickers))
for index in range(n_tickers):
    print("Row " + tickers[index].text + " ")