I have a question about Selenium with Python. The method find_elements(By...) returns only 100 items even though more are available. Here is (part of) my code:
import time

from bs4 import BeautifulSoup
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

url = "https://www.falstaff.com/de/listings/die-besten-restaurants-in-rheinland-pfalz"

def get_linklist(url):
    # driver, open_page() and navigate() come from the part of the code not shown here
    try:
        driver
    except:
        open_page(url, False)
    else:
        navigate(url)
    try:
        time.sleep(2)
        #click_cookie_msg()
        driver.find_element(By.XPATH, "//p[contains(text(), 'Consent')]").click()
    except Exception as e:
        print(repr(e))
    try:
        # this loop scrolls down and clicks the 'load more' function
        loop_count = 0
        while loop_count < 36:
            time.sleep(2)
            more_xpath = "//button[contains(text(), 'Weitere anzeigen')]"
            more_button = driver.find_element(By.XPATH, more_xpath)
            #scroll down
            js_code = "arguments[0].scrollIntoView();"
            driver.execute_script(js_code, more_button)
            time.sleep(2)
            driver.execute_script("arguments[0].click();", more_button)
            loop_count += 1
    except Exception as e:
        print(repr(e))
    try:
        soup = BeautifulSoup(driver.page_source, 'html.parser')
        list_items = soup.find_all('div', {'class': 'list-item'})
        counter = 0
        for item in list_items:
            counter += 1
        print('Beautifulsoup results: ' + str(counter))
    except Exception as e:
        pass
    try:
        time.sleep(2)
        search_list_xpath = '//*[@id="main"]/section[4]/div/div/div/div[2]/div[3]/div[1]/div/div/div[3]'
        search_list = driver.find_element(By.XPATH, search_list_xpath)
        list_items = search_list.find_elements(By.CLASS_NAME, 'list-item')
        counter = 0
        for item in list_items:
            counter += 1
        print('Selenium xpath results: ' + str(counter))
    except Exception as e:
        pass
    try:
        time.sleep(2)
        linklist = []
        wait = WebDriverWait(driver, 20)
        list_items = wait.until(EC.visibility_of_all_elements_located((By.CLASS_NAME, 'list-item')))
        for list_item in list_items:
            href = list_item.find_element(By.TAG_NAME, 'a').get_attribute('href')
            linklist.append(href)
    except Exception as e:
        print(repr(e))
        pass
    print('Selenium class_name results: ')
    print(len(linklist))
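No matter how many times I repeat the 'load more' step, I get 100 items back. There are over 700 items available on this page and in my browser's dev tools. I need to get all 700 items, but find_elements(By...) returns at most 100. I even tried scraping the site with BeautifulSoup after 'load more', but that also gives only 100 items. Output: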
Beautifulsoup results: 100
Selenium xpath results: 100
Selenium class_name results:
100
What is going wrong here? Can anyone help me?
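The page never contains more than 100 items at once. When you press the 'load more' button for the fifth time, it unloads the first 20 items and loads 20 more.
You can check how many items there are in dev tools by searching with the following CSS selector: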
div.search-list__main div.list-item
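You can make a note of the first item's name, click 'load more' 5 times, and then check whether that item is still there.
To get all items, you have to take this into account. You can just grab the last 20 items and save them in a list, then click the 'load more' button and do the same again. That way you get all 700 items.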
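The approach above could be sketched roughly as follows. This is only an illustration, not tested against the live site. Instead of slicing off exactly the last 20 items, it re-reads everything currently in the DOM after each click and skips hrefs it has already seen, so it does not depend on the batch size staying at 20. The names collect_links, get_all_links, max_clicks and pause are made up for this example; the selectors are the ones from the question and this answer. The Selenium imports sit inside get_all_links so the collecting logic can be tried without a browser.

```python
import time
from typing import Callable, Iterable, List


def collect_links(
    read_visible: Callable[[], Iterable[str]],
    load_more: Callable[[], bool],
    max_clicks: int = 100,  # hypothetical safety cap, not a value from the site
) -> List[str]:
    """Accumulate hrefs across 'load more' clicks, keeping first-seen order.

    read_visible() returns the hrefs currently in the DOM (never more than
    about 100 on this page); load_more() clicks the button and returns False
    once there is nothing left to load.
    """
    seen = set()
    links: List[str] = []

    def harvest() -> None:
        for href in read_visible():
            if href and href not in seen:
                seen.add(href)
                links.append(href)

    harvest()  # items present before the first click
    for _ in range(max_clicks):
        if not load_more():
            break
        harvest()  # pick up the new batch before older items get unloaded
    return links


def get_all_links(driver, pause: float = 2.0) -> List[str]:
    """Wire collect_links to a Selenium driver that already shows the listing."""
    from selenium.common.exceptions import NoSuchElementException
    from selenium.webdriver.common.by import By

    def read_visible() -> List[str]:
        hrefs = []
        items = driver.find_elements(
            By.CSS_SELECTOR, "div.search-list__main div.list-item"
        )
        for item in items:
            anchors = item.find_elements(By.TAG_NAME, "a")
            if anchors:
                hrefs.append(anchors[0].get_attribute("href"))
        return hrefs

    def load_more() -> bool:
        try:
            button = driver.find_element(
                By.XPATH, "//button[contains(text(), 'Weitere anzeigen')]"
            )
        except NoSuchElementException:
            return False  # assumption: the button disappears when all items are loaded
        driver.execute_script("arguments[0].scrollIntoView();", button)
        driver.execute_script("arguments[0].click();", button)
        time.sleep(pause)  # crude wait for the next batch, as in the question
        return True

    return collect_links(read_visible, load_more)
```

With the driver from the question already on the listing page and the cookie banner dismissed, linklist = get_all_links(driver) should then hold every link that was visible at some point, not just the last 100.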