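I'm trying to scrape data from this site: https://www.transport.nsw.gov.au/data-and-research/drives-reporting-portal/registration-snapshot-report (if you go to page 2 of 2, you can see the Power BI report). It is a public website, and use of the data is permitted.
I can't make sense of it, because I'm not familiar with anything web-related, and the dashboard itself would take a very long time to scrape by hand.
This is what I have so far, but I'm getting a timeout error: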
import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
chrome_options = Options()
chrome_options.add_argument("--headless")
chrome_options.add_argument("--disable-gpu")
chromedriver_path = "~/bin/chromedriver"
url = "https://www.transport.nsw.gov.au/data-and-research/drives-reporting-portal/registration-snapshot-report"
powerbi_iframe_selector = "iframe.mapbox"
report_page_selector = ".reportPage"
service = Service(chromedriver_path)
driver = webdriver.Chrome(service=service, options=chrome_options)
driver.get(url)
iframe = WebDriverWait(driver, 20).until(
    EC.visibility_of_element_located((By.CSS_SELECTOR, powerbi_iframe_selector))
)
driver.switch_to.frame(iframe)
report_page = WebDriverWait(driver, 20).until(
    EC.visibility_of_element_located((By.CSS_SELECTOR, report_page_selector))
)
time.sleep(10)
report_contents = report_page.get_attribute("innerHTML")
print(report_contents)
driver.quit()
And the error:
selenium.common.exceptions.TimeoutException: Message:
Stacktrace:
0 chromedriver 0x000000010689b598 chromedriver + 4973976
1 chromedriver 0x0000000106892913 chromedriver + 4938003
2 chromedriver
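I also tried using By.CSS_SELECTOR, following some of the suggestions here, but it didn't help.
Can anyone help me with this dreadful problem?
I tried Selenium, which is the recommended approach, but I get a lot of timeout errors. I looked online for how to fix them but couldn't figure it out.
Using a CSS selector did not help:
report_page = WebDriverWait(driver, 20).until(
    EC.visibility_of_element_located((By.CSS_SELECTOR, report_page_selector))
)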
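Try this:
# switch_to.frame() returns None, so switch into the iframe first,
# then pass the driver itself to WebDriverWait.
driver.switch_to.frame(iframe)
report_page = WebDriverWait(driver, 20).until(
    EC.visibility_of_element_located((By.CSS_SELECTOR, report_page_selector))
)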
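If the first wait is the one timing out, it may be that `iframe.mapbox` does not match any element on the page; I can't confirm that from here. A quick way to check is to list the iframes that actually appear in the page source. Below is a minimal sketch using only the standard library. `list_iframes` and the sample markup are made up for illustration; with Selenium you would pass `driver.page_source` once the page has had time to load.

```python
from html.parser import HTMLParser


class IframeLister(HTMLParser):
    """Collect the attributes of every <iframe> tag in an HTML string."""

    def __init__(self):
        super().__init__()
        self.iframes = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this hook.
        if tag == "iframe":
            self.iframes.append(dict(attrs))


def list_iframes(html):
    """Return one dict of attributes per <iframe> found in `html`."""
    parser = IframeLister()
    parser.feed(html)
    return parser.iframes


# Made-up markup for illustration only; this is NOT the real page.
sample = '<div><iframe class="embed" src="https://example.com/view"></iframe></div>'
for attrs in list_iframes(sample):
    print(attrs.get("class"), attrs.get("src"))
```

If none of the listed iframes has the class `mapbox`, the selector is the thing to change, and the printed `class` and `src` values should suggest a better one.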
I'm now running into the same error and was wondering whether you found a solution? Thanks
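In case it helps: one way to narrow this down is to stop guessing the iframe selector and instead look inside every top-level iframe for the element you want. This is a rough sketch, assuming the report sits in a top-level iframe (not nested deeper) and that `.reportPage` really exists inside it, neither of which I have verified. `find_frame_containing` is a hypothetical helper, and `"css selector"` is the string value of Selenium's `By.CSS_SELECTOR`, so the function itself needs no Selenium import.

```python
import time

CSS = "css selector"  # same string as selenium's By.CSS_SELECTOR


def find_frame_containing(driver, selector, timeout=30.0, poll=0.5):
    """Return the index of the first top-level iframe that contains `selector`.

    On success the driver is left switched into that iframe. If nothing
    matches before the timeout, return None with the driver back on the
    top-level document.
    """
    deadline = time.monotonic() + timeout
    while True:
        driver.switch_to.default_content()
        # Re-query the iframes on every pass, since they may be added late.
        frames = driver.find_elements(CSS, "iframe")
        for index, frame in enumerate(frames):
            driver.switch_to.frame(frame)
            if driver.find_elements(CSS, selector):
                return index
            driver.switch_to.default_content()
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll)
```

With the driver from the question, `find_frame_containing(driver, report_page_selector)` returns the iframe's index and leaves the driver inside it, so a following `driver.find_element` call runs in the right context. A return value of None suggests the selector is wrong or the report needs longer to render.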