Selenium driver.get() problem

Question (votes: 1, answers: 1)

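I'm trying to scrape a website and I'm using Selenium to help, but I've run into a problem. I need to check 150 pages, which have the form "base_url&page=X". However, when I call driver.get("base_url&page=x"), for some reason it strips off the &page=x.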

When I print the link, it correctly shows as "base_url&page=X", but when I click it, it opens base_url. If I copy and paste the link, however, it takes me to the correct page, "base_url&page=X".

Any idea what the problem is or how to fix it?

for i in range(1, 5):
    page_url = BASE_URL + "&page=" + str(i)
    parsed_site = get_page(page_url)

def get_page(url):
    DRIVER = webdriver.Chrome(chrome_options=chrome_options)
    DRIVER.get(url)
    time.sleep(2)
    data = DRIVER.page_source
    DRIVER.close()
    return BeautifulSoup(data, "html.parser")

Stack trace for the timeout, following up on the answer:

 Traceback (most recent call last):
    File "/Users/x/PycharmProjects/proj/src/scraper3.py", line 335, in <module>
       sys.exit(main())
    File "/Users/x/PycharmProjects/proj/src/scraper3.py", line 309, in main
       parsed_site = get_next_page(DRIVER, page_url) 
    File "/Users/x/PycharmProjects/proj/src/scraper3.py", line 267, in get_next_page
       DRIVER.get(url)
    File "/Users/x/PycharmProjects/proj/venv/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 324, in get
       self.execute(Command.GET, {'url': url})
    File "/Users/x/PycharmProjects/proj/venv/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 312, in execute
       self.error_handler.check_response(response)
    File "/Users/x/PycharmProjects/proj/venv/lib/python2.7/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
       raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message: timeout
   (Session info: chrome=64.0.3282.167)
   (Driver info: chromedriver=2.35.528157 (4429ca2590d6988c0745c24c8858745aaaec01ef),platform=Mac OS X 10.13.3 x86_64)
python python-2.7 selenium
1 Answer
0 votes

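I believe the problem you're hitting is that the domain requires you to visit it and store some cached data before you start paging through it, but you open a new driver every time you move to the next page. Try this: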

def get_page(DRIVER, url):
    DRIVER.get(url)
    time.sleep(2)
    data = DRIVER.page_source
    return BeautifulSoup(data, "html.parser")

DRIVER = webdriver.Chrome(chrome_options=chrome_options)
DRIVER.get(BASE_URL)
parsedList = []
for i in range(1, 5):
    page_url = BASE_URL + "&page=" + str(i)
    parsed_site = get_page(DRIVER, page_url)
    parsedList.append(parsed_site)
for source in parsedList:
    print(source)
DRIVER.quit()
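
To confirm whether the browser really landed on the requested page, one option is to compare the page parameter of the requested URL with the one in the URL the driver ended up on. The sketch below is only an illustration: the helper names `page_param` and `kept_page` and the example.com URLs are made up for this example, and it is written for Python 3 (on Python 2.7 the same functions live in the `urlparse` module).

```python
from urllib.parse import urlparse, parse_qs


def page_param(url, name="page"):
    """Return the value of query parameter `name` in `url`, or None if absent."""
    values = parse_qs(urlparse(url).query).get(name)
    return values[0] if values else None


def kept_page(requested_url, landed_url, name="page"):
    """True only if the requested URL carries `name` and the landed URL has the same value."""
    wanted = page_param(requested_url, name)
    return wanted is not None and wanted == page_param(landed_url, name)


# Hypothetical URLs, used only to show the behaviour.
requested = "https://example.com/list?cat=1&page=3"
print(kept_page(requested, requested))                         # parameter kept
print(kept_page(requested, "https://example.com/list?cat=1"))  # parameter stripped
```

Inside the loop above, this could be called as `kept_page(page_url, DRIVER.current_url)` right after `DRIVER.get(page_url)`. Note that `page_param` returns None when the URL contains no `?` at all, which would be the case if BASE_URL has no query string of its own before `&page=` is appended.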

Edit:

After the original problem, you started running into an issue with the current chromedriver 2.35 and Chrome build 64. The answer to that error is here. Glad I could help.
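
Separately from the driver and browser version issue, a page load that occasionally times out can be retried a few times before giving up. This is a minimal sketch, not part of the original answer: `get_with_retry` is a hypothetical helper, and the `attempts` and `delay` defaults are arbitrary. It takes the loading function and the exception type as arguments, so it does not depend on Selenium itself.

```python
import time


def get_with_retry(load, url, retry_on, attempts=3, delay=2.0):
    """Call load(url); if it raises an exception in `retry_on`, wait and try again.

    Re-raises the last such exception once `attempts` calls have failed.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return load(url)
        except retry_on as error:
            last_error = error
            if attempt < attempts:
                time.sleep(delay)  # pause before the next attempt
    raise last_error
```

With Selenium this might be used as `get_with_retry(DRIVER.get, page_url, retry_on=TimeoutException)`, where `TimeoutException` is imported from `selenium.common.exceptions`, the exception class shown in the traceback above. Retrying only hides intermittent timeouts; it will not help if the driver and browser versions are incompatible.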
