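I want to scrape the website https://www.rome2rio.com. Below is the code I came up with. Unfortunately, I get a CAPTCHA on 99% of my attempts. Can anyone give me a hint about what I could add to the code, or how to change it, to improve this and avoid being detected?
Thanks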
import undetected_chromedriver as uc
import time
import random
# Initialize undetected ChromeOptions
chrome_options = uc.ChromeOptions()
# Essential options to avoid detection
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument("--disable-dev-shm-usage")
chrome_options.add_argument("--incognito")
chrome_options.add_argument("--disable-blink-features=AutomationControlled")
chrome_options.add_argument("--start-maximized")  # To start maximized
# Setting excludeSwitches / useAutomationExtension within the undetected_chromedriver context
chrome_options.add_experimental_option("excludeSwitches", ["enable-automation"])
chrome_options.add_experimental_option("useAutomationExtension", False)
# Rotating User-Agent
user_agents = [
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36",
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36",
# Add more as needed
]
random_user_agent = random.choice(user_agents)
chrome_options.add_argument(f"user-agent={random_user_agent}")
# Adjusting viewport size to non-standard dimensions if needed
# chrome_options.add_argument("--window-size=1366,768") # Use only if you don't want to start maximized
# Use undetected_chromedriver to avoid detection
driver = uc.Chrome(options=chrome_options)
# Open the specified website
driver.get("https://www.rome2rio.com/map/Marseille/Paris")
# Mimicking human behavior with random sleep
time.sleep(random.uniform(2, 4))
# Proceed with your script...
# Close the driver after operations are complete
driver.quit()
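I believe that solving the CAPTCHAs through the API of 2Captcha or another CAPTCHA-solving service would be a more reliable solution than trying to evade detection. These services are not free, but for most applications the price is not a problem: roughly $1-2 per 1000 requests, depending on the CAPTCHA type.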
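A minimal sketch of that approach, assuming the CAPTCHA being shown is Google reCAPTCHA v2 (this is not confirmed for rome2rio) and using 2Captcha's classic in.php / res.php HTTP endpoints: submit a task with the page's site key, then poll until a token is ready. The function name solve_recaptcha, its parameters, and the polling interval are hypothetical choices for illustration, not part of any official 2Captcha client. The HTTP helper is passed in as a parameter so the logic can be exercised without network access.

```python
import json
import time
import urllib.parse
import urllib.request

# 2Captcha's classic endpoints (assumed here; check their current API docs)
IN_URL = "https://2captcha.com/in.php"
RES_URL = "https://2captcha.com/res.php"


def fetch_json(url, data=None):
    """GET the URL (or POST it if data is given) and decode the JSON body."""
    body = urllib.parse.urlencode(data).encode() if data is not None else None
    with urllib.request.urlopen(url, data=body, timeout=30) as resp:
        return json.loads(resp.read().decode())


def solve_recaptcha(api_key, site_key, page_url, fetch=fetch_json,
                    poll_every=5, max_polls=24, sleep=time.sleep):
    """Hypothetical helper: submit a reCAPTCHA v2 task and poll for the token."""
    # Step 1: submit the task; on success "request" holds the task id
    submit = fetch(IN_URL, {
        "key": api_key,
        "method": "userrecaptcha",
        "googlekey": site_key,
        "pageurl": page_url,
        "json": 1,
    })
    if submit.get("status") != 1:
        raise RuntimeError(f"2Captcha rejected the task: {submit.get('request')}")
    task_id = submit["request"]

    # Step 2: poll res.php until the token is ready or we give up
    query = urllib.parse.urlencode(
        {"key": api_key, "action": "get", "id": task_id, "json": 1})
    for _ in range(max_polls):
        sleep(poll_every)
        result = fetch(f"{RES_URL}?{query}")
        if result.get("status") == 1:
            return result["request"]  # the reCAPTCHA response token
        if result.get("request") != "CAPCHA_NOT_READY":  # 2Captcha's own spelling
            raise RuntimeError(f"2Captcha error: {result.get('request')}")
    raise TimeoutError("CAPTCHA was not solved in time")
```

The returned token would then typically be written into the page's g-recaptcha-response field (for example with driver.execute_script) before the form or callback is triggered; how rome2rio actually consumes the token is an assumption you would need to verify by inspecting the page.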
You can use the UC Mode of https://github.com/seleniumbase/SeleniumBase to avoid the CAPTCHAs.
After installing it with:
pip install seleniumbase
you can run the following script with python:
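from seleniumbase import Driver

driver = Driver(uc=True)  # Start Chrome in UC Mode
# Open the URL while the driver is disconnected from Chrome, reconnecting after 3 seconds
driver.uc_open_with_reconnect("https://www.rome2rio.com/map/Marseille/Paris", 3)
driver.type('input[aria-label="From"]', "Geneva, Switzerland")
driver.type('input[aria-label="To"]', "Vienna, Austria")
# :contains() is a SeleniumBase selector extension, not standard CSS
driver.click('button span:contains("Search")')
breakpoint()  # Pause here to inspect the results
driver.quit()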
The script pauses at breakpoint(). Type c in the console and press Enter to continue from the breakpoint.
More documentation on UC Mode: SeleniumBase/help_docs/uc_mode.md
The SeleniumBase driver includes all of the original driver methods, plus new ones.