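The problem is that when I use this setup with Selenium Grid, the reCAPTCHA v2 is never solved.
Has anyone run into this issue or found a way to use browser extensions in Selenium Grid? Any help would be greatly appreciated. Thanks in advance!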
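Based on your description, your local Selenium setup works well with the CapSolver extension but fails when it runs against Selenium Grid through webdriver.Remote. This is a common problem: loading browser extensions remotely can be tricky, and it may not even be supported in some Grid configurations.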
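If you still want to try the extension route on the Grid, one thing worth checking is how the extension reaches the node. A `--load-extension=/some/path` argument refers to a directory on the machine that runs the browser, which on a Grid is the node rather than your client. A packed `.crx` file, by contrast, can be base64-encoded and sent inside the session capabilities. The sketch below assumes a packed extension file and a Grid at `http://localhost:4444`; `encode_extension` and `remote_driver_with_extension` are illustrative names, not part of Selenium. Whether the extension then works on the node (for example in headless mode) still depends on the browser version and Grid configuration.

```python
import base64
from pathlib import Path


def encode_extension(crx_path):
    """Return a packed .crx extension as a base64 string."""
    return base64.b64encode(Path(crx_path).read_bytes()).decode("ascii")


def remote_driver_with_extension(crx_path, grid_url="http://localhost:4444"):
    """Start a remote Chrome session with the packed extension attached (a sketch)."""
    # Imported here so the encoding helper can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    # The encoded bytes travel with the new-session request, so the node
    # does not need the file on its own disk.
    options.add_encoded_extension(encode_extension(crx_path))
    return webdriver.Remote(command_executor=grid_url, options=options)
```

If the session starts but the captcha is still never solved, the extension may be loading without its configuration (such as an API key), which is one more reason the API approach tends to be easier to debug.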
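I haven't used CapSolver myself, but I have successfully used 2Captcha to get past reCAPTCHA v2. Instead of relying on a browser extension, I integrated 2Captcha through its API, which is usually more reliable in distributed environments such as Selenium Grid. Here is how you can implement the solution with 2Captcha: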
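Submit the captcha request: Send a POST request to 2Captcha's API (http://2captcha.com/in.php) with your API key, the reCAPTCHA site key, and the URL of the page. This returns the ID of your captcha task.
Poll for the solution: After submitting, check periodically (for example, every 5 seconds) whether the captcha has been solved. You do this by querying the API (http://2captcha.com/res.php) with the task ID.
Inject the captcha token: Once you receive the token from 2Captcha, use JavaScript to insert it into the hidden g-recaptcha-response field on the page. Sometimes you also need to trigger another event (such as clicking the submit button) so that the page recognizes the solution.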
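The "trigger another event" part of the injection step can be sketched like this. It is a minimal sketch rather than a guaranteed recipe: `inject_token` is a hypothetical helper, it assumes the widget was rendered from an element with the `g-recaptcha` class, and it only calls a callback when that element names a global function in its `data-callback` attribute. Pages that render the widget programmatically may need a different hook.

```python
INJECT_JS = """
const token = arguments[0];
// A page can hold several response fields (one per widget), so fill them all.
const fields = document.querySelectorAll('textarea[name="g-recaptcha-response"]');
fields.forEach(function (field) {
    field.value = token;
    field.dispatchEvent(new Event('input', { bubbles: true }));
    field.dispatchEvent(new Event('change', { bubbles: true }));
});
// If the widget declares a data-callback, call that global function with the token.
const widget = document.querySelector('.g-recaptcha[data-callback]');
if (widget) {
    const callback = window[widget.getAttribute('data-callback')];
    if (typeof callback === 'function') {
        callback(token);
    }
}
return fields.length;
"""


def inject_token(driver, token):
    """Fill the response fields and return how many were found on the page."""
    filled = driver.execute_script(INJECT_JS, token)
    if not filled:
        raise RuntimeError("No g-recaptcha-response field found on the page")
    return filled
```

Raising when no field is found makes a silent failure visible, for instance when the captcha lives inside an iframe the driver has not switched to.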
Below is a Python example that demonstrates this approach:
import time
import requests
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By  # used by the optional click shown in bypass_recaptcha


def solve_recaptcha(site_key, page_url, api_key):
    session = requests.Session()
    # Send the captcha solving request to 2Captcha
    response = session.post("http://2captcha.com/in.php", data={
        'key': api_key,
        'method': 'userrecaptcha',
        'googlekey': site_key,
        'pageurl': page_url,
        'json': 1
    }).json()
    if response.get('status') != 1:
        # str() guards against a missing 'request' field in the error response
        raise Exception("Error submitting captcha: " + str(response.get('request')))
    captcha_id = response.get('request')
    # Poll for the solution every 5 seconds (20 attempts, about 100 seconds in total)
    for _ in range(20):
        time.sleep(5)
        result = session.get("http://2captcha.com/res.php", params={
            'key': api_key,
            'action': 'get',
            'id': captcha_id,
            'json': 1
        }).json()
        if result.get('status') == 1:
            return result.get('request')
    raise Exception("Failed to retrieve captcha solution within the expected time")


def get_driver():
    chrome_options = Options()
    chrome_options.add_argument("--headless=new")
    chrome_options.add_argument("--disable-gpu")
    chrome_options.add_argument("--no-sandbox")
    chrome_options.add_argument("--disable-dev-shm-usage")
    selenium_url = "http://localhost:4444"  # Selenium Grid Hub URL
    driver = webdriver.Remote(
        command_executor=selenium_url,
        options=chrome_options
    )
    return driver


def bypass_recaptcha(driver, site_key, page_url, api_key):
    # Navigate to the page with the reCAPTCHA
    driver.get(page_url)
    # Retrieve the captcha solution using 2Captcha
    captcha_solution = solve_recaptcha(site_key, page_url, api_key)
    # Inject the solution into the hidden 'g-recaptcha-response' field
    js_script = """
    document.getElementById('g-recaptcha-response').style.display = 'block';
    document.getElementById('g-recaptcha-response').value = arguments[0];
    """
    driver.execute_script(js_script, captcha_solution)
    # If needed, trigger an event (like clicking a button) to notify the page of the updated value.
    # For example:
    # driver.find_element(By.ID, 'submit-button').click()


if __name__ == "__main__":
    API_KEY = "YOUR_2CAPTCHA_API_KEY"
    SITE_KEY = "RECAPTCHA_SITE_KEY"  # Replace with the actual site key from your target page
    PAGE_URL = "https://example.com/page_with_recaptcha"  # Replace with the URL of the page containing the captcha
    driver = get_driver()
    try:
        bypass_recaptcha(driver, SITE_KEY, PAGE_URL, API_KEY)
        # Continue with further actions after captcha bypass...
    finally:
        driver.quit()
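
One possible refinement to the polling loop: as far as I know, while a task is still being worked on, res.php answers with `status` 0 and the text `CAPCHA_NOT_READY` (spelled that way), whereas other `status` 0 answers such as `ERROR_CAPTCHA_UNSOLVABLE` are real errors that more polling will not fix. The sketch below separates the two cases. `interpret_poll_result` is a hypothetical helper name, and the exact error codes should be checked against 2Captcha's documentation.

```python
NOT_READY = "CAPCHA_NOT_READY"  # 2Captcha's own spelling of the "still pending" answer


def interpret_poll_result(result):
    """Return the token if solved, None if still pending, and raise on an API error.

    `result` is the decoded JSON from res.php, e.g. {"status": 1, "request": "..."}.
    """
    status = result.get("status")
    payload = result.get("request")
    if status == 1:
        return payload
    if payload == NOT_READY:
        return None
    raise RuntimeError("2Captcha reported an error: " + str(payload))
```

Inside the loop in solve_recaptcha, calling `token = interpret_poll_result(result)` and returning when `token is not None` would stop early on hard errors instead of waiting out all 20 attempts.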