Solving reCAPTCHA V2 with a callback function using the 2captcha service and Python Selenium/Scrapy

Question · Votes: 0 · Answers: 4

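I have a website that I want to crawl. To access the search results, you first have to solve a reCAPTCHA V2 that uses a callback function (see the screenshot below).

[Screenshot: reCAPTCHA V2 with a callback function]

I am using a dedicated captcha solver called 2captcha. The service gives me a token, which I then pass to the callback function to bypass the captcha. I found the callback function using the code in this GitHub Gist, and I am able to invoke it successfully in the Chrome DevTools console.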

The function can be invoked by typing either of these two commands:

window[___grecaptcha_cfg.clients[0].o.o.callback]('captcha_token')

verifyAkReCaptcha('captcha_token')

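However, when I invoke these functions with the driver.execute_script() method in Python Selenium, I get an error. I also tried executing other standard JavaScript functions with this method (for example, scrolling down the page), but I keep getting errors. This may be because the domain I am trying to scrape blocks me from executing any JavaScript with automation tools.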

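So my question is: how do I invoke the callback function after obtaining the token from the 2captcha service? I would appreciate any help I can get. Thanks in advance to the hero(ine) who knows how to solve this difficult captcha. Cheers!!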

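Some additional information that may help with my question:

  1. Automation framework used --> Python Selenium or Scrapy. Either works for me.

  2. Error messages --> Error message 1, Error message 2

  3. Code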

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from twocaptcha import TwoCaptcha
from dotenv import load_dotenv
import os

# Load environment variables
load_dotenv()

# Instantiate a solver object
solver = TwoCaptcha(os.getenv("CAPTCHA_API_KEY"))
sitekey = "6Lfwdy4UAAAAAGDE3YfNHIT98j8R1BW1yIn7j8Ka"
url = "https://suchen.mobile.de/fahrzeuge/search.html?dam=0&isSearchRequest=true&ms=8600%3B51%3B%3B&ref=quickSearch&sb=rel&vc=Car"

# Set chrome options
chrome_options = Options()
chrome_options.add_argument('start-maximized') # Required for a maximized Viewport
chrome_options.add_experimental_option('excludeSwitches', ['enable-logging', 'enable-automation'])
chrome_options.add_experimental_option("detach", True)
chrome_options.add_experimental_option('prefs', {'intl.accept_languages': 'en,en_US'})

# Instantiate a browser object and navigate to the URL
driver = webdriver.Chrome(options=chrome_options)  # `options=` replaces the deprecated `chrome_options=` keyword

driver.get(url)

driver.maximize_window()

def solve(sitekey, url):
    try:
        result = solver.recaptcha(sitekey=sitekey, url=url)
    except Exception as e:
        exit(e)

    return result.get('code')

captcha_key = solve(sitekey=sitekey, url=url)
print(captcha_key)

# driver.execute_script(f"window[___grecaptcha_cfg.clients[0].o.o.callback]('{captcha_key}')") # This step fails in Python but runs successfully in the console
# driver.execute_script(f"verifyAkReCaptcha('{captcha_key}')") # This step fails in Python but runs successfully in the console
python selenium scrapy recaptcha 2captcha
4 Answers
1 vote

Here is the code that works for me. Make sure to instantiate the Chrome web driver with the right options for your use case.

# Python imports
from twocaptcha import TwoCaptcha
from selenium import webdriver
from selenium.webdriver.common.by import By  # needed for the By.ID locator used below
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
from dotenv import load_dotenv
import os
import time

# Load the environment variables
load_dotenv()

solver = TwoCaptcha(os.getenv("CAPTCHA_API_KEY"))
sitekey = "6Lfwdy4UAAAAAGDE3YfNHIT98j8R1BW1yIn7j8Ka"
base_url = "https://suchen.mobile.de/fahrzeuge/search.html"

# Define a function to solve the Captcha
def solve_captcha(sitekey, url):
    try:
        result = solver.recaptcha(sitekey=sitekey, url=url)
        captcha_key = result.get('code')
        print(f"Captcha solved. The key is: {captcha_key}\n")
    except Exception as err:
        print(err)
        print(f"Captcha not solved...")
        captcha_key = None

    return captcha_key

# Define a function to invoke the callback function
def invoke_callback_func(driver, captcha_key):
    try: # Sometimes the captcha is solved without having to invoke the callback function. This piece of code handles this situation
        # html of the captcha is inside an iframe, selenium cannot see it if we first don't switch to the iframe
        WebDriverWait(driver, 15).until(EC.frame_to_be_available_and_switch_to_it((By.ID, "sec-cpt-if")))

        # Inject the token into the inner HTML of g-recaptcha-response and invoke the callback function
        driver.execute_script(f'document.getElementById("g-recaptcha-response").innerHTML="{captcha_key}"')
        driver.execute_script(f"verifyAkReCaptcha('{captcha_key}')") # Works from Python once the driver has switched into the captcha iframe
    except TimeoutException:
        print("Captcha was solved without needing to invoke the callback function. Bypassing this part of the script to prevent raising an error")

    # Wait for 0.5 seconds until the page is loaded
    time.sleep(0.5)

# Instantiate the Chrome web driver and open the page that shows the captcha
driver = webdriver.Chrome()
driver.get(base_url)

# Solve the captcha
captcha_token = solve_captcha(sitekey=sitekey, url=base_url)
# Invoke the callback function
invoke_callback_func(driver=driver, captcha_key=captcha_token)
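
Interpolating the token into the JavaScript source with an f-string works as long as the token contains no quotes or backslashes. A slightly more defensive variant, sketched below, passes the token as a script argument instead; Selenium exposes extra `execute_script` arguments to the script as `arguments[0]`, `arguments[1]`, and so on. The helper name `submit_token` and the constant `INJECT_AND_CALLBACK_JS` are made up for this illustration, `verifyAkReCaptcha` is the site-specific callback from the question, and the driver is assumed to have already switched into the captcha iframe.

```python
# JavaScript run inside the captcha iframe. The token arrives as arguments[0],
# so it never has to be spliced into the script text.
INJECT_AND_CALLBACK_JS = """
var field = document.getElementById('g-recaptcha-response');
if (field) { field.innerHTML = arguments[0]; }
verifyAkReCaptcha(arguments[0]);
"""


def submit_token(driver, token):
    """Inject the token and call the page's callback (hypothetical helper).

    `driver` is any object offering Selenium's execute_script(script, *args).
    Returns False without touching the page when no token is available,
    for example when solve_captcha() returned None.
    """
    if not token:
        return False
    driver.execute_script(INJECT_AND_CALLBACK_JS, token)
    return True
```

In the script above, the two `driver.execute_script(...)` lines inside `invoke_callback_func` could then be replaced by a single `submit_token(driver, captcha_key)` call.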

0 votes

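To get past the captcha we can use pyautogui. To install the package, run pip install pyautogui. With it we can interact with whatever is displayed on the screen. This means the browser window must be visible while the Python script runs. Compared with other approaches this is a big drawback, but on the other hand it is very reliable.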

In our case we need to click the captcha checkbox (shown in an image in the original post) to solve the captcha, so we tell pyautogui to find this box on the screen and then click it.

So save the image on your computer and name it box.png. Then run this code (replace ... with the code you are missing).

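import pyautogui
from selenium.webdriver.common.by import By  # needed for the By.ID / By.CSS_SELECTOR locators below
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC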

...

driver.get(url)
driver.maximize_window()

# html of the captcha is inside an iframe, selenium cannot see it if we first don't switch to the iframe
WebDriverWait(driver, 9).until(EC.frame_to_be_available_and_switch_to_it((By.ID, "sec-cpt-if")))

# wait until the captcha is visible on the screen
WebDriverWait(driver, 9).until(EC.visibility_of_element_located((By.CSS_SELECTOR, '#g-recaptcha')))

# find captcha on page
checkbox = pyautogui.locateOnScreen('box.png')
if checkbox:
    # compute the coordinates (x,y) of the center
    center_coords = pyautogui.center(checkbox)
    pyautogui.click(center_coords)
else:
    print('Captcha not found on screen')
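
Depending on the installed version, `pyautogui.locateOnScreen` may signal a miss either by returning `None` or by raising `pyautogui.ImageNotFoundException`, so the `if checkbox:` test alone may not cover both cases. The sketch below wraps the lookup so that both outcomes look the same to the caller. `find_box_center` is a hypothetical helper, and its `locate` parameter stands in for `pyautogui.locateOnScreen` so the logic can be tried without a display.

```python
def find_box_center(locate, image_path):
    """Return the (x, y) centre of the matched region, or None on a miss.

    `locate` is a callable such as pyautogui.locateOnScreen that returns a
    (left, top, width, height) box for the image at `image_path`.
    """
    try:
        box = locate(image_path)
    except Exception:
        # Caught broadly only to keep this sketch free of a pyautogui import;
        # real code would catch pyautogui.ImageNotFoundException instead.
        return None
    if box is None:
        return None
    left, top, width, height = box
    return (left + width // 2, top + height // 2)


# Possible use in the script above (assumes a visible browser window):
# coords = find_box_center(pyautogui.locateOnScreen, 'box.png')
# if coords:
#     pyautogui.click(*coords)
# else:
#     print('Captcha not found on screen')
```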

0 votes

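Building on @sound wave's answer, I was able to invoke the callback function and bypass the captcha without pyautogui. The key is to switch to the captcha frame using the frame_to_be_available_and_switch_to_it method. Thanks to @sound wave for the great tip.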

Here is the full code for anyone interested. Keep in mind that you need a 2captcha API key for it to work.

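I am still trying to figure out how to run this script in headless mode, because the WebDriverWait object seems to need Selenium to be in non-headless mode in order to switch to the captcha frame. If anyone knows how to switch to the captcha frame while using Selenium in headless mode, please share your knowledge :)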
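
One thing that may be worth trying for the headless case, offered as an untested assumption and not a confirmed fix: Chrome's newer headless mode (`--headless=new`) combined with an explicit window size, since `start-maximized` has no visible window to act on when running headless. The helper `headless_arguments` is hypothetical and only builds the argument list. A site with bot protection may still treat a headless browser differently, so this alone may not be enough.

```python
def headless_arguments(width=1920, height=1080):
    """Build Chrome command-line arguments for a headless run (illustrative)."""
    return [
        "--headless=new",                     # newer headless mode in recent Chrome versions
        f"--window-size={width},{height}",    # explicit viewport, as there is no window to maximize
    ]


# Possible use with the chrome_options object from the script below:
# for arg in headless_arguments():
#     chrome_options.add_argument(arg)
```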

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from twocaptcha import TwoCaptcha
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from dotenv import load_dotenv
import os
import time

# Load environment variables
load_dotenv()

# Instantiate a solver object
solver = TwoCaptcha(os.getenv("CAPTCHA_API_KEY"))
sitekey = "6Lfwdy4UAAAAAGDE3YfNHIT98j8R1BW1yIn7j8Ka"
url = "https://suchen.mobile.de/fahrzeuge/search.html?dam=0&isSearchRequest=true&ms=8600%3B51%3B%3B&ref=quickSearch&sb=rel&vc=Car"

# Set chrome options
chrome_options = Options()
chrome_options.add_argument('start-maximized') # Required for a maximized Viewport
chrome_options.add_experimental_option('excludeSwitches', ['enable-logging', 'enable-automation'])
chrome_options.add_experimental_option("detach", True)
chrome_options.add_experimental_option('prefs', {'intl.accept_languages': 'en,en_US'})

# Instantiate a browser object and navigate to the URL
driver = webdriver.Chrome(options=chrome_options)  # `options=` replaces the deprecated `chrome_options=` keyword

driver.get(url)

driver.maximize_window()

# Solve the captcha using the 2captcha service
def solve(sitekey, url):
    try:
        result = solver.recaptcha(sitekey=sitekey, url=url)
    except Exception as e:
        exit(e)

    return result.get('code')

captcha_key = solve(sitekey=sitekey, url=url)
print(captcha_key)

# html of the captcha is inside an iframe, selenium cannot see it if we first don't switch to the iframe
WebDriverWait(driver, 9).until(EC.frame_to_be_available_and_switch_to_it((By.ID, "sec-cpt-if")))

# Inject the token into the inner HTML of g-recaptcha-response and invoke the callback function
driver.execute_script(f'document.getElementById("g-recaptcha-response").innerHTML="{captcha_key}"')
driver.execute_script(f"verifyAkReCaptcha('{captcha_key}')") # Works from Python once the driver has switched into the captcha iframe

# Wait for 3 seconds until the "Accept Cookies" window appears. Can also do that with WebDriverWait.until(EC)
time.sleep(3)

# Click on "Einverstanden"
driver.find_element(by=By.XPATH, value="//button[@class='sc-bczRLJ iBneUr mde-consent-accept-btn']").click()

# Wait for 0.5 seconds until the page is loaded
time.sleep(0.5)

# Print the top title of the page
print(driver.find_element(by=By.XPATH, value="//h1[@data-testid='result-list-headline']").text)
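
The fixed `time.sleep` calls above can be too short on a slow connection and needlessly long on a fast one. As the comment in the script notes, `WebDriverWait` with an expected condition can do the same job. The dependency-free sketch below only illustrates the polling idea behind such a wait; `wait_until` is a hypothetical helper, not part of Selenium.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call `condition` repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds
    have passed without success.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)
```

With Selenium itself, the wait for the consent button could look roughly like `WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button.mde-consent-accept-btn"))).click()`. Matching on the single `mde-consent-accept-btn` class is an assumption; the script above matches the full class string.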

0 votes

Improve online interactions with innovative captcha solutions that are not only secure but also user-friendly. 🤖✨ https://noCaptchaAi.com https://dash.nocaptchaai.com/invite/r-dil-fssgv

#noCaptchaAi #CaptchaSolver #Captcha Ai #bypassCaptcha
