Cannot access the Preview tab under the Fetch/XHR tab of the Network tab


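I have been trying to write code that prints the contents of the Preview tab under Fetch/XHR in the browser's Network tab.

Specifically, I want to print that value to the console after clicking the audio button below the captcha.

The website is https://tmrsearch.ipindia.gov.in/eregister/

On this site, first click the first button on the left, "Trade Mark Application/Registered Mark".

Then select the National/IRDI Number checkbox to reach the required page.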

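With the developer tools open (Inspect Element), when you click the audio button below the captcha, the captcha value appears in the Preview tab under Fetch/XHR in the Network tab.

I have written Python code that navigates to the required page.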

import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver=webdriver.Chrome()
driver.maximize_window()
driver.get("https://tmrsearch.ipindia.gov.in/eregister/")
wait = WebDriverWait(driver, 10)

# switch into the frame context
wait.until(EC.frame_to_be_available_and_switch_to_it((By.NAME, "eregoptions")))

# click on the targeted element
wait.until(EC.element_to_be_clickable((By.ID, "btnviewdetails"))).click()

# come out of frame
driver.switch_to.default_content()
time.sleep(10)
wait.until(EC.frame_to_be_available_and_switch_to_it((By.NAME, "showframe")))

# click on the targeted element
wait.until(EC.element_to_be_clickable((By.ID, "rdb_0"))).click()

# come out of frame
driver.switch_to.default_content()
time.sleep(10)

I want to add a few lines of code that print the captcha value shown in the Preview tab under Fetch/XHR in the Network tab.

selenium-webdriver web-scraping webdriver backend
1 Answer

There are two ways to get the captcha value.

  1. Using selenium-wire
  2. Using the requests library

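selenium-wire:

  1. With seleniumwire, we can track the backend API calls and their responses.
  2. Click the audio button to reveal the captcha.
  3. Iterate over all captured requests and extract the captcha value.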
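
The three steps above can be tried without a browser. The following is a minimal sketch, assuming each captured request exposes `url` and `response.body` the way selenium-wire's request objects do, and that the endpoint returns JSON with the captcha under the key `d`, as the code in this answer reads it. The helper name `find_captcha`, the example URLs, and the sample payload are hypothetical.

```python
import json
from types import SimpleNamespace


def find_captcha(requests, marker="GetCaptcha"):
    """Return the captcha from the first captured request whose URL contains marker.

    `requests` is any iterable of objects with `.url` and `.response.body`
    (bytes), mirroring selenium-wire's captured requests. Returns None if
    nothing matches.
    """
    for request in requests:
        if marker not in (request.url or ""):
            continue
        response = request.response
        if response is None:  # request was sent but no response was captured
            continue
        payload = json.loads(response.body.decode("utf-8"))
        return payload.get("d")
    return None


# Stand-in objects with a made-up payload, only to exercise the helper.
captured = [
    SimpleNamespace(url="https://example.com/style.css", response=None),
    SimpleNamespace(
        url="https://example.com/page.aspx/GetCaptcha",
        response=SimpleNamespace(body=b'{"d": "12345"}'),
    ),
]
print(find_captcha(captured))
```

With a real selenium-wire driver, the call would presumably be `find_captcha(driver.requests)` after the audio button has been clicked.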

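requests library:

  1. Load the page with Selenium as usual and get the SessionId from the main page.
  2. Make a POST request, passing the SessionId and the other required headers, to get the captcha value.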
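
Step 1 boils down to picking one cookie out of the list that Selenium returns and turning it into a `Cookie` header. The following is a minimal sketch, assuming the cookies have the list-of-dicts shape returned by `driver.get_cookies()`. The helper name `session_cookie_header` and the sample cookie values are hypothetical.

```python
def session_cookie_header(cookies, name="ASP.NET_SessionId"):
    """Build a Cookie header value from Selenium-style cookie dicts.

    `cookies` is a list of dicts with "name" and "value" keys, the shape
    returned by driver.get_cookies(). Returns None if the cookie is absent.
    """
    for cookie in cookies:
        if cookie.get("name") == name:
            return f"{name}={cookie['value']}"
    return None


# Made-up cookie values, for illustration only.
sample_cookies = [
    {"name": "theme", "value": "dark"},
    {"name": "ASP.NET_SessionId", "value": "abc123"},
]
headers = {"Content-Type": "application/json; charset=UTF-8"}
cookie_header = session_cookie_header(sample_cookies)
if cookie_header is not None:
    # Only send the Cookie header when a session cookie was actually found
    headers["Cookie"] = cookie_header
print(headers)
```

Returning None when the cookie is missing lets the caller skip the POST request instead of sending an empty session.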

I have included both approaches in the code. Take a look and use whichever one you are comfortable with.

import time
from seleniumwire import webdriver
#from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import json
import requests

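# This method will capture captcha value from the network calls
def get_captcha_details_using_selemiumwire(driver):
    # driver.requests holds every request selenium-wire has captured so far
    for request in driver.requests:
        url = request.url
        # request.response is None when no response was received
        if url and 'GetCaptcha' in url and request.response:
            json_str = request.response.body.decode("utf-8")
            data = json.loads(json_str)
            captcha = data['d']
            print(f"Captcha using seleniumwire: {captcha}")
            return captcha
    # No matching request with a response was captured
    return None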

# This method will make a request with session and get the captcha value
def get_captcha_details_using_requests(session_id):
    headers = {
        'Accept': 'application/json, text/javascript, */*; q=0.01',
        'Accept-Language': 'en-GB,en-US;q=0.9,en;q=0.8',
        'Connection': 'keep-alive',
        'Content-Type': 'application/json; charset=UTF-8',
        'Cookie': f"ASP.NET_SessionId={session_id}",
        'Origin': 'https://tmrsearch.ipindia.gov.in',
        'Referer': 'https://tmrsearch.ipindia.gov.in/eregister/Application_View.aspx',
        'Sec-Fetch-Dest': 'empty',
        'Sec-Fetch-Mode': 'cors',
        'Sec-Fetch-Site': 'same-origin',
        'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36',
        'X-Requested-With': 'XMLHttpRequest',
        'sec-ch-ua': '"Not/A)Brand";v="8", "Chromium";v="126"',
        'sec-ch-ua-mobile': '?0',
        'sec-ch-ua-platform': '"Linux"',
    }

    json_data = {}

    response = requests.post(
        'https://tmrsearch.ipindia.gov.in/eregister/Viewdetails_Copyright.aspx/GetCaptcha',
        headers=headers,
        json=json_data,
        verify=False,
    )

    data_hash = response.json()
    captcha = data_hash["d"]
    print(f"Captcha using requests library: {captcha}")
    return captcha
    

# Adding seleniumwire options and initializing the driver
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--disable-http2')
seleniumwire_options = {
        'mitm_http2': False
}
driver = webdriver.Chrome(options=chrome_options, seleniumwire_options=seleniumwire_options)

# Initializing selenium driver
#driver=webdriver.Chrome()

driver.maximize_window()
driver.get("https://tmrsearch.ipindia.gov.in/eregister/")
wait = WebDriverWait(driver, 10)

# Taking session_id from main page to use it in captcha request
session_id = None  # stays None if the cookie is not present
cookies_list = driver.get_cookies()
for cookie in cookies_list:
    name = cookie["name"]
    if (name and "ASP.NET_SessionId" in name):
        session_id = cookie["value"]
        break
print(f"Session ID: {session_id}")

# switch into the frame context
wait.until(EC.frame_to_be_available_and_switch_to_it((By.NAME, "eregoptions")))

# click on the targeted element
wait.until(EC.element_to_be_clickable((By.ID, "btnviewdetails"))).click()

# come out of frame
driver.switch_to.default_content()
time.sleep(10)
wait.until(EC.frame_to_be_available_and_switch_to_it((By.NAME, "showframe")))

# click on the targeted element
wait.until(EC.element_to_be_clickable((By.ID, "rdb_0"))).click()
time.sleep(10)

# click on audio button to reveal captcha in network calls
btn_xpath = "//img[contains(@title,'Captcha Audio')]"
# find_element raises if the button is missing, so wait for it explicitly instead
wait.until(EC.element_to_be_clickable((By.XPATH, btn_xpath))).click()
print("Clicked on audio button")
time.sleep(10)

# come out of frame
driver.switch_to.default_content()

captcha = get_captcha_details_using_selemiumwire(driver)

if session_id:
    captcha = get_captcha_details_using_requests(session_id)
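
The script above waits a fixed 10 seconds after each click. One possible alternative is to poll until the captcha request shows up. The following is a minimal sketch of that idea. The helper name `poll_until` and the stand-in probe are hypothetical, and the timeout and interval values are arbitrary.

```python
import time


def poll_until(probe, timeout=10.0, interval=0.5):
    """Call probe() repeatedly until it returns a non-None value or timeout elapses.

    Returns the probe's value, or None on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = probe()
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)


# Stand-in probe that succeeds on the third call, to exercise the helper.
attempts = {"count": 0}


def fake_probe():
    attempts["count"] += 1
    return "12345" if attempts["count"] >= 3 else None


print(poll_until(fake_probe, timeout=2.0, interval=0.01))
```

In the script above, the probe could be `lambda: get_captcha_details_using_selemiumwire(driver)`, since that function returns None when no matching request is found.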