POST request not working when scraping a website

Question description. Votes: 0, Answers: 1

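Step 1: Send a GET request to fetch the page and extract the hidden form values
We first send a GET request to the page and extract the required hidden form values, such as __VIEWSTATE, __VIEWSTATEGENERATOR and __EVENTVALIDATION, which the subsequent POST requests need.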

import requests
from bs4 import BeautifulSoup
import re

# Define the URL and headers for the initial GET request
url = "https://jamabandi.punjab.gov.in/default.aspx"
headers = {
    'Accept': '*/*',
    'Accept-Language': 'en-GB,en-US;q=0.9,en;q=0.8',
    'Cache-Control': 'no-cache',
    'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'Origin': 'https://jamabandi.punjab.gov.in',
    'Referer': 'https://jamabandi.punjab.gov.in/default.aspx',
    'Sec-Fetch-Dest': 'empty',
    'Sec-Fetch-Mode': 'cors',
    'Sec-Fetch-Site': 'same-origin',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36',
    'X-Requested-With': 'XMLHttpRequest'
}

# Send GET request to fetch the page
response = requests.get(url, headers=headers)

# Extract hidden form fields (VIEWSTATE, VIEWSTATEGENERATOR, EVENTVALIDATION)
viewstate = re.search(r'__VIEWSTATE" value="([^"]+)', response.text).group(1)
viewstategenerator = re.search(r'__VIEWSTATEGENERATOR" value="([^"]+)', response.text).group(1)
eventvalidation = re.search(r'__EVENTVALIDATION" value="([^"]+)', response.text).group(1)

# Print the extracted values to verify
print("VIEWSTATE:", viewstate)
print("VIEWSTATEGENERATOR:", viewstategenerator)
print("EVENTVALIDATION:", eventvalidation)
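
A side note on the extraction above: `re.search(...).group(1)` raises an `AttributeError` as soon as one field is missing from the response, which hides the real cause. Below is a minimal sketch of a more defensive extractor. It assumes each hidden input carries its `name` attribute before its `value` attribute; `extract_hidden_fields` and `HIDDEN_FIELDS` are hypothetical names chosen for illustration, not part of `requests` or of the site.

```python
import re

HIDDEN_FIELDS = ("__VIEWSTATE", "__VIEWSTATEGENERATOR", "__EVENTVALIDATION")


def extract_hidden_fields(html: str) -> dict:
    """Return the ASP.NET hidden field values found in a full HTML page."""
    values = {}
    for name in HIDDEN_FIELDS:
        # Tolerates extra attributes (such as id="...") between name and value.
        match = re.search(rf'name="{name}"[^>]*?value="([^"]*)"', html)
        if match is None:
            # Fail with a readable message instead of an AttributeError.
            raise ValueError(f"hidden field {name} not found in response")
        values[name] = match.group(1)
    return values
```

Calling `extract_hidden_fields(response.text)` would then either return all three values or say which one is missing.
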
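Step 2: Send a POST request to select the district
Now that we have the required hidden form values, we can select the district by sending a POST request.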

# Define the payload for the POST request to select a district
payload = {
    'ctl00$ScriptManager1': 'ctl00$ContentPlaceHolder1$updRegionSelector|ctl00$ContentPlaceHolder1$ddlDistrict',
    'ctl00$SelectRegion$rdPeriod': '1',
    'ctl00$SelectRegion$ddlDistrict': '10',  # Choose district here (example: 10)
    'ctl00$SelectRegion$ddlTehsil': '',
    'ctl00$SelectRegion$ddlVillage': '',
    'ctl00$SelectRegion$ddlYear': '',
    'ctl00$ddlLang': 'en-US',
    'ctl00$ContentPlaceHolder1$rdPeriod': '1',
    'ctl00$ContentPlaceHolder1$ddlDistrict': '10',  # Same district
    '__EVENTTARGET': 'ctl00$ContentPlaceHolder1$ddlDistrict',
    '__EVENTARGUMENT': '',
    '__LASTFOCUS': '',
    '__VIEWSTATE': viewstate,
    '__VIEWSTATEGENERATOR': viewstategenerator,
    '__EVENTVALIDATION': eventvalidation,
    '__ASYNCPOST': 'true'
}

# Send the POST request to select the district
post_url = "https://jamabandi.punjab.gov.in/default.aspx"
response = requests.post(post_url, headers=headers, data=payload)

# Print response to check if the request was successful
print(response.text)

Step 3: Select the tehsil
Next, we select the tehsil in the same way as the district. We need to extract the hidden form values again from the updated page (the one returned after the district selection) and then post the tehsil selection.

# Extract hidden form fields after the district selection (VIEWSTATE, VIEWSTATEGENERATOR, EVENTVALIDATION)
viewstate = re.search(r'__VIEWSTATE" value="([^"]+)', response.text).group(1)
viewstategenerator = re.search(r'__VIEWSTATEGENERATOR" value="([^"]+)', response.text).group(1)
eventvalidation = re.search(r'__EVENTVALIDATION" value="([^"]+)', response.text).group(1)

# Define the payload for the POST request to select the Tehsil
payload = {
    'ctl00$ScriptManager1': 'ctl00$ContentPlaceHolder1$updRegionSelector|ctl00$ContentPlaceHolder1$ddlTehsil',
    'ctl00$SelectRegion$rdPeriod': '1',
    'ctl00$SelectRegion$ddlDistrict': '10',
    'ctl00$SelectRegion$ddlTehsil': '63',  # Choose Tehsil here (example: 63)
    'ctl00$SelectRegion$ddlVillage': '',
    'ctl00$SelectRegion$ddlYear': '',
    'ctl00$ddlLang': 'en-US',
    'ctl00$ContentPlaceHolder1$rdPeriod': '1',
    'ctl00$ContentPlaceHolder1$ddlDistrict': '10',
    'ctl00$ContentPlaceHolder1$ddlTehsil': '63',
    '__EVENTTARGET': 'ctl00$ContentPlaceHolder1$ddlTehsil',
    '__EVENTARGUMENT': '',
    '__LASTFOCUS': '',
    '__VIEWSTATE': viewstate,
    '__VIEWSTATEGENERATOR': viewstategenerator,
    '__EVENTVALIDATION': eventvalidation,
    '__ASYNCPOST': 'true'
}

# Send the POST request to select the Tehsil
response = requests.post(post_url, headers=headers, data=payload)

# Print response to check if the request was successful
print(response.text)

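Step 4: Select the village
We follow the same approach as before, but now we need to select the village. After the tehsil has been selected, the hidden form values must be extracted again.
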
# Extract hidden form fields after the Tehsil selection (VIEWSTATE, VIEWSTATEGENERATOR, EVENTVALIDATION)
viewstate = re.search(r'__VIEWSTATE" value="([^"]+)', response.text).group(1)
viewstategenerator = re.search(r'__VIEWSTATEGENERATOR" value="([^"]+)', response.text).group(1)
eventvalidation = re.search(r'__EVENTVALIDATION" value="([^"]+)', response.text).group(1)

# Define the payload for the POST request to select the Village
payload = {
    'ctl00$ScriptManager1': 'ctl00$ContentPlaceHolder1$updRegionSelector|ctl00$ContentPlaceHolder1$ddlVillage',
    'ctl00$SelectRegion$rdPeriod': '1',
    'ctl00$SelectRegion$ddlDistrict': '10',
    'ctl00$SelectRegion$ddlTehsil': '63',
    'ctl00$SelectRegion$ddlVillage': '1E36A812-C218-DD11-8334-000E0CA49FC8',  # Choose Village here (example village ID)
    'ctl00$SelectRegion$ddlYear': '',
    'ctl00$ddlLang': 'en-US',
    'ctl00$ContentPlaceHolder1$rdPeriod': '1',
    'ctl00$ContentPlaceHolder1$ddlDistrict': '10',
    'ctl00$ContentPlaceHolder1$ddlTehsil': '63',
    'ctl00$ContentPlaceHolder1$ddlVillage': '1E36A812-C218-DD11-8334-000E0CA49FC8',
    '__EVENTTARGET': 'ctl00$ContentPlaceHolder1$ddlVillage',
    '__EVENTARGUMENT': '',
    '__LASTFOCUS': '',
    '__VIEWSTATE': viewstate,
    '__VIEWSTATEGENERATOR': viewstategenerator,
    '__EVENTVALIDATION': eventvalidation,
    '__ASYNCPOST': 'true'
}

# Send the POST request to select the Village
response = requests.post(post_url, headers=headers, data=payload)

# Print response to check if the request was successful
print(response.text)

Step 5: Select the year
Now we select the year in the same way.

# Extract hidden form fields after the Village selection (VIEWSTATE, VIEWSTATEGENERATOR, EVENTVALIDATION)
viewstate = re.search(r'__VIEWSTATE" value="([^"]+)', response.text).group(1)
viewstategenerator = re.search(r'__VIEWSTATEGENERATOR" value="([^"]+)', response.text).group(1)
eventvalidation = re.search(r'__EVENTVALIDATION" value="([^"]+)', response.text).group(1)

# Define the payload for the POST request to select the Year
payload = {
    'ctl00$ScriptManager1': 'ctl00$ContentPlaceHolder1$updRegionSelector|ctl00$ContentPlaceHolder1$ddlYear',
    'ctl00$SelectRegion$rdPeriod': '1',
    'ctl00$SelectRegion$ddlDistrict': '10',
    'ctl00$SelectRegion$ddlTehsil': '63',
    'ctl00$SelectRegion$ddlVillage': '1E36A812-C218-DD11-8334-000E0CA49FC8',
    'ctl00$SelectRegion$ddlYear': '3',  # Choose Year here (example: 3)
    'ctl00$ddlLang': 'en-US',
    'ctl00$ContentPlaceHolder1$rdPeriod': '1',
    'ctl00$ContentPlaceHolder1$ddlDistrict': '10',
    'ctl00$ContentPlaceHolder1$ddlTehsil': '63',
    'ctl00$ContentPlaceHolder1$ddlVillage': '1E36A812-C218-DD11-8334-000E0CA49FC8',
    'ctl00$ContentPlaceHolder1$ddlYear': '3',
    '__EVENTTARGET': 'ctl00$ContentPlaceHolder1$ddlYear',
    '__EVENTARGUMENT': '',
    '__LASTFOCUS': '',
    '__VIEWSTATE': viewstate,
    '__VIEWSTATEGENERATOR': viewstategenerator,
    '__EVENTVALIDATION': eventvalidation,
    '__ASYNCPOST': 'true'
}

# Send the POST request to select the Year
response = requests.post(post_url, headers=headers, data=payload)

# Print response to check if the request was successful
print(response.text)
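
Steps 2 to 5 send almost the same payload, with one more dropdown filled in each time. As a rough sketch (not part of the original code), the repetition could be folded into one helper. The field names are taken from the payloads above; the function name `build_payload` and its parameters are hypothetical.

```python
def build_payload(target, selections, hidden_fields):
    """Build the form data for one dropdown postback.

    target: short name of the control being changed, e.g. "ddlDistrict".
    selections: values chosen so far, keyed by short control name.
    hidden_fields: __VIEWSTATE-style values taken from the previous response.
    """
    prefix = "ctl00$ContentPlaceHolder1$"
    payload = {
        "ctl00$ScriptManager1": f"{prefix}updRegionSelector|{prefix}{target}",
        "ctl00$SelectRegion$rdPeriod": "1",
        "ctl00$ddlLang": "en-US",
        f"{prefix}rdPeriod": "1",
        "__EVENTTARGET": f"{prefix}{target}",
        "__EVENTARGUMENT": "",
        "__LASTFOCUS": "",
        "__ASYNCPOST": "true",
    }
    for name in ("ddlDistrict", "ddlTehsil", "ddlVillage", "ddlYear"):
        value = selections.get(name, "")
        # The SelectRegion copy is always sent, empty if nothing is chosen yet.
        payload[f"ctl00$SelectRegion${name}"] = value
        # The ContentPlaceHolder1 copy is only sent once a value exists.
        if value:
            payload[f"{prefix}{name}"] = value
    payload.update(hidden_fields)
    return payload
```

For example, `build_payload("ddlTehsil", {"ddlDistrict": "10", "ddlTehsil": "63"}, hidden)` would produce the same keys as the step 3 payload, given `hidden` holds the three hidden field values.
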
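Step 6: Send a GET request to the Mutation page

The first part of the code sends a GET request to the Mutation page to retrieve the necessary information, such as the cookies and the hidden fields __VIEWSTATE, __VIEWSTATEGENERATOR and __EVENTVALIDATION.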
import requests
cookie1 = response.cookies.get_dict()
cookie2 = cookie1['dgrLAndrecordPLRS']

# Define the URL
url = "https://jamabandi.punjab.gov.in/Mutation.aspx?itemPID=4"

# Define the headers
headers = {
    "authority": "jamabandi.punjab.gov.in",
    "method": "GET",
    "scheme": "https",
    "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
    "accept-encoding": "gzip, deflate, br, zstd",
    "accept-language": "en-GB,en;q=0.9",
    "referer": "https://jamabandi.punjab.gov.in/default.aspx",
    "sec-ch-ua": '"Not A(Brand";v="8", "Chromium";v="132", "Google Chrome";v="132"',
    "sec-ch-ua-mobile": "?0",
    "sec-ch-ua-platform": '"macOS"',
    "sec-fetch-dest": "document",
    "sec-fetch-mode": "navigate",
    "sec-fetch-site": "same-origin",
    "sec-fetch-user": "?1",
    "upgrade-insecure-requests": "1",
    "user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/132.0.0.0 Safari/537.36"
}

# Define the cookies
cookies = {
    "dgrLAndrecordPLRS": cookie2
}

# Make the GET request
response = requests.get(url, headers=headers, cookies=cookies)

# Check the response status
if response.status_code == 200:
    print("Request was successful!")
    print(response.text)  # Print the HTML content
else:
    print(f"Failed to fetch the page. Status Code: {response.status_code}")
Step 2: Extract Hidden Fields from the HTML

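After the GET request succeeds, you use BeautifulSoup to parse the HTML response and extract the necessary hidden fields (__VIEWSTATE, __VIEWSTATEGENERATOR and __EVENTVALIDATION), which are needed in the subsequent POST request.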

from bs4 import BeautifulSoup

# Initialize a session to persist cookies
session = requests.Session()

# Step 1: Send a GET request to the page to get necessary hidden fields
url = "https://jamabandi.punjab.gov.in/Mutation.aspx?itemPID=4"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36"
}
headers.update({
    "Referer": url,
    "X-Requested-With": "XMLHttpRequest"
})

# Send GET request to fetch the page content
response = session.get(url, headers=headers)

# Parse HTML to extract __VIEWSTATE, __VIEWSTATEGENERATOR, and __EVENTVALIDATION
soup = BeautifulSoup(response.text, "html.parser")
viewstate = soup.find("input", {"name": "__VIEWSTATE"})["value"]
viewstategenerator = soup.find("input", {"name": "__VIEWSTATEGENERATOR"})["value"]
eventvalidation = soup.find("input", {"name": "__EVENTVALIDATION"})["value"]

# Debug: Print extracted values
print("VIEWSTATE:", viewstate)
print("VIEWSTATEGENERATOR:", viewstategenerator)
print("EVENTVALIDATION:", eventvalidation)

Step 3: Prepare Payload for POST Request

Next, you define the payload for the POST request. This includes the hidden fields you extracted (__VIEWSTATE, __VIEWSTATEGENERATOR, and __EVENTVALIDATION), as well as any form fields required by the mutation form.

# Step 8: Prepare the payload for the POST request
payload = {
    "ctl00$ScriptManager1": "ctl00$ContentPlaceHolder1$updMutationPanel|ctl00$ContentPlaceHolder1$ddlMutationNumber",
    "__EVENTTARGET": "ctl00$ContentPlaceHolder1$ddlMutationNumber",
    "__EVENTARGUMENT": "",
    "__LASTFOCUS": "",
    "__VIEWSTATE": viewstate,
    "__VIEWSTATEGENERATOR": viewstategenerator,
    "__SCROLLPOSITIONX": "0",
    "__SCROLLPOSITIONY": "0",
    "__EVENTVALIDATION": eventvalidation,
    "ctl00$SelectRegion$rdPeriod": "1",
    "ctl00$SelectRegion$ddlDistrict": "",
    "ctl00$SelectRegion$ddlTehsil": "",
    "ctl00$SelectRegion$ddlVillage": "",
    "ctl00$SelectRegion$ddlYear": "",
    "ctl00$ddlLang": "en-US",
    "ctl00$ContentPlaceHolder1$ddlMutationNumber": "5672",
    "__ASYNCPOST": "true"
}

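Step 4: Send the POST request to select the mutation

After preparing the payload, you send the POST request to select the mutation number. Selecting a mutation may trigger the CAPTCHA at this step.

# Step 3: Send the POST request
post_url = "https://jamabandi.punjab.gov.in/Mutation.aspx?itemPID=4"
response = session.post(post_url, headers=headers, data=payload)

# Print response to verify
print(response.text)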
    

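I did some research and managed to reproduce the whole request chain up to the CAPTCHA page. My script does exactly what you would do in the browser. The key point is that you need to use a single session for all requests. After the first request, the session cookies are set, and they are used in all subsequent requests. Without them, nothing works!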
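
To illustrate the single-session point in isolation, here is a minimal sketch. `run_in_one_session` is a hypothetical helper, and `session` is assumed to behave like `requests.Session`, meaning cookies set by one response are stored on the object and sent with every later request.

```python
def run_in_one_session(session, steps):
    """Send each (method, url, data) step through the same session object.

    Because every call goes through `session`, cookies received in an
    earlier step are available to all later steps.
    """
    responses = []
    for method, url, data in steps:
        if method == "GET":
            responses.append(session.get(url))
        else:
            responses.append(session.post(url, data=data))
    return responses
```

In real use this would be called as `run_in_one_session(requests.Session(), steps)`. Plain `requests.get` and `requests.post` calls, as in the question, each start from an empty cookie jar.
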

Here is a script you can run and test. It loads the response that contains the CAPTCHA input.

import re

import requests
from bs4 import BeautifulSoup


def parse_dynamic_form_data_from_get_response(response) -> dict[str, str]:
    soup = BeautifulSoup(response.text, 'html.parser')
    input_names = (
        '__VIEWSTATE',
        '__VIEWSTATEGENERATOR',
        '__EVENTVALIDATION',
    )
    dynamic_form_data = {}
    for input_name in input_names:
        value = soup.find('input', attrs={'name': input_name})['value']
        dynamic_form_data[input_name] = value
    return dynamic_form_data


def parse_dynamic_form_data_from_post_response(response) -> dict[str, str]:
    text = getattr(response, 'text', response)
    keys = (
        '__VIEWSTATE',
        '__VIEWSTATEGENERATOR',
        '__EVENTVALIDATION',
    )
    dynamic_form_data = {}
    for key in keys:
        value = re.search(rf'{key}\|(.+?)\|', text).group(1)
        dynamic_form_data[key] = value
    return dynamic_form_data


class Scraper:
    BASE_URL = 'https://jamabandi.punjab.gov.in'
    BASE_HEADERS = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:137.0) Gecko/20100101 Firefox/137.0',
    }

    def __init__(self) -> None:
        self.session = requests.Session()

    def load_init_page(self):
        url = f'{self.BASE_URL}/default.aspx'
        return self.session.get(url=url)

    def select_district(self, dynamic_form_data):
        form_data = {
            'ctl00$ScriptManager1': 'ctl00$ContentPlaceHolder1$updRegionSelector|ctl00$ContentPlaceHolder1$ddlDistrict',
            'ctl00$SelectRegion$rdPeriod': '1',
            'ctl00$SelectRegion$ddlDistrict': '',
            'ctl00$SelectRegion$ddlTehsil': '',
            'ctl00$SelectRegion$ddlVillage': '',
            'ctl00$SelectRegion$ddlYear': '',
            'ctl00$ddlLang': 'en-US',
            'ctl00$ContentPlaceHolder1$rdPeriod': '1',
            'ctl00$ContentPlaceHolder1$ddlDistrict': '10',
            '__EVENTTARGET': 'ctl00$ContentPlaceHolder1$ddlDistrict',
            '__EVENTARGUMENT': '',
            '__LASTFOCUS': '',
            '__ASYNCPOST': 'true',
            **dynamic_form_data,
        }
        url = f'{self.BASE_URL}/default.aspx'
        headers = self.BASE_HEADERS | {
            'Content-Type': 'application/x-www-form-urlencoded; charset=utf-8',
        }
        return self.session.post(url=url, data=form_data, headers=headers)

    def select_tehsil(self, dynamic_form_data):
        form_data = {
            'ctl00$ScriptManager1': 'ctl00$ContentPlaceHolder1$updRegionSelector|ctl00$ContentPlaceHolder1$ddlTehsil',
            'ctl00$SelectRegion$rdPeriod': '1',
            'ctl00$SelectRegion$ddlDistrict': '',
            'ctl00$SelectRegion$ddlTehsil': '',
            'ctl00$SelectRegion$ddlVillage': '',
            'ctl00$SelectRegion$ddlYear': '',
            'ctl00$ddlLang': 'en-US',
            'ctl00$ContentPlaceHolder1$rdPeriod': '1',
            'ctl00$ContentPlaceHolder1$ddlDistrict': '10',
            'ctl00$ContentPlaceHolder1$ddlTehsil': '62',
            '__EVENTTARGET': 'ctl00$ContentPlaceHolder1$ddlTehsil',
            '__EVENTARGUMENT': '',
            '__LASTFOCUS': '',
            '__ASYNCPOST': 'true',
            **dynamic_form_data,
        }
        url = f'{self.BASE_URL}/default.aspx'
        headers = self.BASE_HEADERS | {
            'Content-Type': 'application/x-www-form-urlencoded; charset=utf-8',
        }
        return self.session.post(url=url, data=form_data, headers=headers)

    def select_village(self, dynamic_form_data):
        form_data = {
            'ctl00$ScriptManager1': 'ctl00$ContentPlaceHolder1$updRegionSelector|ctl00$ContentPlaceHolder1$ddlVillage',
            'ctl00$SelectRegion$rdPeriod': '1',
            'ctl00$SelectRegion$ddlDistrict': '',
            'ctl00$SelectRegion$ddlTehsil': '',
            'ctl00$SelectRegion$ddlVillage': '',
            'ctl00$SelectRegion$ddlYear': '',
            'ctl00$ddlLang': 'en-US',
            'ctl00$ContentPlaceHolder1$rdPeriod': '1',
            'ctl00$ContentPlaceHolder1$ddlDistrict': '10',
            'ctl00$ContentPlaceHolder1$ddlTehsil': '62',
            'ctl00$ContentPlaceHolder1$ddlVillage': 'E645A091-F0D4-DF11-8826-001E0B4643B6',
            '__EVENTTARGET': 'ctl00$ContentPlaceHolder1$ddlVillage',
            '__EVENTARGUMENT': '',
            '__LASTFOCUS': '',
            '__ASYNCPOST': 'true',
            **dynamic_form_data,
        }
        url = f'{self.BASE_URL}/default.aspx'
        headers = self.BASE_HEADERS | {
            'Content-Type': 'application/x-www-form-urlencoded; charset=utf-8',
        }
        return self.session.post(url=url, data=form_data, headers=headers)

    def select_year(self, dynamic_form_data):
        form_data = {
            'ctl00$ScriptManager1': 'ctl00$ContentPlaceHolder1$updRegionSelector|ctl00$ContentPlaceHolder1$ddlYear',
            'ctl00$SelectRegion$rdPeriod': '1',
            'ctl00$SelectRegion$ddlDistrict': '',
            'ctl00$SelectRegion$ddlTehsil': '',
            'ctl00$SelectRegion$ddlVillage': '',
            'ctl00$SelectRegion$ddlYear': '',
            'ctl00$ddlLang': 'en-US',
            'ctl00$ContentPlaceHolder1$rdPeriod': '1',
            'ctl00$ContentPlaceHolder1$ddlDistrict': '10',
            'ctl00$ContentPlaceHolder1$ddlTehsil': '62',
            'ctl00$ContentPlaceHolder1$ddlVillage': 'E645A091-F0D4-DF11-8826-001E0B4643B6',
            'ctl00$ContentPlaceHolder1$ddlYear': '4',
            '__EVENTTARGET': 'ctl00$ContentPlaceHolder1$ddlYear',
            '__EVENTARGUMENT': '',
            '__LASTFOCUS': '',
            '__ASYNCPOST': 'true',
            **dynamic_form_data,
        }
        url = f'{self.BASE_URL}/default.aspx'
        headers = self.BASE_HEADERS | {
            'Content-Type': 'application/x-www-form-urlencoded; charset=utf-8',
        }
        return self.session.post(url=url, data=form_data, headers=headers)

    def set_region(self, dynamic_form_data):
        form_data = {
            'ctl00$ScriptManager1': 'ctl00$ContentPlaceHolder1$updRegionSelector|ctl00$ContentPlaceHolder1$btnSearch',
            'ctl00$SelectRegion$rdPeriod': '1',
            'ctl00$SelectRegion$ddlDistrict': '',
            'ctl00$SelectRegion$ddlTehsil': '',
            'ctl00$SelectRegion$ddlVillage': '',
            'ctl00$SelectRegion$ddlYear': '',
            'ctl00$ddlLang': 'en-US',
            'ctl00$ContentPlaceHolder1$rdPeriod': '1',
            'ctl00$ContentPlaceHolder1$ddlDistrict': '10',
            'ctl00$ContentPlaceHolder1$ddlTehsil': '62',
            'ctl00$ContentPlaceHolder1$ddlVillage': 'E645A091-F0D4-DF11-8826-001E0B4643B6',
            'ctl00$ContentPlaceHolder1$ddlYear': '4',
            '__EVENTTARGET': '',
            '__EVENTARGUMENT': '',
            '__LASTFOCUS': '',
            '__ASYNCPOST': 'true',
            'ctl00$ContentPlaceHolder1$btnSearch': 'Set Region',
            **dynamic_form_data,
        }
        url = f'{self.BASE_URL}/default.aspx'
        headers = self.BASE_HEADERS | {
            'Content-Type': 'application/x-www-form-urlencoded; charset=utf-8',
        }
        return self.session.post(url=url, data=form_data, headers=headers)

    def load_mutation_page(self):
        url = f'{self.BASE_URL}/Mutation.aspx?itemPID=4'
        return self.session.get(url=url, headers=self.BASE_HEADERS)

    def select_mutation(self, dynamic_form_data):
        form_data = {
            'ctl00$ScriptManager1': 'ctl00$ContentPlaceHolder1$updRegionSelector|ctl00$ContentPlaceHolder1$ddlVillage',
            'ctl00$SelectRegion$rdPeriod': '1',
            'ctl00$SelectRegion$ddlDistrict': '',
            'ctl00$SelectRegion$ddlTehsil': '',
            'ctl00$SelectRegion$ddlVillage': '',
            'ctl00$SelectRegion$ddlYear': '',
            'ctl00$ddlLang': 'en-US',
            'ctl00$ContentPlaceHolder1$ddlMutationNumber': '10429',
            '__EVENTTARGET': 'ctl00$ContentPlaceHolder1$ddlMutationNumber',
            '__EVENTARGUMENT': '',
            '__LASTFOCUS': '',
            '__ASYNCPOST': 'true',
            **dynamic_form_data,
        }
        url = f'{self.BASE_URL}/Mutation.aspx?itemPID=4'
        headers = self.BASE_HEADERS | {
            'Content-Type': 'application/x-www-form-urlencoded; charset=utf-8',
        }
        return self.session.post(url=url, data=form_data, headers=headers)

    def load_captcha_response(self):
        handlers = (
            self.select_district,
            self.select_tehsil,
            self.select_village,
            self.select_year,
            self.set_region,
        )
        response = self.load_init_page()
        data = parse_dynamic_form_data_from_get_response(response=response)
        for handler in handlers:
            response = handler(dynamic_form_data=data)
            data = parse_dynamic_form_data_from_post_response(response)
        response = self.load_mutation_page()
        data = parse_dynamic_form_data_from_get_response(response=response)
        response = self.select_mutation(dynamic_form_data=data)
        return response


if __name__ == '__main__':
    scraper = Scraper()
    result = scraper.load_captcha_response()
    print(result.text)

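As you can see, one session is used for all requests, so the cookies it sets are carried along. I tried to name each method so that you can understand exactly what happens there. If you need further explanation of the code, write to me about it and I will expand my answer.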
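
The `parse_dynamic_form_data_from_post_response` helper depends on the reply to an `__ASYNCPOST` request being a pipe-delimited body rather than HTML. Assuming that body is a sequence of `length|type|id|content|` records (the usual ASP.NET AJAX partial-postback layout; this is an assumption about the site, not something verified here), a length-aware parser may be a little more robust than a regex when a value itself contains `|`. The names `parse_partial_postback` and `hidden_fields` are hypothetical.

```python
def _read_token(text, pos):
    """Read up to the next "|" and return (token, position after the pipe)."""
    end = text.index("|", pos)  # raises ValueError on a malformed body
    return text[pos:end], end + 1


def parse_partial_postback(text):
    """Split a partial-postback body into (type, id, content) records."""
    records = []
    pos = 0
    while pos < len(text):
        length, pos = _read_token(text, pos)
        kind, pos = _read_token(text, pos)
        ident, pos = _read_token(text, pos)
        size = int(length)  # raises ValueError if the body is not in this format
        content = text[pos:pos + size]
        records.append((kind, ident, content))
        pos += size + 1  # skip the content and its trailing "|"
    return records


def hidden_fields(records):
    """Keep only the hidden field records, keyed by field name."""
    return {ident: content
            for kind, ident, content in records
            if kind == "hiddenField"}
```

Because every record is kept, records of other types can be inspected as well if the server sends them, and a full HTML page (which does not start with a length) fails loudly with a `ValueError` instead of silently yielding nothing.
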
Here is the part of the response where you can see that the mutation number was selected successfully and the CAPTCHA was loaded:
python post get