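How to scrape data from Bayut (DLD-verified properties) without getting a 401 error?

Question

I'm using Scrapy to scrape real estate data from Bayut, but I can't extract the green tick (the DLD verification info). This information is fetched through a POST API that uses Basic authentication. Calling the API with the same headers, payload, and parameters returns 401 Unauthorized.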

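Using Selenium works, but it's too slow for large-scale scraping (~210k properties/week).

I found the API request in the Network tab and copied it exactly, but I still get a 401 error. Could the website be using additional security measures, such as session-based authentication or IP restrictions?

What I've tried:

- Scrapy (can't get the verification info).
- Postman and Python requests (401 error).

How can I access this data efficiently? Any insights would be appreciated. Code that reproduces the POST API request: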

import requests
import base64

# Define the URL
url = "https://fenix-data-es2.bayut.com/_msearch"

# Encode credentials manually (decoded: "bayut_read_user_es2:10yNmg5+6K")
auth_string = "bayut_read_user_es2:10yNmg5+6K"
auth_encoded = base64.b64encode(auth_string.encode()).decode()  # Convert to Base64

# Headers with Authorization
headers = {
    "Authorization": f"Basic {auth_encoded}",
    "accept": "*/*",
    "accept-encoding": "gzip, deflate, br, zstd",
    "accept-language": "en-US,en;q=0.9",
    "cache-control": "no-cache",
    "content-type": "application/x-ndjson",
    "origin": "https://www.bayut.com",
    "pragma": "no-cache",
    "priority": "u=1, i",
    "referer": "https://www.bayut.com/",
    "sec-ch-ua": "\"Not(A:Brand\";v=\"99\", \"Google Chrome\";v=\"133\", \"Chromium\";v=\"133\"",
    "sec-ch-ua-mobile": "?0",
    "sec-ch-ua-platform": "\"Windows\"",
    "sec-fetch-dest": "empty",
    "sec-fetch-mode": "cors",
    "sec-fetch-site": "same-site",
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/133.0.0.0 Safari/537.36"
}

# Query parameters (filter_path)
params = {
    "filter_path": "took,*.took,*.suggest.*.options.text,*.suggest.*.options._source.*,*.hits.total.*,*.hits.hits._source.*,*.hits.hits._score,*.hits.hits.highlight.*,*.error,*.aggregations.*.buckets.key,*.aggregations.*.buckets.doc_count,*.aggregations.*.buckets.complex_value.hits.hits._source,*.aggregations.*.filtered_agg.facet.buckets.key,*.aggregations.*.filtered_agg.facet.buckets.doc_count,*.aggregations.*.filtered_agg.facet.buckets.complex_value.hits.hits._source"
}

# POST body in NDJSON format (Elasticsearch _msearch: header line, query line, trailing newline)
post_data = """{"index":"dld_matched_property_details_prod_alias"}
{"from":0,"size":5,"track_total_hits":10000,"query":{"bool":{"must":[{"term":{"external_id":"10228377"}}]}}}
"""

# Sending the POST request
response = requests.post(url, headers=headers, params=params, data=post_data, timeout=30)

# Check if the request was successful
if response.status_code == 200:
    print("Request Successful!")
    print(response.json())  # Print the response in JSON format
else:
    print(f"Request failed with status code: {response.status_code}")
    print(response.text)  # Print the error message if any
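The request body above is an Elasticsearch multi-search (_msearch) payload in NDJSON: each search is a header line naming the index, followed by a query line, and the whole body must end with a newline. Rather than hand-writing that string, a small helper can generate it with json.dumps. This is only a sketch: the helper name build_msearch_body is hypothetical, and it only takes care of body formatting; it does not by itself address the 401.

```python
import json


def build_msearch_body(searches):
    """Build an Elasticsearch _msearch NDJSON body (hypothetical helper).

    `searches` is a list of (header, query) pairs. Each pair becomes two
    lines, and the body ends with the trailing newline _msearch requires.
    """
    lines = []
    for header, query in searches:
        # Compact separators keep each object on a single line; json.dumps
        # escapes any newlines inside string values, so NDJSON stays valid.
        lines.append(json.dumps(header, separators=(",", ":")))
        lines.append(json.dumps(query, separators=(",", ":")))
    return "\n".join(lines) + "\n"


# Same search as the hand-written post_data string in the question
body = build_msearch_body([
    (
        {"index": "dld_matched_property_details_prod_alias"},
        {
            "from": 0,
            "size": 5,
            "track_total_hits": 10000,
            "query": {"bool": {"must": [{"term": {"external_id": "10228377"}}]}},
        },
    )
])
print(body)
```

Because json.dumps preserves dict insertion order, the generated body matches the original post_data byte for byte, and several property IDs can be batched by passing more (header, query) pairs.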

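Answer

You need the hb-session-id cookie. You can obtain it from the POST request to /.humbucker/challenge/js/validate, which requires the x-hb-co header and the correct POST data (fingerprints in a specific order).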
