Python requests - unable to mimic a POST request to scrape table data from a website


I am trying to scrape table data from the website https://rss.knf.gov.pl/rss_pub/ using Python requests.

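In the browser dev tools, I can see that when the page is refreshed (and also when I pick a value from the limit dropdown under the table), a POST request is sent to retrieve this data, and the response comes back as JSON. I can see it in the Response tab of the dev tools.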

However, when I try to mimic this request, the result is an empty JSON object. The code I am running is:

import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36',
    'Referer': 'https://rss.knf.gov.pl/rss_pub/'
}

payload = {
    "cmd":"get",
    "language":"en",
    "search":[],
    "limit":10,
    "offset":0,
    "method":"Default",
    "sort":[{"field":"HOLDER_FULL_NAME","direction":"asc"}],
    "searchLogic":"AND",
    "searchValue":""
}

url = 'https://rss.knf.gov.pl/rss_pub/'
s = requests.session()
s.headers.update(headers)
s.get(url)
r = s.post(url+'JSON', json=payload)

After running this code, all I see is an empty JSON object:

>>> r.status_code
200
>>> r.text
'{}'

Am I missing something in the request that leads to this result?

python web-scraping post python-requests
1 Answer

Use the data= parameter instead of json=, and send the query as a JSON string in a form field named request:

import requests

url = "https://rss.knf.gov.pl/rss_pub/JSON"

payload = {
    "request": '{"cmd":"get","language":"pl","search":[],"limit":10000,"offset":0,"method":"Default","sort":[{"field":"HOLDER_FULL_NAME","direction":"asc"}],"searchLogic":"AND","searchValue":""}'
}

data = requests.post(url, data=payload).json()
print(data)

This prints:

{
    "total": 9,
    "records": [
        {
            "HOLDER_FULL_NAME": "AQR Capital Management, LLC ",
            "POSITION_DATE": "2023-09-08",
            "ISSUER_NAME": "CDPROJEKT",
            "MODIFY_DATE": "2023-09-12",
            "ISIN": "PLOPTTC00011",
            "recid": 1,
            "NET_SHORT_POSITION_O": "0.5",
        },
        {
            "HOLDER_FULL_NAME": "AQR Capital Management, LLC ",
            "POSITION_DATE": "2023-10-23",
            "ISSUER_NAME": "ALLEGRO",
            "MODIFY_DATE": "2023-10-24",
            "ISIN": "LU2237380790",
            "recid": 2,
            "NET_SHORT_POSITION_O": "0.5",
        },
        {
            "HOLDER_FULL_NAME": "GSA Capital Partners LLP ",
            "POSITION_DATE": "2023-07-07",
            "ISSUER_NAME": "TSGAMES",
            "MODIFY_DATE": "2023-08-21",
            "ISIN": "PLTSQGM00016",
            "recid": 3,
            "NET_SHORT_POSITION_O": "0.6",
        },
        {
            "HOLDER_FULL_NAME": "Insignis FIZ ",
            "POSITION_DATE": "2023-01-13",
            "ISSUER_NAME": "GPW",
            "MODIFY_DATE": "2023-08-21",
            "ISIN": "PLGPW0000017",
            "recid": 4,
            "NET_SHORT_POSITION_O": "0.59",
        },
        {
            "HOLDER_FULL_NAME": "Marshall Wace LLP ",
            "POSITION_DATE": "2023-10-24",
            "ISSUER_NAME": "KETY",
            "MODIFY_DATE": "2023-10-25",
            "ISIN": "PLKETY000011",
            "recid": 5,
            "NET_SHORT_POSITION_O": "0.78",
        },
        {
            "HOLDER_FULL_NAME": "Marshall Wace LLP ",
            "POSITION_DATE": "2023-10-18",
            "ISSUER_NAME": "CDPROJEKT",
            "MODIFY_DATE": "2023-10-19",
            "ISIN": "PLOPTTC00011",
            "recid": 6,
            "NET_SHORT_POSITION_O": "0.72",
        },
        {
            "HOLDER_FULL_NAME": "Marshall Wace LLP ",
            "POSITION_DATE": "2023-10-12",
            "ISSUER_NAME": "JSW",
            "MODIFY_DATE": "2023-10-13",
            "ISIN": "PLJSW0000015",
            "recid": 7,
            "NET_SHORT_POSITION_O": "0.71",
        },
        {
            "HOLDER_FULL_NAME": "PSquared Asset Management AG ",
            "POSITION_DATE": "2023-10-17",
            "ISSUER_NAME": "LPP",
            "MODIFY_DATE": "2023-10-18",
            "ISIN": "PLLPP0000011",
            "recid": 8,
            "NET_SHORT_POSITION_O": "0.5",
        },
        {
            "HOLDER_FULL_NAME": "Silver Point Capital, L.P. ",
            "POSITION_DATE": "2023-10-24",
            "ISSUER_NAME": "CCC",
            "MODIFY_DATE": "2023-10-25",
            "ISIN": "PLCCC0000016",
            "recid": 9,
            "NET_SHORT_POSITION_O": "0.62",
        },
    ],
    "status": "success",
}
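
To see why the two calls behave differently, it can help to compare the request bodies they produce. The sketch below uses only the standard library to approximate what requests sends for json= versus data=. The helper name build_form_payload is hypothetical, and the claim that the server reads a form field rather than a raw JSON body is inferred from the fact that the answer's request works, not from any documentation of the site.

```python
import json
from urllib.parse import urlencode, parse_qs

# Query object copied from the answer above.
query = {
    "cmd": "get",
    "language": "pl",
    "search": [],
    "limit": 10000,
    "offset": 0,
    "method": "Default",
    "sort": [{"field": "HOLDER_FULL_NAME", "direction": "asc"}],
    "searchLogic": "AND",
    "searchValue": "",
}


def build_form_payload(query):
    """Hypothetical helper: wrap the query as a JSON string in a form field named "request"."""
    return {"request": json.dumps(query, separators=(",", ":"))}


# Approximately the body sent with json=query
# (Content-Type: application/json).
json_body = json.dumps(query)

# Approximately the body sent with data=build_form_payload(query)
# (Content-Type: application/x-www-form-urlencoded).
form_body = urlencode(build_form_payload(query))

print(json_body[:40])
print(form_body[:40])

# Decoding the form body recovers the original query from the "request" field.
decoded = json.loads(parse_qs(form_body)["request"][0])
```

Building the string with json.dumps instead of typing it by hand avoids quoting mistakes when changing limit or offset. The resulting dictionary can be passed to requests.post(url, data=build_form_payload(query)) in the same way as the hand-written payload in the answer.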