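I am trying to scrape table data from the website https://rss.knf.gov.pl/rss_pub/ using Python requests.
In the browser's dev tools I can see that when the page is refreshed (and also when I pick a value from the limit drop-down below the table), a POST request is sent to retrieve this data, and the response comes back as JSON. I can see it in the "Response" tab of the dev tools.
However, when I try to mimic this request, the result is an empty JSON object. The code I am running is: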
import requests
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36',
    'Referer': 'https://rss.knf.gov.pl/rss_pub/'
}
payload = {
    "cmd": "get",
    "language": "en",
    "search": [],
    "limit": 10,
    "offset": 0,
    "method": "Default",
    "sort": [{"field": "HOLDER_FULL_NAME", "direction": "asc"}],
    "searchLogic": "AND",
    "searchValue": ""
}
url = 'https://rss.knf.gov.pl/rss_pub/'
s = requests.session()
s.headers.update(headers)
s.get(url)
r = s.post(url+'JSON', json=payload)
After running this code, all I get back is an empty JSON object:
>>> r.status_code
200
>>> r.text
'{}'
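Am I missing something in the request that leads to this result?
Use the data= parameter instead of json=, and send the JSON text as the value of a form field named request: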
import requests
url = "https://rss.knf.gov.pl/rss_pub/JSON"
payload = {
    "request": '{"cmd":"get","language":"pl","search":[],"limit":10000,"offset":0,"method":"Default","sort":[{"field":"HOLDER_FULL_NAME","direction":"asc"}],"searchLogic":"AND","searchValue":""}'
}
data = requests.post(url, data=payload).json()
print(data)
Prints:
{
"total": 9,
"records": [
{
"HOLDER_FULL_NAME": "AQR Capital Management, LLC ",
"POSITION_DATE": "2023-09-08",
"ISSUER_NAME": "CDPROJEKT",
"MODIFY_DATE": "2023-09-12",
"ISIN": "PLOPTTC00011",
"recid": 1,
"NET_SHORT_POSITION_O": "0.5",
},
{
"HOLDER_FULL_NAME": "AQR Capital Management, LLC ",
"POSITION_DATE": "2023-10-23",
"ISSUER_NAME": "ALLEGRO",
"MODIFY_DATE": "2023-10-24",
"ISIN": "LU2237380790",
"recid": 2,
"NET_SHORT_POSITION_O": "0.5",
},
{
"HOLDER_FULL_NAME": "GSA Capital Partners LLP ",
"POSITION_DATE": "2023-07-07",
"ISSUER_NAME": "TSGAMES",
"MODIFY_DATE": "2023-08-21",
"ISIN": "PLTSQGM00016",
"recid": 3,
"NET_SHORT_POSITION_O": "0.6",
},
{
"HOLDER_FULL_NAME": "Insignis FIZ ",
"POSITION_DATE": "2023-01-13",
"ISSUER_NAME": "GPW",
"MODIFY_DATE": "2023-08-21",
"ISIN": "PLGPW0000017",
"recid": 4,
"NET_SHORT_POSITION_O": "0.59",
},
{
"HOLDER_FULL_NAME": "Marshall Wace LLP ",
"POSITION_DATE": "2023-10-24",
"ISSUER_NAME": "KETY",
"MODIFY_DATE": "2023-10-25",
"ISIN": "PLKETY000011",
"recid": 5,
"NET_SHORT_POSITION_O": "0.78",
},
{
"HOLDER_FULL_NAME": "Marshall Wace LLP ",
"POSITION_DATE": "2023-10-18",
"ISSUER_NAME": "CDPROJEKT",
"MODIFY_DATE": "2023-10-19",
"ISIN": "PLOPTTC00011",
"recid": 6,
"NET_SHORT_POSITION_O": "0.72",
},
{
"HOLDER_FULL_NAME": "Marshall Wace LLP ",
"POSITION_DATE": "2023-10-12",
"ISSUER_NAME": "JSW",
"MODIFY_DATE": "2023-10-13",
"ISIN": "PLJSW0000015",
"recid": 7,
"NET_SHORT_POSITION_O": "0.71",
},
{
"HOLDER_FULL_NAME": "PSquared Asset Management AG ",
"POSITION_DATE": "2023-10-17",
"ISSUER_NAME": "LPP",
"MODIFY_DATE": "2023-10-18",
"ISIN": "PLLPP0000011",
"recid": 8,
"NET_SHORT_POSITION_O": "0.5",
},
{
"HOLDER_FULL_NAME": "Silver Point Capital, L.P. ",
"POSITION_DATE": "2023-10-24",
"ISSUER_NAME": "CCC",
"MODIFY_DATE": "2023-10-25",
"ISIN": "PLCCC0000016",
"recid": 9,
"NET_SHORT_POSITION_O": "0.62",
},
],
"status": "success",
}
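As far as I can tell, the difference is that json=payload sends the dictionary as a JSON request body, while data=payload sends a form-encoded body, and this endpoint appears to read the query from a form field named request. That is inferred from the working call above, not from any documentation. The sketch below builds the same request string with json.dumps instead of writing it by hand, and tidies the returned records (the names in the output carry trailing spaces and the positions are strings). The helper names build_form_payload and clean_records are made up for illustration.

```python
import json
from urllib.parse import urlencode


def build_form_payload(limit=10, offset=0, language="pl"):
    """Return the form fields for the POST: one "request" field holding JSON text."""
    query = {
        "cmd": "get",
        "language": language,
        "search": [],
        "limit": limit,
        "offset": offset,
        "method": "Default",
        "sort": [{"field": "HOLDER_FULL_NAME", "direction": "asc"}],
        "searchLogic": "AND",
        "searchValue": "",
    }
    # Compact separators reproduce the hand-written string used above.
    return {"request": json.dumps(query, separators=(",", ":"))}


def clean_records(records):
    """Strip padding from string values and parse the position as a float."""
    cleaned = []
    for record in records:
        row = {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}
        row["NET_SHORT_POSITION_O"] = float(row["NET_SHORT_POSITION_O"])
        cleaned.append(row)
    return cleaned


payload = build_form_payload(limit=10000)

# Roughly what data=payload puts on the wire: a form-encoded body, not JSON.
form_body = urlencode(payload)
print(form_body[:40])

# One record copied from the output above, to show the clean-up step.
sample = [{"HOLDER_FULL_NAME": "Insignis FIZ ", "recid": 4, "NET_SHORT_POSITION_O": "0.59"}]
print(clean_records(sample))
```

With requests installed, the payload drops into the call above as requests.post(url, data=payload).json(), and the "records" list from that response can be passed to clean_records. Changing limit and offset should page through the table, assuming the server honours them the way the drop-down on the page suggests.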