Python requests.get returns blank results

Question

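I'm new to web scraping and am trying to scrape some housing information from redfin.com, using the Python requests package to fetch the page source. However, the code only works some of the time: sometimes it returns the full HTML for each URL, and other times it returns nothing at all.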

Here is a simplified version of my code:

import requests

headers = {
    'user-agent': XXX  # placeholder: the real user-agent string was elided by the asker
}
links = ['https://www.redfin.com/ID/Meridian/3642-N-Hollymount-Way-83646/home/106711385',
         'https://www.redfin.com/ID/Meridian/1506-N-Penrith-Pl-83642/home/106700395',
         'https://www.redfin.com/ID/Nampa/34-N-Middleton-Rd-83651/home/117266789',
         'https://www.redfin.com/OR/The-Dalles/1308-Harris-St-97058/home/53055510']
for link in links:
    response = requests.get(link, headers = headers)
    html = response.text
print(html)

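The status code is always 200. Sometimes I get the HTML, but most of the time the response is blank. This really confuses me, and I'd greatly appreciate any help solving it. Thanks!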

Answer 1

The following code (using a valid user agent) works without exception.

However, because of rate limiting, running it many times in a short period may result in HTTP 429 Too Many Requests.

from requests import Session

HEADERS = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

links = [
    "https://www.redfin.com/ID/Meridian/3642-N-Hollymount-Way-83646/home/106711385",
    "https://www.redfin.com/ID/Meridian/1506-N-Penrith-Pl-83642/home/106700395",
    "https://www.redfin.com/ID/Nampa/34-N-Middleton-Rd-83651/home/117266789",
    "https://www.redfin.com/OR/The-Dalles/1308-Harris-St-97058/home/53055510",
]

with Session() as session:
    for link in links:
        try:
            with session.get(link, headers=HEADERS) as response:
                response.raise_for_status()
                print(response.text)
        except Exception as e:
            print(e)
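
If rate limiting does turn out to be the cause, one possible way to cope is to retry with a growing delay. The sketch below is only an illustration, not part of the answer's code: `fetch_with_backoff` is a hypothetical helper, the attempt count and delays are arbitrary, and treating a blank 200 response as a transient condition worth retrying is an assumption based on the symptom described in the question.

```python
import time


def fetch_with_backoff(session, url, headers, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Return the page HTML, retrying on HTTP 429 or a blank body (hypothetical helper)."""
    for attempt in range(max_attempts):
        response = session.get(url, headers=headers, timeout=10)
        # Success means a 200 status AND a non-blank body.
        if response.status_code == 200 and response.text.strip():
            return response.text
        if response.status_code not in (200, 429):
            # Other 4xx/5xx statuses are not retried: raise_for_status raises here.
            response.raise_for_status()
        if attempt < max_attempts - 1:
            # Use Retry-After when it is a whole number of seconds,
            # otherwise back off exponentially: 1s, 2s, 4s, ...
            retry_after = response.headers.get("Retry-After", "")
            delay = float(retry_after) if retry_after.isdigit() else base_delay * 2 ** attempt
            sleep(delay)
    raise RuntimeError(f"no usable response from {url} after {max_attempts} attempts")
```

Inside the loop above, `session.get(link, headers=HEADERS)` could then be swapped for `fetch_with_backoff(session, link, HEADERS)`. Whether the site keeps serving blank pages after a pause is something only testing against it can show.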
