http.client works, but requests raises a read timeout

Question

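I'm just trying to understand what is going on. With requests, the call returns a 403 (when no headers are set) or fails with a read timeout (when headers are set). Doing the same thing with http.client returns a 200 status code.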

The URL I'm trying to fetch is: https://img.uefa.com/imgml/uefacom/uel/social/og-default.jpg

Code that fails:

import requests

url = 'https://img.uefa.com/imgml/uefacom/uel/social/og-default.jpg'

try:
    response = requests.get(url, verify=False, timeout=10)  # Disable SSL verification
    response.raise_for_status()
except requests.exceptions.RequestException as e:
    print("Error:", e)

Code that works:

import http.client
import ssl

conn = http.client.HTTPSConnection("img.uefa.com", context=ssl._create_unverified_context())
conn.request("GET", "/imgml/uefacom/uel/social/og-default.jpg")
response = conn.getresponse()
print(response.status, response.reason)
conn.close()

I tried many things, including adding several different headers, but nothing worked.

The following curl command also works:

curl -v "https://img.uefa.com/imgml/uefacom/uel/social/og-default.jpg" --output image.jpg

The URL also opens fine in a browser.

Note: all requests were made from my local machine.

Does requests perform any extra steps that could cause this?

Answer 1

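Some sites reject traffic from clients that send an "invalid" User-Agent string.

If you print the default headers object used by the requests library, you can see that it states very explicitly that the request comes from a Python script: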

{'User-Agent': 'python-requests/2.32.3', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}
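
To reproduce that dictionary locally, a minimal sketch (assuming the requests package is installed; the version number in the output will depend on your installation) could look like this:

```python
import requests

# Headers that requests attaches when the caller supplies none.
default_headers = requests.utils.default_headers()
print(dict(default_headers))

# The User-Agent openly identifies the client as python-requests/<version>.
user_agent = default_headers["User-Agent"]
print(user_agent)
```

No network access is needed for this check; it only inspects the library defaults.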

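The site owner probably wants to limit bots and web scraping, and so does not accept this user agent. A request sent with http.client, which adds no User-Agent header by default, is most likely not being filtered out.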
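
To see which headers http.client sends by default without touching the network, one option is to capture the bytes it would write. This is only a sketch: CapturingConnection is a hypothetical helper written for this illustration, not part of the standard library, and it uses HTTPConnection because HTTPSConnection inherits the same header-building code.

```python
import http.client


class CapturingConnection(http.client.HTTPConnection):
    """Hypothetical helper: records outgoing bytes instead of sending them."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.sent = b""

    def send(self, data):
        # Overriding send() means no socket is ever opened.
        self.sent += data


conn = CapturingConnection("img.uefa.com")
conn.request("GET", "/imgml/uefacom/uel/social/og-default.jpg")

# Show the raw request line and headers that would go over the wire.
print(conn.sent.decode("iso-8859-1"))
```

The captured request contains Host and Accept-Encoding headers but no User-Agent, so there is nothing in it that labels the client as a Python script.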

The code below, which sends a browser-like user agent, works and returns the 200 status code you described above.

import requests

url = 'https://img.uefa.com/imgml/uefacom/uel/social/og-default.jpg'

headers = {
    'User-Agent': (
        'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.6 Safari/605.1.15' # noqa
    )
}

try:
    response = requests.get(
        url,
        headers=headers,
        verify=True,      # Keep SSL certificate verification enabled
        timeout=10,       # Timeout after 10 seconds
    )
    print(response.status_code)
except Exception as e:
    print("Error:", e)
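
If several requests go to the same site, the browser-like user agent can be set once on a requests.Session instead of being passed on every call. The sketch below only prepares the request and inspects its headers, so it runs without network access; whether the server keeps accepting this user agent is an assumption that has to be checked against the live site.

```python
import requests

URL = "https://img.uefa.com/imgml/uefacom/uel/social/og-default.jpg"
BROWSER_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.6 Safari/605.1.15"
)

session = requests.Session()
# Replace the default python-requests User-Agent for every request on this session.
session.headers.update({"User-Agent": BROWSER_UA})

# Build the request without sending it, to check what would go out.
prepared = session.prepare_request(requests.Request("GET", URL))
print(prepared.headers["User-Agent"])

# To actually fetch the image: response = session.send(prepared, timeout=10)
```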
