Parsing an HTTP response in Python

Question

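I want to manipulate the information at THIS URL. I can successfully open it and read its contents. What I really want to do, though, is throw out everything I don't need and work with what I want to keep.

Is there a way to convert the string into a dict so I can iterate over it? Or do I just have to parse it as is (str type)?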

from urllib.request import urlopen

url = 'http://www.quandl.com/api/v1/datasets/FRED/GDP.json'
response = urlopen(url)

print(response.read()) # returns string with info

Answer 1

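When I printed response.read(), I noticed that b was prepended to the string (e.g. b'{"a":1,..). The "b" stands for bytes and declares the type of the object you are handling. Since I knew that json.loads('string') converts a string into a dict, all I had to do was convert the bytes into a string. I did this by decoding the response as UTF-8 with decode('utf-8'). Once it was a str, my problem was solved and I could easily iterate over the dict.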

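I don't know whether this is the fastest or most "Pythonic" way to write it, but it works, and there is always time later for optimization and improvement! The full code for my solution: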

from urllib.request import urlopen
import json

# Get the dataset
url = 'http://www.quandl.com/api/v1/datasets/FRED/GDP.json'
response = urlopen(url)

# Convert bytes to string type and string type to dict
string = response.read().decode('utf-8')
json_obj = json.loads(string)

print(json_obj['source_name']) # prints the string with 'source_name' key
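
The two conversion steps can be tried without a network connection. This is only a minimal sketch: raw_bytes and its field values are made-up stand-ins for whatever response.read() actually returns.

```python
import json

# Hypothetical stand-in for the bytes returned by response.read()
raw_bytes = b'{"source_name": "Example Source", "column_names": ["Date", "Value"]}'

text = raw_bytes.decode('utf-8')   # bytes -> str
parsed = json.loads(text)          # str -> dict

# Iterate over the resulting dict like any other
for key, value in parsed.items():
    print(key, '->', value)
```

On Python 3.6 and later, json.loads also accepts bytes directly (it expects UTF-8, UTF-16 or UTF-32 input), so the explicit decode is mainly needed on older versions or for other encodings.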

Answer 2

You can also use Python's requests library instead.

import requests

url = 'http://www.quandl.com/api/v1/datasets/FRED/GDP.json'
response = requests.get(url)
data = response.json()  # named data rather than dict to avoid shadowing the built-in

Now you can work with data like any Python dict.
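
Once the response is a dict, discarding the unwanted parts is ordinary dict work. A sketch of one way to do it, assuming made-up field names and values (only source_name appears in this thread; everything else here is illustrative):

```python
# Hypothetical stand-in for the dict returned by response.json()
data = {
    'source_name': 'Example Source',
    'description': 'Not needed',
    'column_names': ['Date', 'Value'],
    'data': [['2014-01-01', 1.0], ['2014-04-01', 2.0]],
}

wanted = {'source_name', 'data'}   # keys to keep (an illustrative choice)
kept = {key: value for key, value in data.items() if key in wanted}

# Iterate over the rows that were kept
for date, value in kept['data']:
    print(date, value)
```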

Answer 3

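In Python 3, json works with Unicode text (the JSON format itself is defined only in terms of Unicode text), so you need to decode the bytes received in the HTTP response. r.headers.get_content_charset('utf-8') returns the character encoding declared by the response, falling back to 'utf-8' if none is declared: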

#!/usr/bin/env python3
import io
import json
from urllib.request import urlopen

with urlopen('https://httpbin.org/get') as r, \
     io.TextIOWrapper(r, encoding=r.headers.get_content_charset('utf-8')) as file:
    result = json.load(file)
print(result['headers']['User-Agent'])

It is not necessary to use io.TextIOWrapper here:

#!/usr/bin/env python3
import json
from urllib.request import urlopen

with urlopen('https://httpbin.org/get') as r:
    result = json.loads(r.read().decode(r.headers.get_content_charset('utf-8')))
print(result['headers']['User-Agent'])
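
The fallback behaviour of get_content_charset can be checked offline, because the headers object returned by urlopen derives from email.message.Message. The sketch below builds such a message by hand; charset_of is a hypothetical helper introduced only for illustration, and the Content-Type values and body are made up.

```python
import json
from email.message import Message

def charset_of(content_type):
    """Return the charset named in a Content-Type value, or 'utf-8' if absent."""
    headers = Message()
    headers['Content-Type'] = content_type
    return headers.get_content_charset('utf-8')

print(charset_of('application/json; charset=ISO-8859-1'))  # charset read from the header
print(charset_of('application/json'))                      # no charset given, fallback used

# Decode a made-up Latin-1 body with the charset the header declares
body = '{"name": "caf\u00e9"}'.encode('iso-8859-1')
result = json.loads(body.decode(charset_of('application/json; charset=ISO-8859-1')))
print(result)
```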

Answer 4

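TL;DR: When you fetch data from a server, it is normally sent as bytes. The rationale is that these bytes need to be "decoded" by the recipient, who should know how to use the data. Decode the binary data when it arrives so that you get a string rather than a b'...' (bytes) object.

Use case: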

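import requests

def get_data_from_url(url):
    # Fetch the URL given by the url parameter
    response = requests.get(url)
    # Decode the raw bytes as UTF-8 and split the text into a list of lines
    response_data_split_by_line = response.content.decode('utf-8').splitlines()
    return response_data_split_by_line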

In this example, I decode the received content as UTF-8. For my purposes, I then split it by line so that I can go through each line with a for loop.
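
The decode-then-split step can be seen on its own with a small byte string. This is a sketch only; content is a made-up stand-in for response.content, and the comma-separated layout is an assumption.

```python
# Hypothetical stand-in for response.content
content = b'Date,Value\r\n2014-01-01,1.0\n2014-04-01,2.0\n'

# bytes -> str, then one list entry per line (handles \n and \r\n alike)
lines = content.decode('utf-8').splitlines()

for line in lines:
    fields = line.split(',')
    print(fields)
```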

Answer 5

Instead of calling json..., call the dict function on the headers attribute of the http.client.HTTPResponse instance. That attribute is implemented as an http.client.HTTPMessage, which is based on email.message.Message.

#!/usr/bin/env python3
import urllib.request


url = 'address'
data = b'key: values'  # bytes body; a trailing comma here would make it a tuple

with urllib.request.urlopen(url, data=data) as rs:
    headers = dict(rs.headers)
    html = rs.read() # binary form
    
print(headers)
{'Date': 'xyz', 'Server': 'Apache', 'Vary': 'Host,Accept-Encoding', 'Upgrade': 'h2', 'Connection': 'Upgrade, close', 'Accept-Ranges': 'bytes', 'X-Content-Type-Options': 'nosniff', 'X-Frame-Options': 'sameorigin', 'Referrer-Policy': 'same-origin', 'strict-transport-security': 'max-age=300', 'Content-Length': '2885', 'Content-Type': 'text/html'}

If needed, the whole response can be organized as a dict, e.g.

response = {"headers": headers, "body": html}
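
One thing to keep in mind when converting headers with dict is that a header name can occur more than once, and a dict holds a single value per key. The sketch below builds an http.client.HTTPMessage by hand with made-up header values to show the difference; get_all is the method inherited from email.message.Message for reading every occurrence.

```python
from http.client import HTTPMessage

# Hand-built headers with made-up values, standing in for rs.headers
headers_msg = HTTPMessage()
headers_msg['Content-Type'] = 'text/html'
headers_msg['Set-Cookie'] = 'a=1'
headers_msg['Set-Cookie'] = 'b=2'   # assignment appends a second header

as_dict = dict(headers_msg)                       # one value per header name
all_cookies = headers_msg.get_all('Set-Cookie')   # every occurrence, in order

print(as_dict)
print(all_cookies)
```

If repeated headers matter for your use, reading them with get_all before building the dict avoids losing values.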

Answer 6

I guess things have changed in Python 3.4. This worked for me:

print("resp:" + json.dumps(resp.json()))
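
Here resp looks like a requests response object, although the answer does not say so; that is an assumption. json.dumps turns the parsed object back into text, which is handy for printing. A small sketch with a made-up parsed value in place of resp.json():

```python
import json

# Hypothetical stand-in for the object returned by resp.json()
parsed = {'source_name': 'Example Source', 'data': [['2014-01-01', 1.0]]}

compact = json.dumps(parsed)                           # single-line text
pretty = json.dumps(parsed, indent=2, sort_keys=True)  # indented, keys sorted

print("resp:" + compact)
print(pretty)
```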
