How do I change the encoding of a Python array?

Question

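I am using the following code to scrape a table from a Chinese website. It works fine, but the content I store in the list does not seem to display correctly.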

import requests
from bs4 import BeautifulSoup
import pandas as pd

x = requests.get('http://www.sohu.com/a/79780904_126549')
bs = BeautifulSoup(x.text,'lxml')

clg_list = []

for tr in bs.find_all('tr'):
    tds = tr.find_all('td')
    for td in tds:
        clg_list.append(td.text)
        print(td.text)

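When I print the text, it shows Chinese characters. But when I print the list, it shows escapes such as \u4e00\u671f\uff0834\u6240\uff09. I am not sure whether I should change the encoding or whether something else is wrong.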

Answer 1

Nothing is wrong here.

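When you print a Python list, Python calls repr on each element of the list. In Python 2, the repr of a unicode string shows every non-ASCII character as a \uXXXX escape of its Unicode code point.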

>>> c = clg_list[0]
>>> c # Ask the interpreter to display the repr of c
u'\u201c985\u201d\u5de5\u7a0b\u5927\u5b66\u540d\u5355\uff08\u622a\u6b62\u52302011\u5e743\u670831\u65e5\uff09'

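However, if you print the string itself, Python encodes the unicode string with a text encoding (for example, UTF-8), and your terminal displays the characters that correspond to those bytes.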

>>> print c
“985”工程大学名单（截止到2011年3月31日）
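If the goal is just to see the list contents as readable text, one option is to print the elements (or a joined string) instead of the list object. The following is a minimal sketch, not the asker's actual data: clg_list here is a hard-coded stand-in holding the two values that appear in the outputs above, and the name joined is introduced only for illustration.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

# Stand-in for the scraped list (assumption: values copied from the outputs above).
clg_list = [u'“985”工程大学名单（截止到2011年3月31日）', u'一期（34所）']

# Printing the list object shows the repr of each element
# (in Python 2 this is where the \uXXXX escapes come from).
print(clg_list)

# Joining the elements into a single string and printing that string
# goes through the normal text encoding, so the characters are shown.
joined = u', '.join(clg_list)
print(joined)

# Printing element by element behaves the same way.
for item in clg_list:
    print(item)
```

The list itself is unchanged in every case; only the way it is displayed differs, so there is no encoding to "fix" on the stored data.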

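Note that in Python 3, printing the list displays the Chinese characters you expect, because the repr of a Python 3 str leaves printable non-ASCII characters unescaped.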
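The difference can be reproduced in Python 3 alone. This is a small sketch that assumes a Python 3 interpreter and a UTF-8 capable terminal; the sample string is the one from the question, and the built-in ascii() is used here only to imitate the escaped output that Python 2's repr produces.

```python
# Python 3 only.
s = '一期（34所）'

# repr() keeps printable non-ASCII characters as they are.
print(repr(s))

# ascii() escapes every non-ASCII character, much like Python 2's repr.
print(ascii(s))

# Printing a list uses repr() on each element, so the characters stay readable.
print([s])
```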
