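Question: When I try to run my script, BeautifulSoup(html, ...)
fails with the error message "TypeError: object of type 'Response' has no len()". I tried passing the actual HTML as the argument, but it still doesn't work.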
import requests
from bs4 import BeautifulSoup
url = 'http://vineoftheday.com/?order_by=rating'
response = requests.get(url)
html = response.content
soup = BeautifulSoup(html, "html.parser")
Try passing the HTML text directly:
soup = BeautifulSoup(response.text, "html.parser")
If you fetch the HTML with requests.get('https://example.com'), pass requests.get('https://example.com').text to BeautifulSoup rather than the response object itself.
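The advice above can be sketched as a small check that always hands BeautifulSoup a string or bytes value, never the response object. The names extract_markup, make_soup and FakeResponse are hypothetical, introduced only for illustration; FakeResponse stands in for a real requests response so the example runs without a network call.

```python
from dataclasses import dataclass


@dataclass
class FakeResponse:
    """Hypothetical stand-in for requests.Response (illustration only)."""
    text: str
    status_code: int = 200


def extract_markup(obj):
    """Return something BeautifulSoup can parse: a str or bytes value."""
    if isinstance(obj, (str, bytes)):
        return obj                      # already markup, pass it through
    text = getattr(obj, "text", None)   # a requests response keeps the decoded body here
    if isinstance(text, str):
        return text
    raise TypeError(
        f"expected markup or a response with .text, got {type(obj).__name__}"
    )


def make_soup(obj, parser="html.parser"):
    """Build the soup from the extracted markup (requires beautifulsoup4)."""
    from bs4 import BeautifulSoup  # lazy import so extract_markup has no dependencies
    return BeautifulSoup(extract_markup(obj), parser)


page = FakeResponse(text="<html><body><p>hello</p></body></html>")
markup = extract_markup(page)
print(type(markup).__name__)
```

With a real request, make_soup(requests.get(url)) would then behave like BeautifulSoup(requests.get(url).text, "html.parser").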
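Printing 'response' on its own only shows the status code (for example <Response [200]>), not the page. Also, always send browser-like headers to be safe, otherwise you will run into many problems.
You can find the headers in the Network tab of the browser's developer tools, under the request headers (User-Agent).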
Try:
import requests
from bs4 import BeautifulSoup

url = 'http://www.google.com'
# Browser-like User-Agent copied from the developer tools
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) '
                  'AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/71.0.3578.98 Safari/537.36'
}
# .text is the decoded body as a str, which is what BeautifulSoup expects
response = requests.get(url, headers=headers).text
soup = BeautifulSoup(response, 'html.parser')
print(soup.prettify())
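A slightly more defensive variant of the snippet above might look like the following. This is only a sketch: build_headers and fetch_html are hypothetical helper names, and the 10-second timeout is an arbitrary assumption, not something the answer prescribes.

```python
DEFAULT_USER_AGENT = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/71.0.3578.98 Safari/537.36"
)


def build_headers(user_agent=DEFAULT_USER_AGENT):
    """Return request headers carrying a browser-like User-Agent."""
    return {"User-Agent": user_agent}


def fetch_html(url, timeout=10):
    """Download url and return the decoded body as a str (requires requests)."""
    import requests  # lazy import: build_headers works without it
    response = requests.get(url, headers=build_headers(), timeout=timeout)
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    return response.text


print(build_headers()["User-Agent"])
```

The result of fetch_html(url) is a plain string, so it can be passed straight to BeautifulSoup(..., 'html.parser').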
This worked for me:
soup = BeautifulSoup(requests.get("your_url").text)
Even better, use the code below, which names the parser explicitly (lxml, which has to be installed separately):
import requests
from bs4 import BeautifulSoup
soup = BeautifulSoup(requests.get("your_url").text, 'lxml')
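Since lxml is an optional dependency, one possible way to keep the code working where it is missing is to pick the parser at runtime. The sketch below assumes the parser name matches the importable module name, which holds for lxml; pick_parser is a hypothetical helper, not part of bs4.

```python
import importlib.util


def pick_parser(preferred="lxml", fallback="html.parser"):
    """Return preferred if a module of that name is importable, else fallback."""
    if importlib.util.find_spec(preferred) is not None:
        return preferred
    return fallback


parser = pick_parser()
print(parser)
```

It could then be used as BeautifulSoup(requests.get("your_url").text, pick_parser()), falling back to the built-in html.parser when lxml is not installed.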