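This is Python code that uses the BeautifulSoup library to scrape content from a GitHub repositories page. I'm getting the error:
'NoneType' object has no attribute 'text'
in this simple code. The error occurs on two lines, which are marked with comments in the code.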
import requests
from bs4 import BeautifulSoup
import csv
URL = "https://github.com/DURGESHBARWAL?tab=repositories"
r = requests.get(URL)
soup = BeautifulSoup(r.text, 'html.parser')
repos = []
table = soup.find('ul', attrs = {'data-filterable-for':'your-repos-filter'})
for row in table.find_all('li', attrs = {'itemprop':'owns'}):
    repo = {}
    repo['name'] = row.find('div').find('h3').a.text
    #First Error Position
    repo['desc'] = row.find('div').p.text
    #Second Error Position
    repo['lang'] = row.find('div', attrs = {'class':'f6 text-gray mt-2'}).find('span', attrs = {'class':'mr-3'}).text
    repos.append(repo)
filename = 'extract.csv'
with open(filename, 'w') as f:
    w = csv.DictWriter(f,['name','desc','lang'])
    w.writeheader()
    for repo in repos:
        w.writerow(repo)
OUTPUT
Traceback (most recent call last):
  File "webscrapping.py", line 16, in <module>
    repo['desc'] = row.find('div').p.text
AttributeError: 'NoneType' object has no attribute 'text'
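This happens because looking up an element with BeautifulSoup works like a dict.get() call. When you call find, it gets the element from the element tree; if it can't find one, it returns None instead of raising an exception. None doesn't have the attributes an element (a Tag) has, such as .text or .attrs. So whenever you read .text without a try/except or without checking the type first, you are betting that the element will always be there.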
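A minimal illustration of this behaviour, assuming bs4 is installed; the HTML snippet is made up and only mimics a repository row whose <div> has no <p> description.

```python
from bs4 import BeautifulSoup

# Made-up markup: the <div> has a title but no <p> description
html = "<li><div><h3><a>python_notes</a></h3></div></li>"
soup = BeautifulSoup(html, "html.parser")
div = soup.find("div")

name = div.find("h3").a.text   # element exists, so .text works
missing = div.find("p")        # no <p>: find() returns None, no exception
shortcut = div.p               # tag.p is shorthand for tag.find("p"), also None

print(name)                    # python_notes
print(missing, shortcut)       # None None

try:
    div.p.text                 # same pattern as the failing line 16
except AttributeError as exc:
    print(exc)                 # 'NoneType' object has no attribute 'text'
```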
I would first store the element that causes the problem in a temporary variable, so you can check it before using it. Either do that or use try/except:
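for row in table.find_all('li', attrs = {'itemprop':'owns'}):
    repo = {}
    repo['name'] = row.find('div').find('h3').a.text
    # Keep the <p> in a temporary variable and check it before reading .text
    p = row.find('div').p
    if p is not None:
        repo['desc'] = p.text
    else:
        repo['desc'] = None
    # Same idea for the language <span>; the outer <div> may be missing too
    lang_div = row.find('div', attrs = {'class':'f6 text-gray mt-2'})
    lang = lang_div.find('span', attrs = {'class':'mr-3'}) if lang_div is not None else None
    if lang is not None:
        # The span exists, so reading .text is safe
        repo['lang'] = lang.text
    else:
        repo['lang'] = None
    repos.append(repo)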
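If the same check repeats for several fields, it can live in one small function. This is a sketch with a hypothetical helper name, text_or_none; it only tests for None before reading .text, so it accepts whatever find() or the tag.p shorthand returns.

```python
from typing import Optional


def text_or_none(tag) -> Optional[str]:
    """Return the stripped text of a find() result, or None if nothing was found.

    `tag` is what BeautifulSoup's find() (or the tag.p-style shorthand) returned:
    either an element with a .text attribute, or None.
    """
    if tag is None:
        return None  # element missing: propagate None instead of crashing
    return tag.text.strip()
```

For example, repo['desc'] = text_or_none(row.find('div').p). Note that it only guards the last step of a chain: if row.find('div') itself returns None, the .p lookup still fails before the helper is called.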
The try/except version:
for row in table.find_all('li', attrs = {'itemprop':'owns'}):
    repo = {}
    repo['name'] = row.find('div').find('h3').a.text
    # First error position
    try:
        repo['desc'] = row.find('div').p.text
    except AttributeError:  # .text on None raises AttributeError, not TypeError
        repo['desc'] = None
    # Second error position
    try:
        repo['lang'] = row.find('div', attrs = {'class':'f6 text-gray mt-2'}).find('span', attrs = {'class':'mr-3'}).text
    except AttributeError:
        repo['lang'] = None
    repos.append(repo)
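Personally I lean toward try/except, because it is more concise, and catching the specific exception is good practice that makes the program more robust.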
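One detail matters with try/except: reading .text on None raises AttributeError, not TypeError, so an except TypeError clause lets the error escape. A stdlib-only check of that, using a hypothetical function name:

```python
def read_text(tag, catch):
    """Try tag.text, swallowing only the exception type passed in `catch`."""
    try:
        return tag.text
    except catch:
        return None


print(read_text(None, AttributeError))  # None: the error was caught

try:
    read_text(None, TypeError)          # wrong exception type for this failure
except AttributeError as exc:
    print("escaped:", exc)              # escaped: 'NoneType' object has no attribute 'text'
```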
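Your find calls are imprecise and chained, so when the <div> you land on has no <p> child, you get None, but you then go on to access the .text attribute on that None, which crashes your program with an AttributeError.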
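Try the following set of .find calls, which target the itemprop attributes (as you already do with itemprop='owns') and use a try/except block to null-coalesce any missing fields: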
import requests
from bs4 import BeautifulSoup
import csv
URL = "https://github.com/DURGESHBARWAL?tab=repositories"
r = requests.get(URL)
soup = BeautifulSoup(r.text, 'html.parser')
repos = []
table = soup.find('ul', attrs = {'data-filterable-for': 'your-repos-filter'})
for row in table.find_all('li', {'itemprop': 'owns'}):
    # Look each field up by its itemprop; find() returns None if it is missing
    repo = {
        'name': row.find('a', {'itemprop' : 'name codeRepository'}),
        'desc': row.find('p', {'itemprop' : 'description'}),
        'lang': row.find('span', {'itemprop' : 'programmingLanguage'})
    }
    # Replace each found element with its text; missing ones stay None
    for k, v in repo.items():
        try:
            repo[k] = v.text.strip()
        except AttributeError:
            pass
    repos.append(repo)
filename = 'extract.csv'
# newline='' is what the csv module expects; utf-8 keeps non-ASCII text intact
with open(filename, 'w', newline='', encoding='utf-8') as f:
    w = csv.DictWriter(f, ['name', 'desc', 'lang'])
    w.writeheader()
    for repo in repos:
        w.writerow(repo)
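The null-coalescing loop can also be written as one dict comprehension that checks for None explicitly instead of catching AttributeError, so an unrelated attribute error would still surface. A sketch; coalesce is a hypothetical name.

```python
def coalesce(found):
    """Turn a dict of find() results into a dict of stripped strings.

    Keys whose element was not found (None) keep the value None.
    """
    return {key: (tag.text.strip() if tag is not None else None)
            for key, tag in found.items()}
```

Inside the loop, the three row.find(...) lookups would be collected into a dict exactly as in the code above and passed through repos.append(coalesce(repo)).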
Debug output (in addition to the written CSV):
[ { 'desc': 'This a Django-Python Powered a simple functionality based '
'Bot application',
'lang': 'Python',
'name': 'Sandesh'},
{'desc': None, 'lang': 'Jupyter Notebook', 'name': 'python_notes'},
{ 'desc': 'Installing DSpace using docker',
'lang': 'Java',
'name': 'DSpace-Docker-Installation-1'},
{ 'desc': 'This Repo Contains the DSpace Installation Steps',
'lang': None,
'name': 'DSpace-Installation'},
{ 'desc': '(Official) The DSpace digital asset management system that '
'powers your Institutional Repository',
'lang': 'Java',
'name': 'DSpace'},
{ 'desc': 'This Repo contain the DSpace installation steps with '
'docker.',
'lang': None,
'name': 'DSpace-Docker-Installation'},
{ 'desc': 'This Repository contain the Intermediate system for the '
'Collaboration and DSpace System',
'lang': 'Python',
'name': 'Community-OER-Repository'},
{ 'desc': 'A class website to share the knowledge and expanding the '
'productivity through digital communication.',
'lang': 'PHP',
'name': 'class-website'},
{ 'desc': 'This is a POC for the Voting System. It is a precise '
'design and implementation of Voting System based on the '
'features of Blockchain which has the potential to '
'substitute the traditional e-ballet/EVM system for voting '
'purpose.',
'lang': 'Python',
'name': 'Blockchain-Based-Ballot-System'},
{ 'desc': 'It is a short describtion of Modern Django',
'lang': 'Python',
'name': 'modern-django'},
{ 'desc': 'It is just for the sample work.',
'lang': 'HTML',
'name': 'Task'},
{ 'desc': 'This Repo contain the sorting algorithms in C,predefiend '
'function of C, C++ and Java',
'lang': 'C',
'name': 'Sorting_Algos_Predefined_functions'},
{ 'desc': 'It is a arduino program, for monitor the temperature and '
'humidity from sensor DHT11.',
'lang': 'C++',
'name': 'DHT_11_Arduino'},
{ 'desc': "This is a registration from,which collect data from user's "
'desktop and put into database after validation.',
'lang': 'PHP',
'name': 'Registration_Form'},
{ 'desc': 'It is a dynamic multi-part data driven search engine in '
'PHP & MySQL from absolutely scratch for the website.',
'lang': 'PHP',
'name': 'search_engine'},
{ 'desc': 'It is just for learning github.',
'lang': None,
'name': 'Hello_world'}]
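Fields left as None in these dicts do not end up as the text "None" in the CSV: csv.DictWriter writes None as an empty cell. A small stdlib check, writing to an in-memory buffer instead of extract.csv, with two rows copied from the output above:

```python
import csv
import io

rows = [
    {'desc': None, 'lang': 'Jupyter Notebook', 'name': 'python_notes'},
    {'desc': 'It is just for learning github.', 'lang': None, 'name': 'Hello_world'},
]

buf = io.StringIO()  # stands in for the real file
writer = csv.DictWriter(buf, ['name', 'desc', 'lang'])
writer.writeheader()
writer.writerows(rows)  # None values become empty fields

for line in buf.getvalue().splitlines():
    print(line)
# name,desc,lang
# python_notes,,Jupyter Notebook
# Hello_world,It is just for learning github.,
```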