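I am new to Python programming and am trying web scraping with BeautifulSoup. I iterate over the results with a for loop, but it seems to run only once; on the next iteration it raises an error. I have tried a lot but could not solve it.
Below is my code -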
from bs4 import BeautifulSoup
from urllib.request import urlopen
url = 'https://www.packtpub.com/all'
page = urlopen(url)
soup_packtpage = BeautifulSoup(page,'lxml')
page.close()
all_book = soup_packtpage.find_all("div",class_='book-block-outer')
for book_title in all_book:
    title = book_title.div['data-product-title']
    price = book_title.div['data-product-price']
    category = book_title.div['data-product-category']
    print(title)
    print("Rs:-"+ price)
    print(category)
Below is the output -
Learn Algorithms and Data Structures for Everyday Applications with Java [Video]
Rs:-199.44
Application Development
Traceback (most recent call last):
  File "/home/bhagwatanimesh/PycharmProjects/packet_pub/packet_pub", line 17, in <module>
    title = book_title.div['data-product-title']
  File "/home/bhagwatanimesh/.local/lib/python3.5/site-packages/bs4/element.py", line 1011, in __getitem__
    return self.attrs[key]
KeyError: 'data-product-title'
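It looks like you are trying to access a key that does not exist in the tag's attribute dictionary: the first block printed fine, but a later block's first div has no data-product-title attribute, so the lookup raises KeyError. To skip such blocks and keep going, you can use the following code.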
for book_title in all_book:
    try:
        title = book_title.div['data-product-title']
        price = book_title.div['data-product-price']
        category = book_title.div['data-product-category']
        print(title)
        print("Rs:-"+ price)
        print(category)
    # KeyError: the div lacks the attribute; TypeError: the block has no div at all
    except (KeyError, TypeError):
        continue
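
If you would rather not rely on exceptions, a tag's get() method returns None (or a default you pass) when an attribute is missing. Below is a minimal sketch of that approach. It assumes the page has the same structure as in the question; SAMPLE_HTML and extract_books are made-up names for illustration, and the sample markup is invented so the example runs without fetching the real page. It uses the built-in 'html.parser' instead of 'lxml' only to avoid the extra dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page (an assumption):
# only the first block carries the data-product-* attributes.
SAMPLE_HTML = """
<div class="book-block-outer">
  <div data-product-title="Book A" data-product-price="199.44"
       data-product-category="Application Development"></div>
</div>
<div class="book-block-outer">
  <div class="promo"></div>
</div>
<div class="book-block-outer"></div>
"""


def extract_books(html):
    """Return a list of dicts for blocks that actually describe a product."""
    soup = BeautifulSoup(html, "html.parser")
    books = []
    for block in soup.find_all("div", class_="book-block-outer"):
        inner = block.div  # None when the block has no nested <div>
        if inner is None:
            continue
        title = inner.get("data-product-title")  # None if attribute is absent
        if title is None:
            continue
        books.append({
            "title": title,
            "price": inner.get("data-product-price", "n/a"),
            "category": inner.get("data-product-category", "n/a"),
        })
    return books


books = extract_books(SAMPLE_HTML)
for book in books:
    print(book["title"])
    print("Rs:-" + book["price"])
    print(book["category"])
```

Compared with try/except, this makes it explicit which blocks are skipped and why, and other unexpected errors are not silently swallowed.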