Web scraper crashes after collecting 2 pages of data

Question

I am scraping a website that sells iPhone cases.

The scraper should collect the name and price of each product. When I run the program, it crashes with this error:

Traceback (most recent call last):
  File "phonecases.py", line 12, in <module>
    price = content.find(class_="products-grid-price").get_text().replace('\n','')
AttributeError: 'NoneType' object has no attribute 'get_text'

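This happens because some items are on sale. When an item is not on sale, the class of its price element is products-grid-price; when an item is on sale, the class is products-grid-price-sale. So the program collects the data I want until it reaches an item that is on sale, and then it crashes.

How can I fix my program so that it either skips the items on sale or collects them as separate data points?

Here is my code: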

import requests
from bs4 import BeautifulSoup

url = 'https://www.cellphonecases.com/Apple-Iphone-11-C2429.html?page='
for page in range(1, 5):
    response = requests.get(url + str(page))
    soup = BeautifulSoup(response.text, 'html.parser')
    contents = soup.find_all(class_="products-grid-container-out")

    for content in contents:
        title = content.find(class_="products-gridname").get_text().replace('\n','')
        price = content.find(class_="products-grid-price").get_text().replace('\n','')
        print(title, price)

Answer 1

Use a try/except block, for example:

price = None
try:
    price = content.find(class_="products-grid-price").get_text().replace('\n','')
except AttributeError:
    # find() returned None, so fall back to the sale-price class
    price = content.find(class_="products-grid-price-sale").get_text().replace('\n','')

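Python first runs the code in the try block; if that raises an exception (an error), the exception is caught and the code in the except block runs instead.

Or something like this: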

price = None
price_field = content.find(class_="products-grid-price")
if price_field:
    price = price_field.get_text()
else:
    price = content.find(class_="products-grid-price-sale").get_text()

# clean price
price = price.replace('\n','')
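
The question also asks how to collect sale items as separate data points. The following is a minimal sketch of one way to do that, not a tested solution for the actual site: the class names come from the question, while SAMPLE_HTML, extract_price, and scrape are hypothetical names introduced here for illustration. It records an on_sale flag next to each price and skips entries that have no title or no price, so a missing element no longer raises AttributeError.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that imitates the structure described in the question.
SAMPLE_HTML = """
<div class="products-grid-container-out">
  <div class="products-gridname">Clear Case</div>
  <div class="products-grid-price">$9.99</div>
</div>
<div class="products-grid-container-out">
  <div class="products-gridname">Leather Case</div>
  <div class="products-grid-price-sale">$14.99</div>
</div>
<div class="products-grid-container-out">
  <div class="products-gridname">Mystery Case</div>
</div>
"""


def extract_price(content):
    """Return (price, on_sale); price is None when no price element exists."""
    # Try the regular price class first, then the sale price class.
    candidates = (("products-grid-price", False),
                  ("products-grid-price-sale", True))
    for class_name, on_sale in candidates:
        field = content.find(class_=class_name)
        if field is not None:
            return field.get_text(strip=True), on_sale
    return None, False


def scrape(html):
    """Collect (title, price, on_sale) for each product with a title and a price."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for content in soup.find_all(class_="products-grid-container-out"):
        title_field = content.find(class_="products-gridname")
        price, on_sale = extract_price(content)
        if title_field is None or price is None:
            continue  # skip incomplete entries instead of crashing
        rows.append((title_field.get_text(strip=True), price, on_sale))
    return rows


for row in scrape(SAMPLE_HTML):
    print(row)
```

In the original loop, the same idea would mean replacing the two find(...).get_text() lines with a call to a helper like extract_price(content) and printing the flag together with the title and price.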
