I'm trying to scrape data from an Amazon page, and I get an error when I try to parse the seller's location. Here is my code:
import urllib2
from bs4 import BeautifulSoup

#getting the html
request = urllib2.Request('http://www.amazon.com/gp/offer-listing/0393934241/')
opener = urllib2.build_opener()
#hiding that I'm a webscraper
request.add_header('User-Agent', 'Mozilla/5 (Solaris 10) Gecko')
#opening it up, putting into soup form
html = opener.open(request).read()
soup = BeautifulSoup(html, "html5lib")
#parsing for the seller info
sellers = soup.findAll('div', {'class' : 'a-row a-spacing-medium olpOffer'})
for eachseller in sellers:
    #parsing for price
    price = eachseller.find('span', {'class' : 'a-size-large a-color-price olpOfferPrice a-text-bold'})
    #parsing for shipping costs
    shippingprice = eachseller.find('span', {'class' : 'olpShippingPrice'})
    #parsing for condition
    condition = eachseller.find('span', {'class' : 'a-size-medium'})
    #parsing for seller name
    sellername = eachseller.find('b')
    #parsing for seller location
    location = eachseller.find('div', {'class' : 'olpAvailability'})
    #printing it all out
    print "price, " + price.string + ", shipping price, " + shippingprice.string + ", condition," + condition.string + ", seller name, " + sellername.string + ", location, " + location.string
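The error I get comes from the final 'print' statement: TypeError: coercing to Unicode: need string or buffer, NoneType found
I know it comes from this line - location = eachseller.find('div', {'class' : 'olpAvailability'}) - because the code works fine without it, and I know I'm getting NoneType because that line isn't finding anything. Here is the HTML for the part I'm trying to parse: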
<div class="olpAvailability">
In Stock.
Ships from WI, United States.
<br/><a href="/gp/aag/details/ref=olp_merch_ship_9/175-0430757-3801038?ie=UTF8&asin=0393934241&seller=A1W2IX7T37FAMZ&sshmPath=shipping-rates#aag_shipping">Domestic shipping rates</a>
and <a href="/gp/aag/details/ref=olp_merch_return_9/175-0430757-3801038?ie=UTF8&asin=0393934241&seller=A1W2IX7T37FAMZ&sshmPath=returns#aag_returns">return policy</a>.
</div>
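I don't see anything wrong with the 'location' line, or why it isn't pulling the data I want.
Edit: I figured it out, but I don't know why. If I change the print statement to print location.find(text = True), it outputs the location I want. Hopefully this helps someone someday.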
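For anyone wondering why that works, here is a minimal sketch of what is probably happening. It assumes the bs4 package, swaps in the built-in "html.parser" for html5lib, and shortens the link targets from the markup above, so treat it as an illustration rather than a copy of the real page. A tag's .string is None whenever the tag has more than one child, which would explain the TypeError even though find() did return the div.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question (href values shortened
# for this sketch; they are not the real link targets).
html = """
<div class="olpAvailability">
In Stock.
Ships from WI, United States.
<br/><a href="#aag_shipping">Domestic shipping rates</a>
and <a href="#aag_returns">return policy</a>.
</div>
"""

soup = BeautifulSoup(html, "html.parser")
location = soup.find("div", {"class": "olpAvailability"})

# The tag is found, but it has several children (text, <br/>, <a>, ...),
# so .string has no single string to return and gives None.
found = location is not None
string_value = location.string

# find(text=True) returns only the first text node inside the tag.
first_text = location.find(text=True).strip()

# get_text() joins every text node, including the link labels.
all_text = location.get_text(" ", strip=True)

print("found: %s, .string: %r" % (found, string_value))
print(first_text)
print(all_text)
```

If only the location is wanted, find(text=True) is enough here because the location sits in the first text node. get_text() is the safer choice when the layout of the children may vary.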
It looks like you are searching for the wrong class name:
<div class="a-column a-span3 olpDeliveryColumn" role="gridcell">
<p class="a-spacing-mini olpAvailability">
<ul class="a-unordered-list a-vertical olpFastTrack">
<li><span class="a-list-item">
Ships from WI, United States.
</span></li>
<li><span class="a-list-item">
<a href="/gp/aag/details?ie=UTF8&asin=0393934241&seller=A263RIO308P3G8&sshmPath=shipping-rates#aag_shipping">Shipping rates</a>
and <a href="/gp/aag/details?ie=UTF8&asin=0393934241&seller=A263RIO308P3G8&sshmPath=returns#aag_returns">return policy</a>.
</span></li>
</ul>
</p>
</div>
Change this line in your code:
location = eachseller.find('div', {'class' : 'olpDeliveryColumn'})
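Since the delivery column holds several nested tags, location.string would still be None there, so the text has to be pulled out some other way. The sketch below is one possible approach, not a tested scraper: ships_from is a hypothetical helper name, the parser is the built-in "html.parser" rather than html5lib, and the link targets are shortened from the markup above.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from this answer (href values shortened
# for this sketch).
html = """
<div class="a-column a-span3 olpDeliveryColumn" role="gridcell">
<p class="a-spacing-mini olpAvailability">
<ul class="a-unordered-list a-vertical olpFastTrack">
<li><span class="a-list-item">
Ships from WI, United States.
</span></li>
<li><span class="a-list-item">
<a href="#aag_shipping">Shipping rates</a>
and <a href="#aag_returns">return policy</a>.
</span></li>
</ul>
</p>
</div>
"""


def ships_from(offer):
    """Return the 'Ships from ...' text of an offer, or None if absent.

    Hypothetical helper for illustration only.
    """
    column = offer.find("div", {"class": "olpDeliveryColumn"})
    if column is None:
        return None
    # stripped_strings yields every text node with whitespace trimmed.
    for text in column.stripped_strings:
        if text.startswith("Ships from"):
            return text
    return None


soup = BeautifulSoup(html, "html.parser")
location = ships_from(soup)
print(location)
```

Returning None for a missing column lets the caller decide what to print, instead of failing on string concatenation as in the question.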