I'm trying to scrape data from a website, using the following code:
containers = page_soup.findAll("div", {"class": "item-info"})
container = containers[0]
container
Output:
<div class="item-info">
<!--brand info-->
<div class="item-branding">
<a class="item-brand" href="https://www.newegg.com/Ugg-Australia/BrandStore/ID-59551">
<img alt="Ugg Australia" src="//c1.neweggimages.com/Brandimage_70x28//Brand59551.gif" title="Ugg Australia"/>
</a>
<!--rating info-->
</div>
<!--description info-->
<a class="item-title" href="https://www.newegg.com/ugg-australia-black-boots/p/0F5-003V-00P74" title="View Details">Ugg Australia Bailey Button II Women US 5 Black Winter Boot</a>
<!--promption info-->
<p class="item-promo"></p>
<!--feature-->
<ul class="item-features">
<li><strong>Brand:</strong> Ugg Australia</li><li><strong>Type:</strong> Boots</li><li><strong>Color:</strong> Black</li><li><strong>Occasion:</strong> Specialty</li>
<li><strong>Model #: </strong>1016422/BLK</li>
<li><strong>Return Policy: </strong><a href="https://www.newegg.com/AreaTrend/about" target="_blank" title="View Return Policy(new window)">View Return Policy</a></li>
</ul>
<div class="item-action">
<!--price-->
<ul class="price ">
<li class="price-was">
$140.00
<span class="price-was-data" style="display: none">140.00</span>
......
Next, when I try to extract the title using this code:
for container in containers:
    title_container = container.findAll("a", {"class" : "item-title"})
title_container[0].text
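I get the title of a seemingly random product on the page instead of the first product's name, which ideally should be: Ugg Australia Bailey Button II Women US 5 Black Winter Boot
What am I doing wrong?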
.findAll will give you all the products in the HTML code. You can iterate over each individual item as follows:
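A minimal sketch of that iteration, assuming the last line of the question's snippet runs after the loop has finished. In that case title_container is overwritten on every pass and ends up holding the last product on the page, which can look random. The html string below is a made-up, trimmed-down stand-in for the real page, and "Some Other Product" is a placeholder, not real data.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page: two products only.
html = """
<div class="item-info">
  <a class="item-title" href="#">Ugg Australia Bailey Button II Women US 5 Black Winter Boot</a>
</div>
<div class="item-info">
  <a class="item-title" href="#">Some Other Product</a>
</div>
"""

page_soup = BeautifulSoup(html, "html.parser")
containers = page_soup.findAll("div", {"class": "item-info"})

titles = []
for container in containers:
    # find() returns the first matching tag, or None when nothing matches.
    title_tag = container.find("a", {"class": "item-title"})
    if title_tag is not None:
        # Store each title inside the loop so earlier ones are not overwritten.
        titles.append(title_tag.text.strip())

# The first product's title is the first element of the list.
print(titles[0])
```

If only the first product is needed, the loop can be skipped and the lookup done directly on containers[0].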