Scraping from a div tag returns the title of a random product when it should return the first title

Problem description
I'm trying to scrape data from a website, using the following code:
containers = page_soup.findAll("div", {"class": "item-info"})
container = containers[0]
container

Output:

<div class="item-info">
<!--brand info-->
<div class="item-branding">
<a class="item-brand" href="https://www.newegg.com/Ugg-Australia/BrandStore/ID-59551">
<img alt="Ugg Australia" src="//c1.neweggimages.com/Brandimage_70x28//Brand59551.gif" title="Ugg Australia"/>
</a>
<!--rating info-->
</div>
<!--description info-->
<a class="item-title" href="https://www.newegg.com/ugg-australia-black-boots/p/0F5-003V-00P74" title="View Details">Ugg Australia Bailey Button II Women US 5 Black Winter Boot</a>
<!--promption info-->
<p class="item-promo"></p>
<!--feature-->
<ul class="item-features">
<li><strong>Brand:</strong> Ugg Australia</li><li><strong>Type:</strong> Boots</li><li><strong>Color:</strong> Black</li><li><strong>Occasion:</strong> Specialty</li>
<li><strong>Model #: </strong>1016422/BLK</li>
<li><strong>Return Policy: </strong><a href="https://www.newegg.com/AreaTrend/about" target="_blank" title="View Return Policy(new window)">View Return Policy</a></li>
</ul>
<div class="item-action">
<!--price-->
<ul class="price ">
<li class="price-was">
       $140.00
            <span class="price-was-data" style="display: none">140.00</span>

......

Next, when I try to scrape the title name using this code:

for container in containers:
    title_container = container.findAll("a", {"class" : "item-title"})
    title_container[0].text 

I get the title of a random product on the page instead of the first product's name, which should be: Ugg Australia Bailey Button II Women US 5 Black Winter Boot

What am I doing wrong?

python html web-scraping beautifulsoup tags
1 Answer

.findAll returns every matching product in the HTML, not just the first one. You can iterate over each individual item as follows:
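The answer's original code is not preserved here, so what follows is only a minimal sketch of the iteration it describes. The `html` string is a hypothetical, trimmed-down stand-in for the downloaded page, and the second and third product names are invented for illustration. One plausible cause of the "random" title, assuming the code was run in a notebook, is that a bare expression inside a `for` loop is not displayed, and after the loop `container` still refers to the last item. Collecting the titles in a list and indexing it avoids that.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page; only the first title comes from the question.
html = """
<div class="item-info">
  <a class="item-title" href="#">Ugg Australia Bailey Button II Women US 5 Black Winter Boot</a>
</div>
<div class="item-info">
  <a class="item-title" href="#">Second Example Product</a>
</div>
<div class="item-info">
  <a class="item-title" href="#">Third Example Product</a>
</div>
"""

page_soup = BeautifulSoup(html, "html.parser")

# find_all is the current name for findAll; both return every match in document order.
containers = page_soup.find_all("div", {"class": "item-info"})

titles = []
for container in containers:
    # find returns the first matching tag inside this container, or None.
    title_link = container.find("a", {"class": "item-title"})
    if title_link is not None:
        titles.append(title_link.get_text(strip=True))

# Print every title explicitly instead of relying on notebook display.
for title in titles:
    print(title)

# The first product on the page is the first element of the list.
first_title = titles[0]
```

If only the first product is needed, the loop can be skipped and `containers[0].find("a", {"class": "item-title"})` used directly.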
