Unable to get span text using Python

Question. Votes: -1, Answers: 2

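I have a supplier whose website requires a login, and I am trying to get price and availability from it. The selector works in VBA, but in Python I get None.

This is the part of the HTML that contains the price: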

<div class="product-info-price">
  <div class="price-box price-final_price" data-role="priceBox" data-product-id="32686" data-price-box="product-id-32686">
    <span class="special-price">
      <span class="price-container price-final_price tax weee"  itemprop="offers" itemscope itemtype="http://schema.org/Offer">
        <span class="price-label">Ειδική Τιμή</span>
        <span  id="product-price-32686"  data-price-amount="7.9" data-price-type="finalPrice" class="price-wrapper " >
          <span class="price">7,90 €</span>
        </span>
        <meta itemprop="price" content="7.9" />
        <meta itemprop="priceCurrency" content="EUR" />
      </span>
    </span>
  </div>
</div>

In VBA I use the following selector:

.price-box .price-final_price .price

In Python I use:

price = soup.find('span', attrs={'class':'price'})

if price is not None:
  price_text = price.text.strip()
  print(price_text)
else:
  price_text = "0,00"
  print(price_text)

I always get 0,00 as the price.

What should I change in soup.find?

python python-3.x web-scraping
2 Answers
3 votes

CSS selectors are usually faster than XPath. You can use the following:

from bs4 import BeautifulSoup as bs

html = '''
<div class="product-info-price">
  <div class="price-box price-final_price" data-role="priceBox" data-product-id="32686" data-price-box="product-id-32686">
    <span class="special-price">
      <span class="price-container price-final_price tax weee"  itemprop="offers" itemscope itemtype="http://schema.org/Offer">
        <span class="price-label">Ειδική Τιμή</span>
        <span  id="product-price-32686"  data-price-amount="7.9" data-price-type="finalPrice" class="price-wrapper " >
          <span class="price">7,90 €</span>
        </span>
        <meta itemprop="price" content="7.9" />
        <meta itemprop="priceCurrency" content="EUR" />
      </span>
    </span>
  </div>
</div>
'''

soup = bs(html, 'lxml')
prices = [price.text for price in soup.select('.price')]
print(prices)

Or:

altPrices = [price['content'] for price in soup.select("[itemprop=price]")]
print(altPrices)
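
For a single product page, the selector the asker uses in VBA can be reused nearly unchanged with `select_one`. The following is a minimal sketch, assuming a trimmed copy of the HTML from the question and Python's built-in `html.parser`; the helper name `extract_price` is hypothetical, and the "0,00" fallback is taken from the question's code. On the snippet as posted, both this selector and the original `soup.find` call locate the span, so a `None` result on the live site may mean that the HTML Python received is not the logged-in product page.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the HTML from the question.
SAMPLE_HTML = """
<div class="product-info-price">
  <div class="price-box price-final_price" data-role="priceBox">
    <span class="special-price">
      <span class="price-container price-final_price tax weee">
        <span class="price-label">Ειδική Τιμή</span>
        <span id="product-price-32686" data-price-amount="7.9" class="price-wrapper ">
          <span class="price">7,90 €</span>
        </span>
        <meta itemprop="price" content="7.9" />
      </span>
    </span>
  </div>
</div>
"""


def extract_price(html_text, default="0,00"):
    """Return the displayed price text, or `default` if the span is missing."""
    soup = BeautifulSoup(html_text, "html.parser")
    # Same descendant selector the asker uses in VBA.
    node = soup.select_one(".price-box .price-final_price .price")
    return node.get_text(strip=True) if node is not None else default


print(extract_price(SAMPLE_HTML))
# A page without the price markup (for example a login form) falls back to the default.
print(extract_price("<html><body><form>Login</form></body></html>"))
```

If the second call's behaviour is what you see on the real site, printing the fetched HTML and checking whether it contains `product-info-price` at all is a quick way to tell a selector problem from a login or session problem.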

0 votes

I prefer lxml, and I find XPath clearer to use than CSS selectors:

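from lxml import html

# the_entire_html is the full page source as a string
all_html = html.fromstring(the_entire_html)
# xpath() returns a list of matches; this reads the machine-readable price from the <meta> tag
price = all_html.xpath('//meta[@itemprop="price"]/@content')
# or read the displayed price text from the span
price = all_html.xpath('//div[@class="product-info-price"]//span[@class="price"]/text()')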
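
Because `xpath()` always returns a list, it helps to check for an empty result before indexing. The sketch below is self-contained and assumes a trimmed copy of the HTML from the question; converting the `content` value with `Decimal` is an illustrative choice, not part of the original answer.

```python
from decimal import Decimal

from lxml import html

# Trimmed copy of the HTML from the question.
SAMPLE_HTML = """
<div class="product-info-price">
  <div class="price-box price-final_price" data-role="priceBox">
    <span class="special-price">
      <span class="price-container price-final_price tax weee">
        <span id="product-price-32686" data-price-amount="7.9" class="price-wrapper ">
          <span class="price">7,90 €</span>
        </span>
        <meta itemprop="price" content="7.9" />
        <meta itemprop="priceCurrency" content="EUR" />
      </span>
    </span>
  </div>
</div>
"""

tree = html.fromstring(SAMPLE_HTML)

# Machine-readable price: uses a dot as the decimal separator, so Decimal can parse it.
raw = tree.xpath('//meta[@itemprop="price"]/@content')
price = Decimal(raw[0]) if raw else None

# Displayed price: the text exactly as shown on the page.
shown = tree.xpath('//div[@class="product-info-price"]//span[@class="price"]/text()')
shown_text = shown[0].strip() if shown else None

print(price, shown_text)
```

Note that `@class="price"` is an exact string comparison, so it matches this span only because `price` is its sole class.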