Python (BeautifulSoup) returns only 1 result

Question  Votes: 0  Answers: 3

I know that similar questions have already been answered. I tried applying those answers, but they did not solve my problem.

My problem is that this page, http://books.toscrape.com/catalogue/page-1.html, has 20 prices, but when I try to scrape them I only get the first price and none of the other 19.

Here is the code:

from bs4 import BeautifulSoup
import requests
URL = 'http://books.toscrape.com/catalogue/page-1.html'
page = requests.get(URL)
soup = BeautifulSoup(page.content, "html.parser")
results = soup.find_all("div", class_ = "col-sm-8 col-md-9")

for i in results :
    prices = i.find("p", class_ = "price_color")
    print(prices.text.strip())
    print()
python beautifulsoup
3 Answers
1
Votes

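You are searching for the items in the wrong way.

There is only one div with the class col-sm-8 col-md-9, and it contains many prices. Your code expects many divs with a single price in each, and that is what causes the problem.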
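
To see why only one price is printed, here is a minimal sketch that runs against a small inline HTML string instead of the live site. The markup is a simplified, hypothetical stand-in for the real page (one outer div wrapping several prices); only the class names are taken from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the page: ONE outer div, MANY prices.
html = """
<div class="col-sm-8 col-md-9">
  <article class="product_pod"><p class="price_color">£51.77</p></article>
  <article class="product_pod"><p class="price_color">£53.74</p></article>
  <article class="product_pod"><p class="price_color">£50.10</p></article>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all() returns a list with a single element, because only one div matches.
containers = soup.find_all("div", class_="col-sm-8 col-md-9")
print(len(containers))

# The loop therefore runs once, and find() returns only the first matching <p>.
first_prices = [c.find("p", class_="price_color").text.strip() for c in containers]
print(first_prices)
```

The loop body runs once per matching div, so with a single div you get a single price, however many prices that div contains.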

With find() you can search this div for a single price, but you should use find_all() to get all the prices in this div.

div = soup.find("div", class_="col-sm-8 col-md-9")

prices = div.find_all("p", class_="price_color")

for i in prices:
    print(i.text.strip())

You can even search for the prices directly:

prices = soup.find_all("p", class_="price_color")

for i in prices:
    print(i.text.strip())

Minimal working example:

from bs4 import BeautifulSoup
import requests

url = 'http://books.toscrape.com/catalogue/page-1.html'
response = requests.get(url)
soup = BeautifulSoup(response.content, "html.parser")

# search the whole document for the prices directly
prices = soup.find_all("p", class_="price_color")

for i in prices:
    print(i.text.strip())

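You can use find() to search for a price only if you first find all the regions that each contain a single price, for example article.

Each book is in its own article, so there are many articles, and each book's article has a single price (as well as a single title, a single image, and so on).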

from bs4 import BeautifulSoup
import requests

url = 'http://books.toscrape.com/catalogue/page-1.html'
response = requests.get(url)
soup = BeautifulSoup(response.content, "html.parser")

results = soup.find_all("article")

for i in results:
    title = i.find("h3")
    print('title:', title.text.strip())

    price = i.find("p", class_="price_color")
    print('price:', price.text.strip())

    print('---')

Result:

title: A Light in the ...
price: £51.77
---
title: Tipping the Velvet
price: £53.74
---
title: Soumission
price: £50.10
---
title: Sharp Objects
price: £47.82
---
title: Sapiens: A Brief History ...
price: £54.23
---
title: The Requiem Red
price: £22.65
---
title: The Dirty Little Secrets ...
price: £33.34
---
title: The Coming Woman: A ...
price: £17.93
---
title: The Boys in the ...
price: £22.60
---
title: The Black Maria
price: £52.15
---
title: Starving Hearts (Triangular Trade ...
price: £13.99
---
title: Shakespeare's Sonnets
price: £20.66
---
title: Set Me Free
price: £17.46
---
title: Scott Pilgrim's Precious Little ...
price: £52.29
---
title: Rip it Up and ...
price: £35.02
---
title: Our Band Could Be ...
price: £57.25
---
title: Olio
price: £23.88
---
title: Mesaerion: The Best Science ...
price: £37.59
---
title: Libertarianism for Beginners
price: £51.33
---
title: It's Only the Himalayas
price: £45.17
---
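
Some titles in the result above are cut off with "..." because that is the visible link text. On this site the a tag inside h3 also carries a title attribute, so a possible variation is to read that attribute and fall back to the link text when it is missing. This is only a sketch: the inline markup, the book name, and the price below are hypothetical and merely follow the shape of the page's article elements.

```python
from bs4 import BeautifulSoup

# Hypothetical article: the visible link text is shortened,
# while the title attribute holds the full name.
html = """
<article class="product_pod">
<h3><a href="example_1/index.html" title="An Example Book With a Long Title">An Example Book With ...</a></h3>
<div class="product_price">
<p class="price_color">£12.34</p>
</div>
</article>
"""

soup = BeautifulSoup(html, "html.parser")

books = []
for article in soup.find_all("article"):
    link = article.find("h3").find("a")
    price = article.find("p", class_="price_color")
    books.append({
        # prefer the title attribute, fall back to the visible text
        "title": link.get("title", link.text.strip()),
        "price": price.text.strip(),
    })

print(books)
```

Collecting the values into a list of dicts, rather than printing them inside the loop, also makes it easier to reuse them later.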

0
Votes

This code should work!

import requests
from bs4 import BeautifulSoup


URL = 'http://books.toscrape.com/catalogue/page-1.html'
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')

list_of_books = soup.select(
    # using the Chrome selector
    '#default > div > div > div > div > section > div:nth-child(2) > ol > li'
)

for book in list_of_books:
    price = book.find('p', {'class': 'price_color'})
    print(price.text.strip())

I just used the selector that Chrome generated.

You used find and find_all in the wrong places.
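
A long selector copied from the browser breaks easily if the page layout changes. As a hedged alternative, a shorter CSS selector built from the class names product_pod and price_color may be enough. The sketch below uses a small hypothetical HTML string rather than the live page, so the surrounding markup is an assumption.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified list of books; only the class names come from the page.
html = """
<ol>
  <li><article class="product_pod"><p class="price_color">£51.77</p></article></li>
  <li><article class="product_pod"><p class="price_color">£53.74</p></article></li>
</ol>
"""

soup = BeautifulSoup(html, "html.parser")

# select() returns every element matching the CSS selector, like find_all().
prices = [p.get_text(strip=True)
          for p in soup.select("article.product_pod p.price_color")]
print(prices)
```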


0
Votes

@ihonestlydontKnow, your code will work if you change this line to search for "article":

results = soup.find_all("article")

(as furas mentioned in his answer)

print(results) ....

<article class="product_pod">
<div class="image_container">
<a href="libertarianism-for-beginners_982/index.html"><img alt="Libertarianism for Beginners" class="thumbnail" src="../media/cache/0b/bc/0bbcd0a6f4bcd81ccb1049a52736406e.jpg"/></a>
</div>
<p class="star-rating Two">
<i class="icon-star"></i>
<i class="icon-star"></i>
<i class="icon-star"></i>
<i class="icon-star"></i>
<i class="icon-star"></i>
</p>
<h3><a href="libertarianism-for-beginners_982/index.html" title="Libertarianism for Beginners">Libertarianism for Beginners</a></h3>
<div class="product_price">
<p class="price_color">£51.33</p>
<p class="instock availability">
<i class="icon-ok"></i>

        In stock

</p>
<form>
<button class="btn btn-primary btn-block" data-loading-text="Adding..." type="submit">Add to basket</button>
</form>
</div>
</article>

Output:

£51.77
£53.74
£50.10
£47.82
£54.23
£22.65
£33.34
£17.93

...

(vwebtuan) tng@rack-dff0:~$ cat a.py

from bs4 import BeautifulSoup
import requests
URL = 'http://books.toscrape.com/catalogue/page-1.html'
page = requests.get(URL)
soup = BeautifulSoup(page.content, "html.parser")
results = soup.find_all("article")
#print(results)
for i in results :
    prices = i.find("p", class_ = "price_color")
    print(prices.text.strip())

(vwebtuan) tng@rack-dff0:~$
