Web scraping "itemprop" output


Hi, I wrote the following code to get the city where a property is located.

import requests
from bs4 import BeautifulSoup

#Loads the webpage
r = requests.get("https://www.century21.com/for-sale-homes/Westport-CT-20647c", headers={'User-agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:61.0) Gecko/20100101 Firefox/61.0'})
#grabs the content of this page
c=r.content

if "blocked" in r.text:
    print ("we've been blocked")



#makes the content more readable
soup=BeautifulSoup(c,"html.parser")

#Prints out the content 
#print(soup.prettify())

#Finds all the properties listed
all=soup.find_all("div", {"class":"sr-card js-safe-link"})

#Finds the city of the property of interest
x=all[1].find("div", {"class":"sr-card__city-state"})




for itemprop in x:
        print(x.find("span", itemprop="addressLocality").text)

The output of x is as follows:

<div class="sr-card__city-state">
<span itemprop="addressLocality">Westport</span>,
            <span itemprop="addressRegion">CT</span>
<span itemprop="postalCode">06880</span>
</div>

When I run my for loop, I get the following output:

Westport
Westport
Westport
Westport
Westport
Westport
Westport

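Although it prints the correct value, I don't understand why it prints it 7 times. I know I'm making a mistake, but I can't tell where. I would appreciate it if someone could point me in the right direction.

Thanks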

python web-scraping output
1 answer

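len(x) is 7, which is why the output is printed 7 times. Iterating over x visits each of its direct children (the three span tags plus the whitespace and comma text between them), and the loop body runs the same x.find(...) call on every pass, so the loop variable is never used. You might want to try something like this: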

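#Finds all the properties listed
all=soup.find_all("div", {"class":"sr-card js-safe-link"})

#Finds the city/state block of the property of interest
x=all[1].find("div", {"class":"sr-card__city-state"})

print(x)

print(len(x)) # Number of direct children of x (7 here)

#Iterate over the matching span rather than over x; the span has a
#single text child, so this prints "Westport" once
for prop in x.find("span", itemprop="addressLocality"):
        print(prop)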
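
To make the child count concrete, here is a minimal, self-contained sketch that parses the markup printed in the question instead of fetching the live page. The names `children`, `tag_children`, `city`, and `address` are illustrative and not part of the original code, and the dictionary at the end is just one possible way to collect the fields.

```python
from bs4 import BeautifulSoup, Tag

# Markup copied from the question's printed value of x.
html = """<div class="sr-card__city-state">
<span itemprop="addressLocality">Westport</span>,
            <span itemprop="addressRegion">CT</span>
<span itemprop="postalCode">06880</span>
</div>"""

x = BeautifulSoup(html, "html.parser").find("div", {"class": "sr-card__city-state"})

# Iterating a Tag yields its direct children: the span tags and the
# text nodes (newlines, the comma) that sit between them.
children = list(x)
tag_children = [child for child in children if isinstance(child, Tag)]
print(len(children), len(tag_children))

# A single value needs no loop: look the span up once.
city = x.find("span", itemprop="addressLocality").get_text(strip=True)
print(city)

# To read every itemprop field, loop over the span tags only.
address = {span["itemprop"]: span.get_text(strip=True) for span in x.find_all("span")}
print(address)
```

With this markup the first print shows 7 children, of which only 3 are tags; the other 4 are text nodes. That is why the original loop ran 7 times while printing the same city on each pass.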