Trying to scrape text from a div tag, but getting NoneType

Question (votes: 0, answers: 1)

I'm scraping a real estate website with Python, but I'm stuck trying to scrape the agent's company name. I get AttributeError: 'NoneType' object has no attribute 'text'.

[Screenshot: the text I want to scrape]

[Screenshot: the error I get when running my code]

Any help would be appreciated.

from bs4 import BeautifulSoup
import requests

url = "https://www.point2homes.com/MX/Real-Estate-Listings.html?LocationGeoId=&LocationGeoAreaId=&Location=San%20Felipe,%20Baja%20California,%20Mexico"
page_scrape = requests.get(url)

soup = BeautifulSoup(page_scrape.content, 'html.parser')

lists = soup.find_all('article')

for list in lists:
    address = list.find('div', class_="address-container").text
    try:
        beds = list.find('li', class_="ic-beds").text
    except:
        print("Data Not Logged")
    try:
        baths = list.find('li', class_="ic-baths").text
    except:
        print("Data not logged")
    try:
        size = list.find('li', class_="ic-sqft").text
    except:
        print("Data not logged")
    type = list.find('li', class_="property-type ic-proptype").text
    price = list.find('span', class_="green").text
    agent = list.find('div', class_="agent-name").text
    firm = list.find('div', class_="agent-company").text

    info = [address, beds, baths, size, type, price, agent, firm]

    print(info)
python-3.x web-scraping beautifulsoup
1 Answer
0 votes

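It looks like Beautiful Soup isn't handling this tag the usual way, but if you print firm without .text, the data is there for the listings that have it. Keep in mind that find() returns None when a listing has no matching div, and calling .text on None is what raises the AttributeError. So check for None first, then pull the name out with a simple substring operation: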

Here is the working code I tried:

from bs4 import BeautifulSoup
import requests


url = "https://www.point2homes.com/MX/Real-Estate-Listings.html?LocationGeoId=&LocationGeoAreaId=&Location=San%20Felipe,%20Baja%20California,%20Mexico"

headers = {"User-Agent": "Mozilla/5.0","Content-Type": "application/json"}

page_scrape = requests.get(url, headers=headers)
soup = BeautifulSoup(page_scrape.content, 'html.parser')

lists = soup.find_all('article')

for listing in lists:
    address = listing.find('div', class_="address-container").text
    # Start with empty defaults so a missing field cannot raise NameError
    # or carry over a value from the previous listing
    beds = baths = size = ''
    try:
        beds = listing.find('li', class_="ic-beds").text
    except AttributeError:
        print("Data not logged")
    try:
        baths = listing.find('li', class_="ic-baths").text
    except AttributeError:
        print("Data not logged")
    try:
        size = listing.find('li', class_="ic-sqft").text
    except AttributeError:
        print("Data not logged")
    prop_type = listing.find('li', class_="property-type ic-proptype").text
    price = listing.find('span', class_="green").text
    agent = listing.find('div', class_="agent-name").text

    # find() returns None when a listing has no agent-company div
    firmstr = listing.find('div', class_="agent-company")
    firm = ''

    if firmstr is not None:
        # Keep what sits between the first '>' and the next '<' of the tag's HTML
        after_open = str(firmstr).split('>', 1)[1]
        firm = after_open.split('<', 1)[0]

    info = [address, beds, baths, size, prop_type, price, agent, firm]

    print(info)
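
As an alternative to splitting the tag's string by hand, the None check can be wrapped in a small helper that is reused for every field. This is only a sketch, not something tested against the live site: text_or_default is a hypothetical helper name, and SAMPLE_HTML is made-up markup that stands in for the real listing page, with one listing that has no agent-company div.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: the second listing has no agent-company div
SAMPLE_HTML = """
<article>
  <div class="agent-name">Jane Doe</div>
  <div class="agent-company"> Example Realty </div>
</article>
<article>
  <div class="agent-name">John Roe</div>
</article>
"""


def text_or_default(parent, tag, class_name, default=""):
    """Return the stripped text of the first matching tag, or default if absent."""
    node = parent.find(tag, class_=class_name)  # find() returns None on no match
    return node.get_text(strip=True) if node is not None else default


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
rows = [
    [
        text_or_default(article, "div", "agent-name"),
        text_or_default(article, "div", "agent-company"),
    ]
    for article in soup.find_all("article")
]
print(rows)
```

With a helper like this, each field in the loop becomes a single call, and a listing with a missing field yields an empty string instead of raising AttributeError.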