Scraping HTML with BeautifulSoup in Python


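I am trying to scrape the available tee times for a specific date from this site: https://mt-prospect-golf-club.book.teeitup.golf/?course=10277&date=2024-08-27

Ideally I would like to scrape the time, price, number of players, and golf course, but I keep getting the following error: AttributeError: 'NoneType' object has no attribute 'get_text'.

Here is the code I am using to try to scrape the site: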

import requests
from bs4 import BeautifulSoup
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
import datetime
main_url = "https://mt-prospect-golf-club.book.teeitup.golf/?course=10277&date=2024-08-27"  
response = requests.get(main_url)
soup = BeautifulSoup(response.text, "html.parser")
Tee_Times = soup.select_one('\[class="card-title"\]').get_text()
Prices = soup.select_one('\[class= "card-text"\]').get_text()
Number_of_Players = soup.select_one('\[class="bs-stepper-label"\]').get_text()
Golf_Course = soup.select_one('\[data-testid="teetimes-tile-course-name"\]').get_text()
print(Tee_Times)
print(Prices)
print(Number_of_Players)
print(Golf_Course)

I have tried updating the code to use different selectors, and every one of them gives the same error.

Any help is greatly appreciated!

python web-scraping beautifulsoup
1 Answer

Your error usually means that select_one did not find any element matching the CSS selector. In that case it returns None, and calling get_text() on None raises the AttributeError.
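To make the failure concrete, here is a minimal sketch that reproduces it offline. The HTML string is a made-up stand-in, not the markup of the real site; it simply lacks any element with the class card-title.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet standing in for a page without the expected markup.
html = "<div class='header'>Tee times</div>"
soup = BeautifulSoup(html, "html.parser")

# select_one returns None when nothing matches the selector.
match = soup.select_one(".card-title")
print(match)

try:
    match.get_text()  # attribute lookup on None fails here
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)
```

The second print shows the same message as in the question, which confirms that the selector, not get_text(), is where things go wrong.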

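You should check your CSS selectors, because they may not match the structure of the HTML. I also think your selectors contain backslashes (\), which is probably unintended.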

Once the selectors are updated and match the structure of the page, you can use the following code:

Tee_Times = soup.select_one('.card-title').get_text()
Prices = soup.select_one('.card-text').get_text()
Number_of_Players = soup.select_one('.bs-stepper-label').get_text()
Golf_Course = soup.select_one('[data-testid="teetimes-tile-course-name"]').get_text()
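One reason to prefer .card-title over [class="card-title"] is that the attribute form compares the whole class attribute, so it stops matching as soon as the element carries a second class. The sketch below illustrates this on invented markup; the mb-0 class and the time value are assumptions for the example, not taken from the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the element carries two classes.
html = '<h5 class="card-title mb-0">7:30 AM</h5>'
soup = BeautifulSoup(html, "html.parser")

# Attribute selector: the full attribute value must equal "card-title".
exact = soup.select_one('[class="card-title"]')

# Class selector: matches when "card-title" is one of the element's classes.
by_class = soup.select_one(".card-title")

print(exact)
print(by_class.get_text())
```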

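The page may also load its content dynamically with JavaScript, in which case requests.get will only fetch the initial HTML, not the elements that are rendered afterwards.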
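A quick way to check for this is to test each selector against the raw response body. The sketch below does so on a made-up "empty shell" page; unmatched_selectors is a hypothetical helper, and in practice you would pass response.text instead of static_html.

```python
from bs4 import BeautifulSoup

SELECTORS = [
    ".card-title",
    ".card-text",
    ".bs-stepper-label",
    '[data-testid="teetimes-tile-course-name"]',
]

def unmatched_selectors(html, selectors):
    """Return the selectors that match nothing in the given HTML."""
    soup = BeautifulSoup(html, "html.parser")
    return [sel for sel in selectors if soup.select_one(sel) is None]

# Hypothetical response body: an empty shell that a script would fill in later.
static_html = (
    '<html><body><div id="app"></div>'
    '<script src="bundle.js"></script></body></html>'
)

missing = unmatched_selectors(static_html, SELECTORS)
print(missing)
```

If every selector comes back as missing even though the elements are visible in the browser's inspector, the content is most likely rendered client-side, and a browser-driven tool such as Selenium (already imported in the question) would be needed to obtain the rendered DOM.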

Check that each element exists before using it, to avoid the NoneType error:

tee_times_element = soup.select_one('.card-title')
prices_element = soup.select_one('.card-text')
players_element = soup.select_one('.bs-stepper-label')
course_element = soup.select_one('[data-testid="teetimes-tile-course-name"]')

if tee_times_element:
    Tee_Times = tee_times_element.get_text()
else:
    Tee_Times = "No data found"

if prices_element:
    Prices = prices_element.get_text()
else:
    Prices = "No data found"

if players_element:
    Number_of_Players = players_element.get_text()
else:
    Number_of_Players = "No data found"

if course_element:
    Golf_Course = course_element.get_text()
else:
    Golf_Course = "No data found"

print(Tee_Times)
print(Prices)
print(Number_of_Players)
print(Golf_Course)
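The four if/else blocks can be folded into one small helper if you prefer. This is only a sketch: text_or_default is a hypothetical name, and the HTML is invented sample markup, not the real page. It also shows select, which returns every match rather than just the first and is likely what you want for a list of tee times.

```python
from bs4 import BeautifulSoup

def text_or_default(soup, selector, default="No data found"):
    """Return the stripped text of the first match, or default when nothing matches."""
    element = soup.select_one(selector)
    return element.get_text(strip=True) if element is not None else default

# Hypothetical markup used only to exercise the helper.
html = """
<div class="card"><h5 class="card-title"> 7:30 AM </h5><p class="card-text">$45</p></div>
<div class="card"><h5 class="card-title"> 7:40 AM </h5><p class="card-text">$45</p></div>
"""
soup = BeautifulSoup(html, "html.parser")

first_time = text_or_default(soup, ".card-title")
course = text_or_default(soup, '[data-testid="teetimes-tile-course-name"]')

# select() returns a list of all matching elements (empty if none match).
all_times = [el.get_text(strip=True) for el in soup.select(".card-title")]

print(first_time)
print(course)
print(all_times)
```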