Web scraping with Beautiful Soup: multiple repeated tags

Question description  Votes: 0  Answers: 1

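This is my first time web scraping, and I am following this tutorial. I am scraping information from this website. I am trying to get the text "89426 Green Mountain Road, Astoria, OR 97103. Phone: 503-325-9720". I noticed that the page contains multiple ul, li, and div class_='alert' tags, so I am not sure how to grab one specific element. Here is what I tried, but I keep getting different text from another set of ul / li tags.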

from bs4 import BeautifulSoup
import requests

source = requests.get('https://www.pickyourownchristmastree.org/ORxmasnw.php').text

soup = BeautifulSoup(source, 'lxml')

noble_ridge = soup.find('div', class_='alert')
information = noble_ridge.ul.li.text
print(information)
# print(soup.prettify())


C:\Users\name\anaconda3\envs\Scraping\python.exe C:/Users/name/PycharmProjects/Scraping/Christmas_tree_farms.py
If the name of the farm is blue with an underline; that's a link to their website. Click on it for the most current hours and information.

Process finished with exit code 0
web-scraping beautifulsoup pycharm
1 Answer
0 votes
import requests
from bs4 import BeautifulSoup


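def main(url):
    r = requests.get(url)
    soup = BeautifulSoup(r.content, 'html.parser')
    # select_one returns the first <span class="farm"> on the page.
    target = soup.select_one("span.farm")
    # next_elements walks every node after the span in document order.
    # Index 5 is position-dependent: it assumes the sixth node is the
    # text string holding the address, so it breaks if the markup changes.
    # rsplit(" ", 2)[0] then drops the last two space-separated words.
    goal = list(target.next_elements)[5].rsplit(" ", 2)[0]
    print(goal)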


main("https://www.pickyourownchristmastree.org/ORxmasnw.php")

Output:

89426 Green Mountain Road, Astoria, OR 97103. Phone: 503-325-9720.
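
The attempt in the question returns the wrong text because find() only returns the first div with class alert, and .ul.li then takes the first li inside it. A less position-dependent alternative to indexing into next_elements is to loop over every match with find_all() and keep the one whose text contains the address you want. The following is only a sketch: SAMPLE_HTML is hypothetical, simplified markup (the real page's structure may differ), "Example Tree Farm" is a placeholder name, and find_address and PHONE_RE are names made up for this illustration.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical, simplified markup standing in for the real page.
SAMPLE_HTML = """
<div class="alert">
  <ul>
    <li>If the name of the farm is blue with an underline; that's a link.</li>
  </ul>
</div>
<div class="alert">
  <ul>
    <li><span class="farm">Example Tree Farm</span> - Christmas trees
      <br>89426 Green Mountain Road, Astoria, OR 97103. Phone: 503-325-9720. Open: weekends.</li>
  </ul>
</div>
"""

# Assumed phone format, based on the number quoted in the question.
PHONE_RE = re.compile(r"Phone: \d{3}-\d{3}-\d{4}")


def find_address(html, street):
    """Return the text from `street` through the phone number, or None."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all() returns every match, unlike find(), which stops at the first.
    for div in soup.find_all("div", class_="alert"):
        for li in div.find_all("li"):
            # Collapse runs of whitespace so substring matching is reliable.
            text = " ".join(li.get_text(" ").split())
            start = text.find(street)
            if start == -1:
                continue  # this li is not the one we are looking for
            phone = PHONE_RE.search(text, start)
            end = phone.end() if phone else len(text)
            return text[start:end]
    return None


print(find_address(SAMPLE_HTML, "89426 Green Mountain Road"))
# prints: 89426 Green Mountain Road, Astoria, OR 97103. Phone: 503-325-9720
```

To try this against the live site, pass requests.get(url).text in place of SAMPLE_HTML. Whether the address really sits inside an li under div.alert on that page is an assumption carried over from the question, so check the markup with soup.prettify() first.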