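This is my first time doing web scraping, and I am following this tutorial. I am scraping information from this website. I am trying to get the text "89426 Green Mountain Road, Astoria, OR 97103. Phone: 503-325-9720". I noticed that the page has multiple ul tags containing li items, and multiple div class_=alert tags, so I am not sure how to grab a specific one. This is what I tried, but I keep getting different text from another ul / li group.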
from bs4 import BeautifulSoup
import requests
source = requests.get('https://www.pickyourownchristmastree.org/ORxmasnw.php').text
soup = BeautifulSoup(source, 'lxml')
noble_ridge = soup.find('div', class_='alert')
information = noble_ridge.ul.li.text
print(information)
# print(soup.prettify())
C:\Users\name\anaconda3\envs\Scraping\python.exe C:/Users/name/PycharmProjects/Scraping/Christmas_tree_farms.py
If the name of the farm is blue with an underline; that's a link to their website. Click on it for the most current hours and information.
Process finished with exit code 0
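For what it's worth, the attempt above prints the notice text because find() returns only the first tag that matches, and the first div with class alert on the page is evidently not the one holding the address. Here is a minimal sketch of that behaviour. The HTML string is made up for illustration and is not the real page markup; the idea is just that you can collect every candidate li and filter on the text you want.

```python
from bs4 import BeautifulSoup

# Hypothetical markup (NOT copied from the real page): two alert boxes,
# each with its own list.
html = """
<div class="alert"><ul><li>General notice about farm links.</li></ul></div>
<div class="alert"><ul><li>89426 Green Mountain Road, Astoria, OR 97103. Phone: 503-325-9720.</li></ul></div>
"""

soup = BeautifulSoup(html, "html.parser")

# find() stops at the first matching tag, so this is always the first box.
first = soup.find("div", class_="alert").ul.li.text

# select() returns every match; keep only the items containing the wanted text.
matches = [
    li.get_text(strip=True)
    for li in soup.select("div.alert li")
    if "Green Mountain Road" in li.get_text()
]

print(first)
print(matches[0])
```

Whether the address on the real page actually sits inside a div.alert li is an assumption here; if it does not, the same filter idea applies to whatever tag does contain it.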
import requests
from bs4 import BeautifulSoup


def main(url):
    r = requests.get(url)
    soup = BeautifulSoup(r.content, 'html.parser')
    # First farm-name span on the page
    target = soup.select_one("span.farm")
    # Take the 6th node after the span (the address text) and
    # drop its last two space-separated pieces
    goal = list(target.next_elements)[5].rsplit(" ", 2)[0]
    print(goal)


main("https://www.pickyourownchristmastree.org/ORxmasnw.php")
Output:
89426 Green Mountain Road, Astoria, OR 97103. Phone: 503-325-9720.
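One caveat: indexing into next_elements with a fixed position ([5]) depends on the exact page layout and may break if the markup shifts. A possibly sturdier variant is to search forward from the farm name for the first text node that looks like it contains a phone number. The sketch below runs against a made-up HTML snippet (the div wrapper, the farm name and the trailing "Open: weekends" text are all hypothetical), so treat it as an illustration rather than something verified against the live page.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup, loosely imitating a farm entry; not the real page.
html = """
<div>
  <span class="farm">Example Farm</span> - trees, wreaths
  <br>89426 Green Mountain Road, Astoria, OR 97103. Phone: 503-325-9720. Open: weekends
</div>
"""

soup = BeautifulSoup(html, "html.parser")
target = soup.select_one("span.farm")

# Pattern for a US-style phone number preceded by "Phone:".
phone = r"Phone:\s*\d{3}-\d{3}-\d{4}"

# First text node after the span that contains such a phone number.
node = target.find_next(string=re.compile(phone))

# Keep everything up to and including the phone number (and a trailing period).
goal = re.search(r".*?" + phone + r"\.?", node.strip()).group(0)
print(goal)
```

If no text node matches, find_next returns None, so real code would want to check for that before calling strip().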