Web scraping with Python BeautifulSoup

Question

I want to collect data for my data analysis project by scraping a website with Python BeautifulSoup.

The data I want to collect from the website:

  1. Date: 6 July 2024
  2. Stage: Berghain, Panorama Bar, Garten
  3. Timetable
  4. Artist
  5. Label

Ultimately I want to load the data into SQL to build this sample table:

Sample table

I am stuck at the very first step of the scraping.

import requests
from bs4 import BeautifulSoup

url = 'https://www.berghain.berlin/en/event/77218/'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
soup.find('div', class_='running-order-set__info').find('span').contents[0]

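Result: "Norman Nodge"

I only got the first artist's name 😢 and I don't know how to collect the rest of the information!

Could some kind soul help a broke student who is trying to build a fun project?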

python html web-scraping beautifulsoup
1 Answer

You can try:

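import requests
from bs4 import BeautifulSoup

url = "https://www.berghain.berlin/en/event/77218/"

soup = BeautifulSoup(requests.get(url).content, "html.parser")

# Every <li> inside <main> is treated as one slot of the running order.
for li in soup.select("main li"):
    # The event-wide fields come before the list items in the document,
    # so find_previous() walks backwards to the nearest matching tag.
    date = li.find_previous("p")
    name = li.find_previous("h1")
    stage = li.find_previous("h2")
    time = li.time.text
    artist = li.select_one(".running-order-set__info span")

    # The second whitespace-separated token of the <p> text is the date.
    print("Date  ", date.get_text(strip=True, separator=" ").split()[1])
    print("Name  ", name.get_text(strip=True))
    print("Stage ", stage.get_text(strip=True))
    print("Time  ", time)
    # First child of the span is the artist name; a nested <span>,
    # when present, holds the note (label).
    print("Artist", artist.contents[0] if artist.contents else "-")
    print("Note  ", artist.span.text if artist.span else "-")

    print("-" * 80)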
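
If the values are needed as data rather than printed text (for example to load them into a table later), the same traversal can collect one dictionary per list item. This is only a minimal sketch: it runs against a small inline HTML sample so that it works offline, the sample markup is hypothetical and inferred only from the selectors above, and `parse_running_order`, `text_or_dash` and the dictionary keys are names chosen for illustration. The real page may be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, inferred only from the selectors used in the answer.
SAMPLE_HTML = """
<main>
  <p>Saturday 06.07.2024 start 23:59</p>
  <h1>Klubnacht</h1>
  <h2>Berghain</h2>
  <ul>
    <li><time>23:59</time>
      <div class="running-order-set__info"><span>Norman Nodge <span>Ostgut Ton</span></span></div></li>
    <li><time>04:30</time>
      <div class="running-order-set__info"><span>Alienata</span></div></li>
  </ul>
  <h2>Panorama Bar</h2>
  <ul>
    <li><time>12:30</time>
      <div class="running-order-set__info"><span></span></div></li>
  </ul>
</main>
"""


def text_or_dash(node):
    """Return the stripped text of a tag, or "-" when the tag is missing."""
    return node.get_text(strip=True) if node is not None else "-"


def parse_running_order(html):
    """Collect one dict per <li> instead of printing the fields."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for li in soup.select("main li"):
        # Second whitespace-separated token of the preceding <p> is the date.
        date_tag = li.find_previous("p")
        parts = date_tag.get_text(separator=" ", strip=True).split() if date_tag else []
        date = parts[1] if len(parts) > 1 else "-"

        info = li.select_one(".running-order-set__info span")
        artist, label = "-", "-"
        if info is not None:
            # First child is the artist name; a nested <span> holds the label.
            if info.contents:
                artist = str(info.contents[0]).strip() or "-"
            label = text_or_dash(info.span)

        rows.append({
            "date": date,
            "name": text_or_dash(li.find_previous("h1")),
            "stage": text_or_dash(li.find_previous("h2")),
            "time": text_or_dash(li.time),
            "artist": artist,
            "label": label,
        })
    return rows


rows = parse_running_order(SAMPLE_HTML)
```

To run it against the live page, passing `requests.get(url).content` instead of `SAMPLE_HTML` should be enough, provided the page really matches these selectors. Unlike the loop above, missing tags fall back to "-" instead of raising an AttributeError.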

The print() calls in the loop produce:

Date   06.07.2024
Name   Klubnacht
Stage  Berghain
Time   23:59
Artist Norman Nodge
Note   Ostgut Ton
--------------------------------------------------------------------------------
Date   06.07.2024
Name   Klubnacht
Stage  Berghain
Time   04:30
Artist Alienata
Note   -
--------------------------------------------------------------------------------
Date   06.07.2024
Name   Klubnacht
Stage  Berghain
Time   08:30
Artist UVB
Note   Mord
--------------------------------------------------------------------------------
Date   06.07.2024
Name   Klubnacht
Stage  Berghain
Time   12:30
Artist Matthew Cha
Note   -
--------------------------------------------------------------------------------
Date   06.07.2024
Name   Klubnacht
Stage  Berghain
Time   16:30
Artist Gaetano Parisio
Note   -
--------------------------------------------------------------------------------
Date   06.07.2024
Name   Klubnacht
Stage  Berghain
Time   20:30
Artist Justine Perry
Note   -
--------------------------------------------------------------------------------
Date   06.07.2024
Name   Klubnacht
Stage  Berghain
Time   00:30
Artist DJ Nobu
Note   Bitta
--------------------------------------------------------------------------------
Date   06.07.2024
Name   Klubnacht
Stage  Panorama Bar
Time   23:59
Artist Lauer
Note   Live at Robert Johnson / Running Back
--------------------------------------------------------------------------------
Date   06.07.2024
Name   Klubnacht
Stage  Panorama Bar
Time   04:00
Artist Dam Swindle
Note   Heist Recordings
--------------------------------------------------------------------------------
Date   06.07.2024
Name   Klubnacht
Stage  Panorama Bar
Time   08:00
Artist Kikelomo
Note   -
--------------------------------------------------------------------------------
Date   06.07.2024
Name   Klubnacht
Stage  Panorama Bar
Time   12:30
Artist -
Note   -
--------------------------------------------------------------------------------
Date   06.07.2024
Name   Klubnacht
Stage  Panorama Bar
Time   19:30
Artist Wallace
Note   -
--------------------------------------------------------------------------------
Date   06.07.2024
Name   Klubnacht
Stage  Panorama Bar
Time   00:00
Artist Cinthie
Note   803 Crystal Grooves
--------------------------------------------------------------------------------
Date   06.07.2024
Name   Klubnacht
Stage  Garten
Time   12:00
Artist Suze Ijó
Note   -
--------------------------------------------------------------------------------
Date   06.07.2024
Name   Klubnacht
Stage  Garten
Time   16:00
Artist Hiroko Yamamura
Note   -
--------------------------------------------------------------------------------
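
For the SQL step mentioned in the question, here is a minimal sketch using Python's built-in `sqlite3` module. The table name `running_order`, the column names, the helper `to_null` and the in-memory database are all assumptions made for illustration; the three rows are copied from the output above, with "-" stored as NULL.

```python
import sqlite3

# Rows copied from the printed output above:
# (date, event name, stage, time, artist, note/label)
rows = [
    ("06.07.2024", "Klubnacht", "Berghain", "23:59", "Norman Nodge", "Ostgut Ton"),
    ("06.07.2024", "Klubnacht", "Berghain", "04:30", "Alienata", "-"),
    ("06.07.2024", "Klubnacht", "Panorama Bar", "23:59", "Lauer",
     "Live at Robert Johnson / Running Back"),
]


def to_null(value):
    """Map the "-" placeholder to None so that SQLite stores NULL."""
    return None if value == "-" else value


# An in-memory database keeps the sketch self-contained;
# a file path would persist the data instead.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE running_order (
        event_date TEXT,
        event_name TEXT,
        stage      TEXT,
        start_time TEXT,
        artist     TEXT,
        label      TEXT
    )
    """
)

# Parameter placeholders (?) avoid building SQL strings by hand.
conn.executemany(
    "INSERT INTO running_order VALUES (?, ?, ?, ?, ?, ?)",
    [tuple(to_null(v) for v in row) for row in rows],
)
conn.commit()

# Query back the Berghain slots in insertion order.
berghain = conn.execute(
    "SELECT artist, label FROM running_order WHERE stage = ? ORDER BY rowid",
    ("Berghain",),
).fetchall()
total = conn.execute("SELECT COUNT(*) FROM running_order").fetchone()[0]
conn.close()
```

Dates are kept as the text the page shows (`06.07.2024`). Converting them to ISO format (`2024-07-06`) before inserting would make date comparisons in SQL easier, but that is a design choice outside what the answer covers.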