Using the Genius API, I obtained the song URL for a lyrics page. I now want to scrape that page with beautifulsoup4, but I am running into an error. Here is the code:
from bs4 import BeautifulSoup
import requests

def scrap_song_url(url):
    page = requests.get(url)
    html = BeautifulSoup(page.text, 'html.parser')
    lyrics = html.find('div', class_='lyrics').get_text()
    return lyrics
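Here I am looking at the HTML of the lyrics page. As an example, take this particular URL: https://genius.com/Acceptance-permanent-lyrics. Looking through the HTML, the lyrics appear to be contained in a div with the class 'lyrics'.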
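However, trying to find it with html.find returns None (a NoneType object), so .get_text() raises an error. I assume this means that, for some reason, the HTML tag (or whatever it is called; I really don't know HTML) was not found. How can I get the lyrics from the div with class 'lyrics' for a given lyrics URL?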
There is a supported and nice-looking Python wrapper for the Genius API: LyricsGenius. You should give it a try. Installing it with pip is simple:
pip install lyricsgenius
Judging by its documentation, collecting lyrics looks much easier:
from lyricsgenius import Genius
# token is your Genius API client access token
genius = Genius(token)
artist = genius.search_artist('Andy Shauf')
artist.save_lyrics()
Hmm, I don't think that is where the lyrics are. For that particular page, I did:
lyrics = html.select("div[class*=Lyrics__Container]")
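and got the lyrics (mixed in with a bunch of other HTML). There is a lot of cleanup to do. The '*=' operator matches every div whose class attribute contains Lyrics__Container. That is useful because the name is followed by a string of digits and letters that I think may change.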
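As a rough illustration of that cleanup, here is a minimal sketch that applies the same substring selector and then asks Beautiful Soup for text only. The function name extract_lyrics and the SAMPLE_HTML markup are made up for the example (the class suffix is a placeholder, not a real Genius value), and the real page may well nest links or annotations inside the containers, so treat it as a starting point rather than a finished scraper.

```python
from bs4 import BeautifulSoup

# Invented markup that only imitates the shape of a lyrics page.
SAMPLE_HTML = """
<div class="Lyrics__Container-abc123 xYz">[Verse 1]<br/>First line<br/>Second line</div>
<div class="Other-abc123">Not lyrics</div>
<div class="Lyrics__Container-abc123 xYz">[Chorus]<br/>Third line</div>
"""

def extract_lyrics(html):
    """Join the text of every div whose class contains 'Lyrics__Container'."""
    soup = BeautifulSoup(html, "html.parser")
    containers = soup.select('div[class*="Lyrics__Container"]')
    # separator="\n" puts each text fragment on its own line, so text
    # split by <br/> tags does not run together; strip=True trims whitespace.
    return "\n".join(c.get_text(separator="\n", strip=True) for c in containers)

print(extract_lyrics(SAMPLE_HTML))
```

Note that the separator is inserted between every text fragment, so a line that contains an inline tag (a link, for instance) would be split in two. That needs extra handling on real pages.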
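After first isolating the verse/chorus sections with an attribute selector, you can use stripped_strings to pick out the individual lines. The outer part of the expression just flattens the nested lists.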
import requests
from bs4 import BeautifulSoup as bs
from pprint import pprint
r = requests.get('https://genius.com/Acceptance-permanent-lyrics')
soup = bs(r.content, 'lxml')
pprint([i for j in [[line for line in verse.stripped_strings] for verse in soup.select('[data-scrolltrigger-pin]')] for i in j])
# pprint('\n'.join([i for j in [[line for line in verse.stripped_strings] for verse in soup.select('[data-scrolltrigger-pin]')] for i in j]))
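The nested comprehension above can be hard to read, so here is a sketch of the same flattening written as a plain loop. It runs against a small invented HTML string (SAMPLE_HTML and the helper name lyric_lines are assumptions for the example) and uses the built-in 'html.parser' so it does not depend on lxml. Only the [data-scrolltrigger-pin] selector is taken from the answer.

```python
from bs4 import BeautifulSoup

# Invented markup standing in for two lyric sections.
SAMPLE_HTML = """
<div data-scrolltrigger-pin="true">[Verse 1]<br/>Line one<br/>Line two</div>
<div data-scrolltrigger-pin="true">[Chorus]<br/>Line three</div>
"""

def lyric_lines(html):
    """Return all stripped text lines from every matching section, in order."""
    soup = BeautifulSoup(html, "html.parser")
    lines = []
    for section in soup.select("[data-scrolltrigger-pin]"):
        # stripped_strings yields each non-empty text fragment, whitespace trimmed;
        # extend() appends them to one flat list instead of nesting per section.
        lines.extend(section.stripped_strings)
    return lines

print(lyric_lines(SAMPLE_HTML))
```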
Here is an answer that does not use authentication. First, install the following packages:
pip install requests beautifulsoup4
The following code uses a hard-coded lyrics page:
import requests
from bs4 import BeautifulSoup
# URL of the song lyrics page
url = 'https://genius.com/Drake-gods-plan-lyrics'
response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')
lyrics_div = soup.find('div', class_='Lyrics__Container-sc-1ynbvzw-1 kUgSbL')
lyrics = lyrics_div.get_text(strip=True) if lyrics_div else "Lyrics not found"
print(lyrics)
I got the class Lyrics__Container-sc-1ynbvzw-1 kUgSbL by inspecting the HTML source. Fortunately, it has the same name across URLs.
Note that the url generally has the form https://genius.com/<artist_name>-<song_name>-lyrics.
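If you want to build such a URL yourself, something along these lines might work. The helper build_lyrics_url is hypothetical, and its slug rules (lowercase everything, drop apostrophes, turn other punctuation and spaces into hyphens, capitalize the first letter) are only guessed from the two URLs in this thread. Genius may treat other titles differently, so the URL returned by the API is the safer source.

```python
import re

def build_lyrics_url(artist, title):
    """Hypothetical helper: guess a Genius lyrics URL from artist and title."""
    raw = f"{artist} {title}".lower()
    # Assumption: apostrophes are dropped rather than replaced ("God's" -> "gods").
    raw = raw.replace("'", "").replace("\u2019", "")
    # Assumption: every other run of non-alphanumeric characters becomes one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", raw).strip("-")
    # Only the first character of the slug is capitalized in the URLs seen here.
    return f"https://genius.com/{slug[:1].upper()}{slug[1:]}-lyrics"

print(build_lyrics_url("Drake", "God's Plan"))
print(build_lyrics_url("Acceptance", "Permanent"))
```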
The output is as follows:
[Intro]And they wishin' and wishin'And wishin' and wishin', they wishin' on meYeah[Verse 1]I been movin' calm, don't start no trouble with meTryna keep it peaceful is a struggle for meDon't pull up at 6 AM to cuddle with meYou know how I like it when you lovin' on meI don't wanna die for them to miss meYes, I see the things that they wishin' on meHope I got some brothers that outlive meThey gon' tell the story, shit was different with me[Chorus]God's plan, God's planI hold back, sometimes I won't, yeahI feel good, sometimes I don't (Ayy, don't)I finessed down Weston Road (Ayy, 'nessed)Might go down a G-O-D (Yeah, wait)I go hard on Southside G (Yeah, wait)I make sure that north-side eatAnd still[Post-Chorus]Bad thingsIt's a lot of bad things that they wishin' and wishin'And wishin' and wishin', they wishin' on meBad thingsIt's a lot of bad things that they wishin' and wishin'And wishin' and wishin', they wishin' on meYeah, ayy, ayy