Pulling the href out of a link when web scraping with Python

Question

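I am scraping this page: https://www.pro-football-reference.com/years/2018/week_1.htm

It is a list of American football games. I want to open the link to the first game's stats. The displayed link text says "Final". My code so far...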

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup


#assigning url
my_url = "https://www.pro-football-reference.com/years/2018/week_1.htm"

# opening up connection, grabbing the page
raw_page = uReq(my_url)
page_html = raw_page.read()
raw_page.close()

# html parsing
page_soup = soup(page_html,"html.parser")

#find all games on page
games = page_soup.findAll("div",{"class":"game_summary expanded nohover"})

link = games[0].find("td",{"class":"right gamelink"})
print(link)

When I run this, I get the following output...

<a href="/boxscores/201809060phi.htm">Final</a>

How do I assign only the link target (i.e. "/boxscores/201809060phi.htm") to a variable?

Answer 1

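# drill down from the table cell to the <a> tag inside it
link = games[0].find("td",{"class":"right gamelink"}).find('a')

# a tag's attributes are read like dictionary keys
print(link['href'])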
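
To try this without hitting the site, here is a minimal sketch that runs the same lookup on an inline snippet. The `sample_html` markup is an assumption modeled on the output shown in the question, not a copy of the real page, and the `None` checks and the `urljoin` step are additions for illustration rather than part of the answer above.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical markup modeled on the question's output; the live page may differ.
sample_html = """
<div class="game_summary expanded nohover">
  <table><tr>
    <td class="right gamelink"><a href="/boxscores/201809060phi.htm">Final</a></td>
  </tr></table>
</div>
"""

# The page the relative link came from (taken from the question).
base_url = "https://www.pro-football-reference.com/years/2018/week_1.htm"

page_soup = BeautifulSoup(sample_html, "html.parser")

# Same lookup as in the answer, split into steps so a missing tag
# gives None instead of raising an AttributeError.
cell = page_soup.find("td", {"class": "right gamelink"})
anchor = cell.find("a") if cell is not None else None

# .get("href") returns None when the attribute is absent,
# whereas anchor["href"] would raise a KeyError.
href = anchor.get("href") if anchor is not None else None

# The href is relative, so resolve it against the page URL before opening it.
full_url = urljoin(base_url, href) if href else None

print(href)
print(full_url)
```

The resolved `full_url` can then be passed to `urlopen` in the same way as `my_url` in the question if the goal is to open the box score page.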
