Searching for links inside paragraphs with BeautifulSoup

Question  Votes: -1  Answers: 3
<p class="graytext">2012 Transcripts</p>
<blockquote><p><a title="October 3, 2012 Debate Transcript" href="/voter-education/debate-transcripts/october-3-2012-debate-transcript/">October 3, 2012: The First Obama-Romney Presidential Debate</a></p>
<p><a href="/voter-education/debate-transcripts/october-11-2012-the-biden-romney-vice-presidential-debate/">October 11, 2012: The Biden-Ryan Vice Presidential Debate</a></p>
<p><a href="/voter-education/debate-transcripts/october-16-2012-the-second-obama-romney-presidential-debate/">October 16, 2012: The Second Obama-Romney Presidential Debate</a></p>
<p><a href="/voter-education/debate-transcripts/october-22-2012-the-third-obama-romney-presidential-debate/">October 22, 2012: The Third Obama-Romney Presidential Debate</a></p></blockquote>
<hr />
<p class="graytext">2008 Transcripts</p>
<blockquote><p><a title="September 26, 2008 Debate Transcript" href="/voter-education/debate-transcripts/2008-debate-transcript/">September 26, 2008: The First McCain-Obama Presidential Debate</a></p>
<p><a title="October 2, 2008 Debate Transcript" href="/voter-education/debate-transcripts/2008-debate-transcript-2/">October 2, 2008: The Biden-Palin Vice Presidential Debate</a></p>
<p><a title="October 7, 2008 Debate Transcript" href="/voter-education/debate-transcripts/october-7-2008-debate-transcrip/">October 7, 2008: The Second McCain-Obama Presidential Debate</a></p>
<p><a title="October 15, 2008 Debate Transcript" href="/voter-education/debate-transcripts/october-15-2008-debate-transcript/">October 15, 2008: The Third McCain-Obama Presidential Debate</a></p></blockquote>
<hr />
<p class="graytext">2004 Transcripts</p>
<blockquote><p><a title="October 13, 2004 Debate Transcript" href="/voter-education/debate-transcripts/october-13-2004-debate-transcript/">October 13, 2004: The Third Bush-Kerry Presidential Debate</a></p>
<p><a title="October 8, 2004 Debate Transcript" href="/voter-education/debate-transcripts/october-8-2004-debate-transcript/">October 8, 2004: The Second Bush-Kerry Presidential Debate</a></p>
<p><a title="October 5, 2004 Transcript" href="/voter-education/debate-transcripts/october-5-2004-transcript/">October 5, 2004: The Cheney-Edwards Vice Presidential Debate</a></p>
<p><a title="September 30. 2004 Debate Transcript" href="/voter-education/debate-transcripts/september-30-2004-debate-transcript/">September 30, 2004: The First Bush-Kerry Presidential Debate</a></p></blockquote>
<hr />
<p class="graytext">2000 Transcripts</p>
<blockquote><p><a title="October 3, 2000 Transcript" href="/voter-education/debate-transcripts/october-3-2000-transcript/">October 3, 2000: The First Gore-Bush Presidential Debate</a></p>
<p><a title="October 5, 2000 Debate Transcript" href="/voter-education/debate-transcripts/october-5-2000-debate-transcript/">October 5, 2000: The Lieberman-Cheney Vice Presidential Debate</a></p>
<p><a title="October 11, 2000 Debate Transcript" href="/voter-education/debate-transcripts/october-11-2000-debate-transcript/">October 11, 2000: The Second Gore-Bush Presidential Debate</a></p>
<p><a title="October 17, 2000 Debate Transcript" href="/voter-education/debate-transcripts/october-17-2000-debate-transcript/">October 17, 2000: The Third Gore-Bush Presidential Debate</a></p>
<p><a title="Debate Transcript Translations" href="/voter-education/debate-transcripts/2000-debate-transcripts-translations/">The 2000 Debate Transcripts: Transcripts of the debates translated into six languages</a></p></blockquote>
<hr />

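The task is to grab the links for the first presidential debates of 2008 and 2004, so the answer should be the first link in the 2008 and 2004 transcript blocks. How can I do that?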

python web-scraping beautifulsoup
3 Answers
1
vote

Import the BeautifulSoup dependency, then check the first link in each blockquote.

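from bs4 import BeautifulSoup

# html_doc is assumed to be the path to the saved HTML page
with open(html_doc) as page:
    soup = BeautifulSoup(page.read(), 'html.parser')

# Each blockquote holds the links for one election year
blockquotes = soup.find_all('blockquote')

for block in blockquotes:
    # block.a is the first <a> inside the blockquote
    href = block.a['href']
    if '2004' in href or '2008' in href:
        print(href)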

0
votes

You can find the p tags with class graytext whose text matches 2004|2008, then use find_next('a') to get the first link after each of those p tags.

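from bs4 import BeautifulSoup
import re

# html is assumed to hold the page source as a string
soup = BeautifulSoup(html, 'html.parser')
# string= is the current name for the older text= argument
wanted_p = soup.find_all('p', class_='graytext', string=re.compile('2008|2004'))
for p in wanted_p:
    print(p.find_next('a'))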

Output

<a href="/voter-education/debate-transcripts/2008-debate-transcript/" title="September 26, 2008 Debate Transcript">September 26, 2008: The First McCain-Obama Presidential Debate</a>
<a href="/voter-education/debate-transcripts/october-13-2004-debate-transcript/" title="October 13, 2004 Debate Transcript">October 13, 2004: The Third Bush-Kerry Presidential Debate</a>
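
The same idea can be wrapped so that it returns the href values rather than printing whole tags. This is only a sketch: the HTML constant is an abbreviated copy of the question's markup, and the helper name first_links is made up for illustration.

```python
import re
from bs4 import BeautifulSoup

# Abbreviated copy of the question's markup (assumption: same structure as the real page).
HTML = """
<p class="graytext">2012 Transcripts</p>
<blockquote><p><a href="/voter-education/debate-transcripts/october-3-2012-debate-transcript/">October 3, 2012: The First Obama-Romney Presidential Debate</a></p></blockquote>
<hr />
<p class="graytext">2008 Transcripts</p>
<blockquote><p><a href="/voter-education/debate-transcripts/2008-debate-transcript/">September 26, 2008: The First McCain-Obama Presidential Debate</a></p>
<p><a href="/voter-education/debate-transcripts/2008-debate-transcript-2/">October 2, 2008: The Biden-Palin Vice Presidential Debate</a></p></blockquote>
<hr />
<p class="graytext">2004 Transcripts</p>
<blockquote><p><a href="/voter-education/debate-transcripts/october-13-2004-debate-transcript/">October 13, 2004: The Third Bush-Kerry Presidential Debate</a></p>
<p><a href="/voter-education/debate-transcripts/september-30-2004-debate-transcript/">September 30, 2004: The First Bush-Kerry Presidential Debate</a></p></blockquote>
"""


def first_links(html, years):
    """Map each matching heading (e.g. '2008 Transcripts') to the href of the first link after it."""
    soup = BeautifulSoup(html, "html.parser")
    pattern = re.compile("|".join(years))
    result = {}
    # Headings are <p class="graytext"> elements whose text contains one of the years.
    for p in soup.find_all("p", class_="graytext", string=pattern):
        link = p.find_next("a")  # first <a> after the heading, in document order
        if link is not None:
            result[p.get_text(strip=True)] = link["href"]
    return result


if __name__ == "__main__":
    for heading, href in first_links(HTML, ["2008", "2004"]).items():
        print(heading, href)
```

Note that for 2004 this returns the link for the third debate, because that block lists its debates in reverse chronological order.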

0
votes

If you know which years you want, you can use an attribute = value selector with select_one to target the appropriate href. select_one returns the first match.

# soup is assumed to be the parsed page from the earlier answers
debate2008 = soup.select_one("[href*='2008-debate-transcript']").text
debate2004 = soup.select_one("[href*='2004-debate-transcript']").text
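
One caveat with taking the first match per year: in the posted markup the 2004 block lists its debates newest first, so the first 2004 link is the third debate. If the goal is the first presidential debate itself, matching on the link text may be more robust. The following is a rough sketch under that assumption; the HTML constant is an abbreviated copy of the question's markup, the helper name first_debate_links is hypothetical, and the regex relies on the "YEAR: The First ... Presidential Debate" wording seen in the question.

```python
import re
from bs4 import BeautifulSoup

# Abbreviated copy of the question's markup (assumption: same wording as the real page).
HTML = """
<p class="graytext">2008 Transcripts</p>
<blockquote><p><a href="/voter-education/debate-transcripts/2008-debate-transcript/">September 26, 2008: The First McCain-Obama Presidential Debate</a></p>
<p><a href="/voter-education/debate-transcripts/2008-debate-transcript-2/">October 2, 2008: The Biden-Palin Vice Presidential Debate</a></p></blockquote>
<hr />
<p class="graytext">2004 Transcripts</p>
<blockquote><p><a href="/voter-education/debate-transcripts/october-13-2004-debate-transcript/">October 13, 2004: The Third Bush-Kerry Presidential Debate</a></p>
<p><a href="/voter-education/debate-transcripts/september-30-2004-debate-transcript/">September 30, 2004: The First Bush-Kerry Presidential Debate</a></p></blockquote>
"""


def first_debate_links(html, years):
    """Return {year: href} for the link whose text names that year's first presidential debate."""
    soup = BeautifulSoup(html, "html.parser")
    found = {}
    for year in years:
        # Matches text such as "September 30, 2004: The First Bush-Kerry Presidential Debate".
        pattern = re.compile(rf"{year}: The First .*Presidential Debate")
        link = soup.find("a", string=pattern)
        if link is not None:
            found[year] = link["href"]
    return found


if __name__ == "__main__":
    print(first_debate_links(HTML, ["2008", "2004"]))
```

This ignores block order entirely, so it keeps working if the page sorts a year's debates differently, but it will miss links whose text does not follow that wording.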