使用bs4从href中提取文本的一部分

Question

想要从href中提取文本，看起来我只能从HTML中提取整个href

from bs4 import BeautifulSoup

soup=BeautifulSoup("""<div class="cdAllIn"><a href="/footba/all.aspx?lang=EN&amp;tmatchid=6be0690b-93e3-4300-87e9-7d0aa5797ae0" title="All Odds"><img src="/football/info/images/btn_odds.gif?CV=L302R1g" alt="All Odds" title="All Odds"></a></div>
<div class="cdAllIn"><a href="/footba/all.aspx?lang=EN&amp;tmatchid=6be0690b-93e3-4300-87e9-7d0aa5797ae0" title="All Odds"><img src="/football/info/images/btn_odds.gif?CV=L302R1g" alt="All Odds" title="All Odds"></a></div>
<div class="cdAllIn"><a href="/footba/all.aspx?lang=EN&amp;tmatchid=6be0690b-93e3-4300-87e9-7d0aa5797ae0" title="All Odds"><img src="/football/info/images/btn_odds.gif?CV=L302R1g" alt="All Odds" title="All Odds"></a></div>
<div class="cdAllIn"><a href="/footba/all.aspx?lang=EN&amp;tmatchid=6be0690b-93e3-4300-87e9-7d0aa5797ae0" title="All Odds"><img src="/football/info/images/btn_odds.gif?CV=L302R1g" alt="All Odds" title="All Odds"></a></div>
""",'html.parser')

lines=soup.find_all('a')
for line in lines:
    print(line['href'])

结果：

/footba/all.aspx?lang=EN&tmatchid=6be0690b-93e3-4300-87e9-7d0aa5797ae0
/footba/all.aspx?lang=EN&tmatchid=6be0690b-93e3-4300-87e9-7d0aa5797ae0
/footba/all.aspx?lang=EN&tmatchid=6be0690b-93e3-4300-87e9-7d0aa5797ae0
/footba/all.aspx?lang=EN&tmatchid=6be0690b-93e3-4300-87e9-7d0aa5797ae0

预期结果：

6be0690b-93e3-4300-87e9-7d0aa5797ae0
6be0690b-93e3-4300-87e9-7d0aa5797ae0
6be0690b-93e3-4300-87e9-7d0aa5797ae0
6be0690b-93e3-4300-87e9-7d0aa5797ae0

Answer 1

使用=拆分字符串并获取最后一个索引。

for line in lines:
    print(line['href'].split('=')[-1])

希望这可以帮助！干杯!

Answer 2

由于您只需要检索tmatchid值，因此在url中找到substring tmatchid =并从该索引中提取剩余的url

lines=soup.find_all('a')
for line in lines:
    index=line['href'].find('tmatchid=')+9
    print(line['href'][index:])

产量

6 B 0690 B-93 Y-4300-87 Y-D 0 0 0 0 6 B 0690 B-93 Y-4300-87 Y-D 0 0 0 0 6 B 0690 B-93 Y-4300-87 Y-D 0 0 0 0 6 B 0690 B-93 Y-4300-87 Y-D 0 0 0 0

使用bs4从href中提取文本的一部分

问题描述投票：0回答：2

2个回答

最新问题

使用bs4从href中提取文本的一部分

问题描述 投票：0回答：2

2个回答

最新问题

问题描述投票：0回答：2