Extracting the content inside a tag with BeautifulSoup

python beautifulsoup
4 answers
Answer 1 (52 votes)

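The .contents attribute works well for extracting text from <tag>text</tag>. It holds a list of the tag's children, so index into it to get the string itself.

Example for <td>My home address</td>: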

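from bs4 import BeautifulSoup

s = '<td>My home address</td>'
soup = BeautifulSoup(s, 'html.parser')
td = soup.find('td')  # <td>My home address</td>
td.contents  # ['My home address'] (a list of child nodes)
td.contents[0]  # 'My home address'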
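
Because .contents is a list, .string is often the more direct way to get a lone text child. A minimal sketch, assuming BeautifulSoup 4 (bs4) with the built-in html.parser; the second, mixed-content snippet is a hypothetical input added for contrast:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<td>My home address</td>', 'html.parser')
td = soup.find('td')

children = td.contents        # list of child nodes: ['My home address']
first = td.contents[0]        # the single NavigableString child
text = td.string              # same string; would be None if td had several children

# Hypothetical mixed content: .string gives up, but get_text() concatenates everything.
mixed = BeautifulSoup('<td>My <i>home</i> address</td>', 'html.parser').td
mixed_string = mixed.string   # None, because td has three children
mixed_text = mixed.get_text() # 'My home address'

print(children, first, text, mixed_string, mixed_text)
```

.string is convenient when a tag has exactly one child; for anything nested or mixed, get_text() is the safer choice.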

Example for <td><b>Address:</b></td>:

from bs4 import BeautifulSoup

s = '<td><b>Address:</b></td>'
soup = BeautifulSoup(s, 'html.parser')
td = soup.find('td').find('b')  # <b>Address:</b>
td.contents  # ['Address:']
td.contents[0]  # 'Address:'
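
When the text sits inside a single nested tag, the second find() is optional: .string follows a chain of only-children down to the text. A sketch under the same assumptions (bs4, html.parser):

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<td><b>Address:</b></td>', 'html.parser')
td = soup.find('td')

b = td.find('b')              # <b>Address:</b>
via_b = b.contents[0]         # 'Address:'
via_td = td.string            # also 'Address:': td's only child is <b>, whose only child is the text
td_children = td.contents     # [<b>Address:</b>], a Tag rather than a string

print(via_b, via_td, td_children)
```

Note that td.contents[0] here would be the <b> Tag, not the text, which is why the original example searches for <b> first.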

Answer 2 (21 votes)

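Use .next instead (called .next_element in BeautifulSoup 4):

>>> from bs4 import BeautifulSoup
>>> s = '<table border="0" cellspacing="2" width="800"><tr><td colspan="2"><b>Name: </b>Hello world</td></tr><tr>'
>>> soup = BeautifulSoup(s, 'html.parser')
>>> hello = soup.find(string='Name: ')
>>> hello.next_element
'Hello world'

.next and .previous (.next_element and .previous_element in BeautifulSoup 4) let you move through the document's elements in the order the parser processed them, whereas the sibling attributes (.next_sibling and .previous_sibling) follow the structure of the parse tree.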
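
The difference between parse order and tree order shows up on the same snippet. A minimal sketch, assuming bs4 attribute names (next_element, next_sibling) and html.parser:

```python
from bs4 import BeautifulSoup

html = ('<table border="0" cellspacing="2" width="800"><tr>'
        '<td colspan="2"><b>Name: </b>Hello world</td></tr><tr>')
soup = BeautifulSoup(html, 'html.parser')

label = soup.find(string='Name: ')   # NavigableString inside <b>
b = label.parent                     # the <b> tag

# Parse order: the next thing the parser saw after 'Name: ' is 'Hello world'.
after_label = label.next_element
# Tree order: 'Name: ' is the only child of <b>, so it has no next sibling.
label_sibling = label.next_sibling
# 'Hello world' and <b> are both children of <td>, so they are siblings.
b_sibling = b.next_sibling
# In parse order, the element right after <b> is its own text.
after_b = b.next_element

print(repr(after_label), label_sibling, repr(b_sibling), repr(after_b))
```

So the answer's trick works because "Hello world" immediately follows the label in parse order, even though the two strings are not siblings.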


Answer 3 (6 votes)

Use the code below to extract the text content of an HTML tag with Python and BeautifulSoup:

from bs4 import BeautifulSoup

s = '<td>Example information</td>'  # your raw HTML
soup = BeautifulSoup(s, 'html.parser')  # parse the HTML with BeautifulSoup
td = soup.find('td')  # tag of interest: <td>Example information</td>
td.text  # 'Example information', the tag's text with the markup stripped
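
.text is an alias for get_text(), which concatenates every string inside the tag, including strings in nested tags. A sketch assuming bs4 and a hypothetical nested input; the separator and strip arguments keep the pieces apart when that matters:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<td><b>Name: </b>Hello world</td>', 'html.parser')
td = soup.find('td')

plain = td.text                             # 'Name: Hello world'
separated = td.get_text(' | ', strip=True)  # 'Name: | Hello world'
pieces = list(td.stripped_strings)          # ['Name:', 'Hello world']

print(plain, separated, pieces)
```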

Answer 4 (3 votes)
from bs4 import BeautifulSoup, Tag

def get_tag_html(tag: Tag) -> str:
    # Inner HTML of `tag`: serialize child tags, keep text nodes as-is
    return ''.join(child.decode() if isinstance(child, Tag) else child
                   for child in tag.contents)
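
The helper above returns a tag's inner HTML. bs4 also ships Tag.decode_contents(), which should produce the same markup for ordinary input; one difference, to the best of my understanding, is that it escapes characters such as & in text nodes, which the joined version leaves raw. A sketch with a hypothetical input:

```python
from bs4 import BeautifulSoup, Tag

soup = BeautifulSoup('<td><b>Address:</b> 12 Main St</td>', 'html.parser')
td = soup.find('td')

# Same approach as get_tag_html: serialize child tags, keep text as-is.
manual = ''.join(c.decode() if isinstance(c, Tag) else c for c in td.contents)
# bs4's built-in way to get a tag's inner HTML.
builtin = td.decode_contents()

print(manual)
print(builtin)
```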