How do I get the start and end positions of a tag in the source string?

Question

How can I parse an HTML string with BeautifulSoup and find the start and end indexes of an HTML tag? For example,

get_start_stop('hello<br    >there', 'br')

should return

(5, 13)

I have looked at the attributes listed by:

from bs4 import BeautifulSoup

def get_start_stop(source, tag_name):
    soup = BeautifulSoup(source, 'html.parser')
    # dir() lists the attributes available on the tag found
    return dir(soup.find(tag_name))

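But the attributes I was hopeful about (sourcepos, string, strings, self_and_descendants, and .nextSibling.sourcepos) do not, as far as I can tell, contain the information needed to get the start and end indexes of the tag in the source string.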

Answer 1

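I don't think BeautifulSoup directly provides the start and end indexes of an HTML tag, but you can find them by locating the tag in the original string. Note that this only matches when the tag appears in the source exactly as BeautifulSoup serializes it, so a tag written as '<br    >' will not be found.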

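from bs4 import BeautifulSoup

def get_start_stop(source, tag_name):
    soup = BeautifulSoup(source, 'html.parser')
    # str() gives BeautifulSoup's own serialization of the tag, not the raw source text
    tag = str(soup.find(tag_name))
    start = source.find(tag)
    # Parenthesize the tuple so the conditional applies to the whole result
    return (start, start + len(tag)) if start != -1 else None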
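
Because searching for the re-serialized tag misses tags that are spelled differently in the source, one possible alternative is to read the position straight from the parser. The sketch below does not use BeautifulSoup itself. It subclasses the standard library's html.parser.HTMLParser (the parser behind BeautifulSoup's 'html.parser' option) and relies on its getpos() and get_starttag_text() methods. The class name TagLocator is made up for illustration, and the sketch only reports the span of the first matching opening tag, not the whole element up to its closing tag.

```python
from html.parser import HTMLParser


class TagLocator(HTMLParser):
    """Records the source span of the first opening tag named tag_name."""

    def __init__(self, source, tag_name):
        super().__init__()
        self.tag_name = tag_name.lower()  # HTMLParser reports tag names in lowercase
        self.span = None
        # Absolute index at which each line starts, for converting (line, column).
        self.line_starts = [0]
        for index, char in enumerate(source):
            if char == '\n':
                self.line_starts.append(index + 1)

    def handle_starttag(self, tag, attrs):
        if self.span is None and tag == self.tag_name:
            line, column = self.getpos()  # line is 1-based, column is 0-based
            start = self.line_starts[line - 1] + column
            # get_starttag_text() is the raw text of the tag as written in the source.
            self.span = (start, start + len(self.get_starttag_text()))


def get_start_stop(source, tag_name):
    locator = TagLocator(source, tag_name)
    locator.feed(source)
    locator.close()
    return locator.span  # None if the tag never appears
```

With this version, get_start_stop('hello<br    >there', 'br') returns (5, 13), the result asked for in the question.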
