BeautifulSoup: creating and inserting a self-closing tag between tags

Question

I am parsing an HTML file and replacing specific links with new tags.

Python code:

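from bs4 import BeautifulSoup

sample = '''<a href="{Image src='https://google.com' link='https://google.com'}" >{Image src='https://google.com' link='google.com'}</a>'''
# Name the parser explicitly so the result does not depend on which parsers are installed.
soup = BeautifulSoup(sample, 'html.parser')
for a in soup.find_all('a'):
    # Parsed as HTML, so <ri:attachment/> is not kept as a self-closing tag.
    x = BeautifulSoup('<ac:image><ri:attachment ri:filename="somefile"/> </ac:image>', 'html.parser')
    a.replace_with(x)

print(soup)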

Actual output:

<ac:image><ri:attachment ri:filename="somefile"></ri:attachment> </ac:image>

Desired output:

<ac:image><ri:attachment ri:filename="somefile" /></ac:image>

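The self-closing tag is automatically converted into an opening and closing tag pair. The destination strictly requires self-closing tags.

Any help would be greatly appreciated!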

Answer 1

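To get proper self-closing tags, use the xml parser when creating the new soup that replaces the old tag.

Also, to preserve the ac and ri namespaces, the xml parser needs the xmlns:ac and xmlns:ri attributes to be defined. We define them on a dummy tag that is removed after processing.

For example: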

from bs4 import BeautifulSoup  # the 'xml' parser used below requires lxml to be installed

txt = '''
<div class="my-class">
    <a src="some address">
        <img src="attlasian_logo.gif" />
    </a>
</div>
<div class="my-class">
    <a src="some address2">
        <img src="other_logo.gif" />
    </a>
</div>
'''

template = '''
<div class="_remove_me" xmlns:ac="http://namespace1/" xmlns:ri="http://namespace2/">
<ac:image>
  <ri:attachment ri:filename="{img_src}" />
</ac:image>
</div>
'''

soup = BeautifulSoup(txt, 'html.parser')

for a in soup.select('a'):
    # Select the `xml` parser; the template needs the xmlns:* attributes to preserve namespaces.
    a.replace_with(BeautifulSoup(template.format(img_src=a.img['src']), 'xml'))

for div in soup.select('div._remove_me'):
    div.unwrap()

print(soup.prettify())
Prints:

<div class="my-class">
 <ac:image>
  <ri:attachment ri:filename="attlasian_logo.gif"/>
 </ac:image>
</div>
<div class="my-class">
 <ac:image>
  <ri:attachment ri:filename="other_logo.gif"/>
 </ac:image>
</div>
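
If installing lxml for the xml parser is not an option, a different approach (not the one used in this answer) is to build the replacement with new_tag() and mark the tag as allowed to be empty. This is only a minimal sketch: it assumes the can_be_empty_element attribute that bs4 keeps on Tag objects behaves as in current releases, and the sample markup and the "somefile" filename are placeholders.

```python
from bs4 import BeautifulSoup

sample = '<p><a href="https://google.com">link</a></p>'
soup = BeautifulSoup(sample, 'html.parser')

for a in soup.find_all('a'):
    # Build the replacement with new_tag() instead of parsing a template.
    image = soup.new_tag('ac:image')
    attachment = soup.new_tag('ri:attachment', attrs={'ri:filename': 'somefile'})
    # html.parser only treats HTML void elements (<br>, <img>, ...) as
    # self-closing, so this tag has to be opted in explicitly.
    attachment.can_be_empty_element = True
    image.append(attachment)
    a.replace_with(image)

result = str(soup)
print(result)
```

Because no xml parser is involved, no xmlns:* declarations or dummy wrapper tag are needed here; the prefixes are simply part of the tag and attribute names.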
