Remove an element, but not the text after it

Question

I have an XML file similar to this:

<root>
<a>Some <b>bad</b> text <i>that</i> I <u>do <i>not</i></u> want to keep.</a>
</root>

I want to remove all the text inside <b> or <u> elements (and their descendants) and print the rest. This is what I tried:

from __future__ import print_function
import xml.etree.ElementTree as ET

tree = ET.parse('a.xml')
root = tree.getroot()

parent_map = {c:p for p in root.iter() for c in p}

for item in root.findall('.//b'):
  parent_map[item].remove(item)
for item in root.findall('.//u'):
  parent_map[item].remove(item)
print(''.join(root.itertext()).strip())

(I used the recipe from this answer to build parent_map.) The problem, of course, is that remove(item) also removes the text after the element, so the result is:

Some that I

whereas what I want is:

Some  text that I  want to keep.

Is there a way to do this?

Answer 1

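If you don't end up with anything better, you can call clear() instead of remove(), saving the element's tail first and restoring it afterwards (clear() resets the tail as well):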

import xml.etree.ElementTree as ET


data = """<root>
<a>Some <b>bad</b> text <i>that</i> I <u>do <i>not</i></u> want to keep.</a>
</root>"""

tree = ET.fromstring(data)
a = tree.find('a')
for element in a:
    if element.tag in ('b', 'u'):
        tail = element.tail
        element.clear()
        element.tail = tail

print(ET.tostring(tree, encoding='unicode'))

Prints (note the empty <b /> and <u /> tags):

<root>
<a>Some <b /> text <i>that</i> I <u /> want to keep.</a>
</root>
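
ElementTree stores the text that follows an element's end tag in that element's tail attribute, which is why remove() drops it together with the element. Once the unwanted elements have been cleared, the plain text asked for in the question can be read with itertext(). The following is a minimal sketch along those lines, not part of the original answer; it assumes Python 3 and uses iter() instead of looping over the direct children of <a>, so that matches at any depth are handled:

```python
import xml.etree.ElementTree as ET

data = """<root>
<a>Some <b>bad</b> text <i>that</i> I <u>do <i>not</i></u> want to keep.</a>
</root>"""

root = ET.fromstring(data)

# Take a snapshot of the tree first, because clear() drops children
# while we are still walking over the elements.
for element in list(root.iter()):
    if element.tag in ('b', 'u'):
        tail = element.tail   # clear() also resets tail, so save it
        element.clear()       # removes text, attributes and children
        element.tail = tail   # put the trailing text back

text = ''.join(root.itertext()).strip()
print(text)
```

The empty <b /> and <u /> elements are still in the tree, but they contribute nothing to itertext().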

Alternatively, here is a solution using xml.dom.minidom:

import xml.dom.minidom

data = """<root>
<a>Some <b>bad</b> text <i>that</i> I <u>do <i>not</i></u> want to keep.</a>
</root>"""

dom = xml.dom.minidom.parseString(data)
a = dom.getElementsByTagName('a')[0]
for child in list(a.childNodes):  # iterate over a copy: removeChild() mutates childNodes
    if getattr(child, 'tagName', '') in ('u', 'b'):
        a.removeChild(child)

print(dom.toxml())

Prints:

<?xml version="1.0" ?><root>
<a>Some  text <i>that</i> I  want to keep.</a>
</root>

Answer 2

I ended up with the following algorithm, which removes child elements but keeps the surrounding text (element is the ET.Element of interest):

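last = None  # last child that was kept so far
for c in list(element):
    if your_condition:  # placeholder: e.g. c.tag in ('b', 'u')
        if c.tail:
            # Move the tail to whatever now precedes it: the parent's
            # text, or the tail of the previous kept sibling.
            if last is None:
                element.text = (element.text or '') + c.tail
            else:
                last.tail = (last.tail or '') + c.tail
        element.remove(c)
    else:
        last = c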
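
For reference, here is a self-contained sketch of the same idea applied to the XML from the question. The function name remove_keeping_tail and its tags parameter are made up for illustration, and the recursion into kept children is an addition so that matches deeper in the tree are handled as well; it tracks the last kept sibling so that each tail ends up in the right place:

```python
import xml.etree.ElementTree as ET


def remove_keeping_tail(element, tags):
    """Remove descendants whose tag is in `tags`, keeping the text after them.

    Hypothetical helper, written for this example only.
    """
    last = None  # last child of `element` that was kept so far
    for child in list(element):  # copy, because we remove while looping
        if child.tag in tags:
            if child.tail:
                if last is None:
                    # Nothing kept before it: the tail joins the parent's text.
                    element.text = (element.text or '') + child.tail
                else:
                    # Otherwise it joins the tail of the previous kept sibling.
                    last.tail = (last.tail or '') + child.tail
            element.remove(child)
        else:
            remove_keeping_tail(child, tags)
            last = child


data = """<root>
<a>Some <b>bad</b> text <i>that</i> I <u>do <i>not</i></u> want to keep.</a>
</root>"""

root = ET.fromstring(data)
remove_keeping_tail(root, {'b', 'u'})

result = ''.join(root.itertext()).strip()
print(result)
```

Unlike the clear() approach, this leaves no empty <b /> or <u /> elements behind in the tree.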
