lxml: get the entire content of a tag that contains both child nodes and text

Question

I want to get all of the text content, along with the tags, from the XML below:

<title-group><article-title xml:lang="en">Correction to: Effective adsorptive performance of Fe<sub>3</sub>O<sub>4</sub>@SiO<sub>2</sub>core shell spheres for methylene blue: kinetics, isotherm and mechanism</article-title></title-group>

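The output for the above should be:

Correction to: Effective adsorptive performance of Fe<sub>3</sub>O<sub>4</sub>@SiO<sub>2</sub>core shell spheres for methylene blue: kinetics, isotherm and mechanism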

I tried the following, but it gives me an incomplete value:

from lxml import etree

s = '<title-group><article-title xml:lang="en">Correction to: Effective adsorptive performance of Fe<sub>3</sub>O<sub>4</sub>@SiO<sub>2</sub>core shell spheres for methylene blue: kinetics, isotherm and mechanism</article-title></title-group>'
d = etree.fromstring(s)
title_xpath = '/title-group/article-title'
title = ""
if not d.xpath(title_xpath)[0].getchildren():
    title = d.xpath(title_xpath)[0].text
else:
    for title_elem in d.xpath(title_xpath):
        title_parts = title_elem.getchildren()
        title = ''.join(etree.tostring(part, encoding="unicode") for part in title_parts)
print(title)

The code above gives me:

<sub>3</sub>O<sub>4</sub>@SiO<sub>2</sub>core shell spheres for methylene blue: kinetics, isotherm and mechanism

