lxml: get the full contents of a tag that contains both child elements and text


I want to get all of the text content, together with the tags, from the following XML:

<title-group><article-title xml:lang="en">Correction to: Effective adsorptive performance of Fe<sub>3</sub>O<sub>4</sub>@SiO<sub>2</sub>core shell spheres for methylene blue: kinetics, isotherm and mechanism</article-title></title-group>

For the XML above, the output should be:

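Correction to: Effective adsorptive performance of Fe<sub>3</sub>O<sub>4</sub>@SiO<sub>2</sub>core shell spheres for methylene blue: kinetics, isotherm and mechanism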

I tried the following, but it gives me an incomplete value:

from lxml import etree

s = '<title-group><article-title xml:lang="en">Correction to: Effective adsorptive performance of Fe<sub>3</sub>O<sub>4</sub>@SiO<sub>2</sub>core shell spheres for methylene blue: kinetics, isotherm and mechanism</article-title></title-group>'
d = etree.fromstring(s)
title_xpath = '/title-group/article-title'
title = ""
if not d.xpath(title_xpath)[0].getchildren():
    # No child elements: the element's .text is the whole title
    title = d.xpath(title_xpath)[0].text
else:
    for title_elem in d.xpath(title_xpath):
        title_parts = title_elem.getchildren()
        # Serializes only the child elements (each with its tail text)
        title = ''.join(etree.tostring(part, encoding="unicode") for part in title_parts)
print(title)

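The code above gives me only:

<sub>3</sub>O<sub>4</sub>@SiO<sub>2</sub>core shell spheres for methylene blue: kinetics, isotherm and mechanism

The leading text "Correction to: Effective adsorptive performance of Fe" is missing.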
python python-3.x xml-parsing lxml
© www.soinside.com 2019 - 2024. All rights reserved.