lxml is a full-featured, high-performance Python library for processing XML and HTML.
I'm using Python 3.9. I have a nested XML document string like this:

    payload_xml = """
    <AllData>
      <MyPayload>
        ... ... ...
      </MyPayload>
    </AllData>
    """

Now I want to create another, parent XML document string and extract this payload into the newly created XML document, as shown below.

New XML:

    <Full_Message prop1="" prop2="">
      <Header>
        <headerValue1> </headerValue1>
        <headerValue2> </headerValue2>
        <headerValue3> </headerValue3>
        <NestedValues>
          <someval1> </someval1>
        </NestedValues>
      </Header>
      <Body>
        <!--Insert MyPayload xml string here ignoring AllData node-->
      </Body>
    </Full_Message>

This is where I am so far:

    from lxml import etree

    FullMessage_root = etree.Element("Full_Message")
    AllData_root = etree.fromstring(payload_xml)
    payload_only = AllData_root[0]
    FullMessage_root.append(payload_only)
    FullMessage_root.insert(0, etree.Element("Header"))
    FullMessage_root.insert(1, etree.Element("Body"))
    FullMessage_root.attrib['prop1'] = 'hello world'

This results in:

    <Full_Message prop1="hello world">
      <Header/>
      <Body/>
      <MyPayload> </MyPayload>
    </Full_Message>

How do I nest <MyPayload> inside the <Body> tag and create multiple nested values in <Header>?
Here is one way to achieve your goal. We create the new XML layer by layer from the top down and use append to attach child elements to their parent.

    import xml.etree.ElementTree as ET

    def create_elements(parent_ele, child_tags, child_vals):
        # Create one child per tag and attach it to the parent
        for tag, val in zip(child_tags, child_vals):
            ele = ET.Element(tag)
            if val:
                ele.text = val
            parent_ele.append(ele)

    payload_xml = '''
    <AllData>
      <MyPayload> Foo </MyPayload>
    </AllData>
    '''

    # Create root
    root = ET.Element('Full_Message')
    root.set('prop1', 'some prop')
    root.set('prop2', 'other prop')

    # Add elements to root
    create_elements(
        root,
        ['Header', 'Body'],
        [None] * 2,  # no text value attached to header and body
    )

    # Add elements to header
    create_elements(
        root.find('Header'),
        ['headerValue1', 'headerValue2', 'headerValue3', 'NestedValues'],
        ['val1', 'val2', 'val3', None],  # note that no text value attached to NestedValues
    )

    # Add elements to NestedValues
    create_elements(
        root.find('Header').find('NestedValues'),
        ['someval1', 'someval2', 'someval3'],
        ['nested val1', 'nested val2', 'nested val3'],
    )

    # Insert payload
    AllData_root = ET.fromstring(payload_xml)
    root.find('Body').append(AllData_root)

    # Print the new XML (ET.indent requires Python 3.9 or later)
    ET.indent(root)
    print(ET.tostring(root, encoding='unicode'))

The output will be:

    <Full_Message prop1="some prop" prop2="other prop">
      <Header>
        <headerValue1>val1</headerValue1>
        <headerValue2>val2</headerValue2>
        <headerValue3>val3</headerValue3>
        <NestedValues>
          <someval1>nested val1</someval1>
          <someval2>nested val2</someval2>
          <someval3>nested val3</someval3>
        </NestedValues>
      </Header>
      <Body>
        <AllData>
          <MyPayload> Foo </MyPayload>
        </AllData>
      </Body>
    </Full_Message>
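The answer above appends the whole <AllData> element to <Body>, while the question asks for <MyPayload> only. The following is a minimal sketch of one way to do that with lxml, not the answerer's method. The helper name build_message, the sample payload, and the header values are made up for illustration. The import falls back to the standard library's ElementTree if lxml is not installed, since every call used here exists in both.

```python
try:
    from lxml import etree  # the library used in the question
except ImportError:  # fallback so the sketch still runs without lxml
    import xml.etree.ElementTree as etree

# Illustrative payload; the real one is elided in the question.
payload_xml = "<AllData><MyPayload><item>Foo</item></MyPayload></AllData>"


def build_message(payload_xml, header_values, nested_values):
    """Wrap the first child of the payload root in a Full_Message envelope."""
    root = etree.Element("Full_Message", prop1="hello world", prop2="")

    # SubElement creates the child and attaches it to its parent in one step.
    header = etree.SubElement(root, "Header")
    for tag, text in header_values.items():
        etree.SubElement(header, tag).text = text

    nested = etree.SubElement(header, "NestedValues")
    for tag, text in nested_values.items():
        etree.SubElement(nested, tag).text = text

    body = etree.SubElement(root, "Body")
    # AllData is parsed, but only its first child (MyPayload) is kept.
    payload_only = etree.fromstring(payload_xml)[0]
    body.append(payload_only)
    return root


message = build_message(
    payload_xml,
    {"headerValue1": "val1", "headerValue2": "val2"},
    {"someval1": "nested val1"},
)
print(etree.tostring(message, encoding="unicode"))
```

The key difference from the attempt in the question is that the payload is appended to the Body element rather than to the root.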
I can't parse this xliff fragment: text1 text2 text3 text4 I'd like an iterative approach...
Python lxml parser does not return the whole text of a <p> element if it contains an <xref>
I'm trying to use lxml to extract the text from all <p> elements in an article in .xml format. Here is a sample of the article:

    <title>Continuing Education Activity</title>
    <p>Ebstein anomaly is a rare congenital heart disease that involves the apical displacement of the tricuspid valve with adherence of the septal and posterior leaflets to the myocardium and "atrialization of the inlet portion of the right ventricle". It is usually accompanied by tricuspid regurgitation, right ventricular failure, and arrhythmias. Clinical manifestations range from asymptomatic to severe, depending on the degree of tricuspid valve displacement and severity of regurgitation, the effective right ventricular volume, and the associated malformations (i.e., pulmonary valve stenosis, atresia, atrial septal defect, etc.). Arrhythmias are common and protracted due to the likelihood of having accessory pathways, in addition to having right atrial dilatation. Symptomatic patients can present with cyanosis, congestive heart failure, and arrhythmias, with exertional dyspnea being common in older patients.
    This activity reviews the pathophysiology and presentation of Ebstein's malformation and highlights the role of the interprofessional team in its management.</p>
    <p> <bold>Objectives:</bold> <list list-type="bullet"><list-item><p>Describe the pathophysiology of Ebstein anomaly.</p></list-item><list-item><p>Review the clinical presentation of a patient with an Ebstein anomaly.</p></list-item><list-item><p>Outline the approach to the management of patients with Ebstein anomaly, including the indications for non-surgical and surgical intervention.</p></list-item><list-item><p>Summarize the importance of improving care coordination among interprofessional team members to improve outcomes for patients affected by Ebstein anomaly.</p></list-item></list> <ext-link xmlns:xlink="http://www.w3.org/1999/xlink" ext-link-type="uri" xlink:href="https://www.statpearls.com/account/trialuserreg/?articleid=20850&utm_source=pubmed&utm_campaign=reviews&utm_content=20850">Access free multiple choice questions on this topic.</ext-link> </p>
    </sec>
    <sec id="article-20850.s2" sec-type="pubmed-excerpt">
    <title>Introduction</title>
    <p>Ebstein anomaly is a rare congenital abnormality involving the tricuspid valve and the right ventricle (RV),<xref ref-type="bibr" rid="article-20850.r1">[1]</xref> with an incidence of <1% of congenital heart defects.<xref ref-type="bibr" rid="article-20850.r2">[2]</xref> It was first described by the pathologist Wilhelm Ebstein in 1866 when he performed an autopsy of a 19-year-old cyanotic male who had suffered from exertional dyspnea and palpitations and died of a sudden cardiac arrest.<xref ref-type="bibr" rid="article-20850.r3">[3]</xref> Ebstein anomaly is defined by the following characteristics:</p>

Note how the last <p> element is interspersed with <xref> elements used as citations. When I extract the text with the following Python code:

    from lxml import etree

    def extract_text(filename):
        chunks = []
        tree = etree.parse('./data/statpearls_NBK430685/' + filename)
        root = tree.getroot()
        p_tags = tree.findall('.//p')
        # list_tags = tree.findall('.//list')  # whenever there's a list, include the para above as well as context.
        for p in p_tags:
            if p.text is None:
                continue
            elif not any(char.isalpha() for char in p.text):
                # check that there are some alphabetical characters and ignore if there aren't
                continue
            chunks.append(p.text)
        return chunks

    extract_text('article-20850.nxml')

This is the output:

    ['Ebstein anomaly is a rare congenital heart disease that involves the apical displacement of the tricuspid valve with adherence of the septal and posterior leaflets to the myocardium and "atrialization of the inlet portion of the right ventricle". It is usually accompanied by tricuspid regurgitation, right ventricular failure, and arrhythmias. Clinical manifestations range from asymptomatic to severe, depending on the degree of tricuspid valve displacement and severity of regurgitation, the effective right ventricular volume, and the associated malformations (i.e., pulmonary valve stenosis, atresia, atrial septal defect, etc.). Arrhythmias are common and protracted due to the likelihood of having accessory pathways, in addition to having right atrial dilatation. Symptomatic patients can present with cyanosis, congestive heart failure, and arrhythmias, with exertional dyspnea being common in older patients.\nThis activity reviews the pathophysiology and presentation of Ebstein\'s malformation and highlights the role of the interprofessional team in its management.',
     'Describe the pathophysiology of Ebstein anomaly.',
     'Review the clinical presentation of a patient with an Ebstein anomaly.',
     'Outline the approach to the management of patients with Ebstein anomaly, including the indications for non-surgical and surgical intervention.',
     'Summarize the importance of improving care coordination among interprofessional team members to improve outcomes for patients affected by Ebstein anomaly.',
     'Ebstein anomaly is a rare congenital abnormality involving the tricuspid valve and the right ventricle (RV),']

The last chunk has lost all of the text after the first <xref> tag. Does anyone know what causes this behavior and how to prevent it?

I suggest using the beautifulsoup library to parse this mixed HTML/XML file:

    from bs4 import BeautifulSoup

    text = """\
    <title>Continuing Education Activity</title>
    <p>Ebstein anomaly is a rare congenital heart disease that involves the apical displacement of the tricuspid valve with adherence of the septal and posterior leaflets to the myocardium and "atrialization of the inlet portion of the right ventricle". It is usually accompanied by tricuspid regurgitation, right ventricular failure, and arrhythmias. Clinical manifestations range from asymptomatic to severe, depending on the degree of tricuspid valve displacement and severity of regurgitation, the effective right ventricular volume, and the associated malformations (i.e., pulmonary valve stenosis, atresia, atrial septal defect, etc.). Arrhythmias are common and protracted due to the likelihood of having accessory pathways, in addition to having right atrial dilatation. Symptomatic patients can present with cyanosis, congestive heart failure, and arrhythmias, with exertional dyspnea being common in older patients.
    This activity reviews the pathophysiology and presentation of Ebstein's malformation and highlights the role of the interprofessional team in its management.</p>
    <p> <bold>Objectives:</bold> <list list-type="bullet"><list-item><p>Describe the pathophysiology of Ebstein anomaly.</p></list-item><list-item><p>Review the clinical presentation of a patient with an Ebstein anomaly.</p></list-item><list-item><p>Outline the approach to the management of patients with Ebstein anomaly, including the indications for non-surgical and surgical intervention.</p></list-item><list-item><p>Summarize the importance of improving care coordination among interprofessional team members to improve outcomes for patients affected by Ebstein anomaly.</p></list-item></list> <ext-link xmlns:xlink="http://www.w3.org/1999/xlink" ext-link-type="uri" xlink:href="https://www.statpearls.com/account/trialuserreg/?articleid=20850&utm_source=pubmed&utm_campaign=reviews&utm_content=20850">Access free multiple choice questions on this topic.</ext-link> </p>
    </sec>
    <sec id="article-20850.s2" sec-type="pubmed-excerpt">
    <title>Introduction</title>
    <p>Ebstein anomaly is a rare congenital abnormality involving the tricuspid valve and the right ventricle (RV),<xref ref-type="bibr" rid="article-20850.r1">[1]</xref> with an incidence of <1% of congenital heart defects.<xref ref-type="bibr" rid="article-20850.r2">[2]</xref> It was first described by the pathologist Wilhelm Ebstein in 1866 when he performed an autopsy of a 19-year-old cyanotic male who had suffered from exertional dyspnea and palpitations and died of a sudden cardiac arrest.<xref ref-type="bibr" rid="article-20850.r3">[3]</xref> Ebstein anomaly is defined by the following characteristics:</p>
    """

    soup = BeautifulSoup(text, "html.parser")

    # remove <xref> elements so the citation markers do not appear in the text
    for xref in soup.select("xref"):
        xref.extract()

    for p in soup.select("p"):
        print(p.get_text(strip=True, separator=" "))
        print("-" * 80)

Prints: Ebstein anomaly is a rare
congenital heart disease that involves the apical displacement of the tricuspid valve with adherence of the septal and posterior leaflets to the myocardium and "atrialization of the inlet portion of the right ventricle". It is usually accompanied by tricuspid regurgitation, right ventricular failure, and arrhythmias. Clinical manifestations range from asymptomatic to severe, depending on the degree of tricuspid valve displacement and severity of regurgitation, the effective right ventricular volume, and the associated malformations (i.e., pulmonary valve stenosis, atresia, atrial septal defect, etc.). Arrhythmias are common and protracted due to the likelihood of having accessory pathways, in addition to having right atrial dilatation. Symptomatic patients can present with cyanosis, congestive heart failure, and arrhythmias, with exertional dyspnea being common in older patients. This activity reviews the pathophysiology and presentation of Ebstein's malformation and highlights the role of the interprofessional team in its management. -------------------------------------------------------------------------------- Objectives: Describe the pathophysiology of Ebstein anomaly. Review the clinical presentation of a patient with an Ebstein anomaly. Outline the approach to the management of patients with Ebstein anomaly, including the indications for non-surgical and surgical intervention. Summarize the importance of improving care coordination among interprofessional team members to improve outcomes for patients affected by Ebstein anomaly. Access free multiple choice questions on this topic. -------------------------------------------------------------------------------- Describe the pathophysiology of Ebstein anomaly. -------------------------------------------------------------------------------- Review the clinical presentation of a patient with an Ebstein anomaly. 
-------------------------------------------------------------------------------- Outline the approach to the management of patients with Ebstein anomaly, including the indications for non-surgical and surgical intervention. -------------------------------------------------------------------------------- Summarize the importance of improving care coordination among interprofessional team members to improve outcomes for patients affected by Ebstein anomaly. -------------------------------------------------------------------------------- Ebstein anomaly is a rare congenital abnormality involving the tricuspid valve and the right ventricle (RV), with an incidence of <1% of congenital heart defects. It was first described by the pathologist Wilhelm Ebstein in 1866 when he performed an autopsy of a 19-year-old cyanotic male who had suffered from exertional dyspnea and palpitations and died of a sudden cardiac arrest. Ebstein anomaly is defined by the following characteristics: --------------------------------------------------------------------------------
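The truncation in the question comes from how ElementTree-style APIs, lxml included, store mixed content: p.text holds only the text before the first child element, and the text that follows each child is stored in that child's .tail. The following is a minimal sketch of two ways to collect the full text without switching libraries. The helper name paragraph_text and the shortened sample are made up for illustration. The import falls back to the standard library's ElementTree if lxml is not installed, since the calls used exist in both.

```python
try:
    from lxml import etree
except ImportError:  # fallback so the sketch still runs without lxml
    import xml.etree.ElementTree as etree

# Shortened, well-formed stand-in for the article paragraph in the question.
sample = (
    '<sec><p>Ebstein anomaly is rare,'
    '<xref ref-type="bibr" rid="r1">[1]</xref>'
    ' with a low incidence.'
    '<xref ref-type="bibr" rid="r2">[2]</xref>'
    ' It was first described in 1866.</p></sec>'
)


def paragraph_text(element, skip_tags=("xref",)):
    """Return the text of element, tails included, leaving out skipped tags."""
    parts = [element.text or ""]
    for child in element:
        # Comments and processing instructions have a non-string tag in lxml.
        if isinstance(child.tag, str) and child.tag not in skip_tags:
            parts.append(paragraph_text(child, skip_tags))
        # child.tail is the text that follows the child inside its parent.
        parts.append(child.tail or "")
    return "".join(parts)


root = etree.fromstring(sample)
for p in root.findall(".//p"):
    print(repr(p.text))             # only the text before the first <xref>
    print("".join(p.itertext()))    # full text, citation markers included
    print(paragraph_text(p))        # full text, <xref> markers removed
```

itertext() is the shortest fix when the citation markers may stay in the text; the small recursive helper is one way to drop them while keeping the tails.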
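Why am I getting "ModuleNotFoundError: No module named 'lxml'" in an Azure DevOps pipeline when the library is installed
I'm trying to run a simple Python script that works fine locally but keeps hitting the same error in a DevOps pipeline. I've included the installation of the library in the yaml file and in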
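The excerpt is cut off before the pipeline YAML, so the actual cause is unknown. One common cause of this symptom, assumed here, is that pip installs into a different interpreter than the one that runs the script. A small diagnostic like the one below can be run as a pipeline step to check; the function name describe_module is made up for illustration.

```python
import importlib.util
import sys


def describe_module(name):
    """Report which interpreter is running and whether it can see `name`."""
    spec = importlib.util.find_spec(name)
    return {
        "interpreter": sys.executable,             # the Python actually in use
        "module": name,
        "found": spec is not None,
        "location": getattr(spec, "origin", None),  # file the module loads from
    }


print(describe_module("lxml"))
```

If "found" is False for the interpreter printed here, installing with that same interpreter (python -m pip install lxml) rather than a bare pip command is one thing to try.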
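I'm trying to process a bunch of XML files and add certain attributes to specific elements when certain conditions are met. I have different versions of the same XML document. Some of them have...
Given a sample country.xml file, I'd like to copy each country to a new output.xml file as a child element of a new root. The problem is that when I append each country, I get duplicate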
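The excerpt is cut off, so the cause of the duplicates is not known. As a baseline to compare against, here is a minimal sketch that builds the new root once, outside the loop, and appends a deep copy of each country exactly once. The sample data and the root tag name countries are assumptions. In lxml, append moves an element out of its original tree, so the copy keeps the source document intact. The import falls back to the standard library's ElementTree if lxml is not installed.

```python
import copy

try:
    from lxml import etree
except ImportError:  # fallback so the sketch still runs without lxml
    import xml.etree.ElementTree as etree

# Illustrative stand-in for country.xml.
source_xml = """
<data>
  <country name="Liechtenstein"><rank>1</rank></country>
  <country name="Singapore"><rank>4</rank></country>
</data>
"""

source_root = etree.fromstring(source_xml)
new_root = etree.Element("countries")  # hypothetical name for the new root

for country in source_root.findall("country"):
    # deepcopy leaves the source tree untouched and appends each country once.
    new_root.append(copy.deepcopy(country))

print(etree.tostring(new_root, encoding="unicode"))
```

To write the result to a file, etree.ElementTree(new_root).write("output.xml") can replace the final print.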
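I want to use PEP 634 (Structural Pattern Matching) to match an HtmlElement that has particular attributes. The attributes are accessible through the .attrib property, which returns
Python3, lxml and "Symbol not found: _lzma_auto_decoder" on Mac OS X 10.9
I installed Python 3 using homebrew, then installed pip3 and lxml. The following line, from lxml import etree, results in the following error: $ python3 Python...
AWS Lambda Python 3.11: Unable to import lxml: libxslt.so.1: cannot open shared object file: No such file or directory
I have a Python function on AWS Lambda that depends on lxml. The dependency layer contains the result of a poetry install of lxml, but I get the following error at runtime: "errorMessage":
I'm using the Salesforce SOAP API for the first time, so I'm not familiar with SOAP formatting issues and the like. I use the lxml library to generate the XML, but there seems to be a formatting problem...
I'm trying to install Scrapy; however, this is the error I get: Failed building wheel for lxml. Please help.
Getting the error "Failed building wheel for lxml": src/lxml/etree.c:96:10: fatal error: 'Python.h' file not found #include "Python.h" ^~~~~~~~~~ 1 error generated. error: could not build...
Using lxml with django/python - list index out of range
I have a small problem. I'm trying to extract some data from XML using lxml, but I keep getting a "list index out of range" error. Right now I'm trying to get the [0] position of a list, which should...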
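Lookups such as findall (and xpath in lxml) return an empty list when nothing matches, and indexing that list with [0] is what raises "list index out of range". The following is a minimal sketch of guarding the lookup. The helper name first_text and the sample document are made up, and the question's own query is not shown in the excerpt. The import falls back to the standard library's ElementTree if lxml is not installed.

```python
try:
    from lxml import etree
except ImportError:  # fallback so the sketch still runs without lxml
    import xml.etree.ElementTree as etree

# Illustrative document: the second <item> has no <name> child.
xml_doc = "<items><item><name>first</name></item><item/></items>"


def first_text(element, path, default=None):
    """Return the text of the first match for `path`, or `default` if none."""
    matches = element.findall(path)
    if not matches:  # an empty result is what makes [0] raise IndexError
        return default
    return matches[0].text


root = etree.fromstring(xml_doc)
names = [
    first_text(item, "name", default="(missing)")
    for item in root.findall("item")
]
print(names)
```

For a single expected child, element.findtext(path, default) gives the same result in one call.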
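How to fix: raise ImportError("lxml not found, please install it")
I'm currently hosting my Python Flask app on PythonAnywhere. When I run my scraping script, which uses the code df = pd.read_html(current_data.content), I get the error in the title. Running...
<?xml version="1.0" encoding="UTF-8"?> instead of <?xml version='1.0' encoding='UTF-8'?>
I'm using lxml's tree.write(xmlFileOut, pretty_print = True, xml_declaration = True, encoding='UTF-8') to write out an xml file that I opened and edited, but I absolutely need the xml declaration...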
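One way to control the quoting of the declaration, sketched here as an assumption rather than a documented lxml option, is to serialize without a declaration and prepend a hand-written one. The function name and sample document are made up for illustration. The import falls back to the standard library's ElementTree if lxml is not installed; with lxml, pretty_print=True can also be passed to tostring.

```python
try:
    from lxml import etree
except ImportError:  # fallback so the sketch still runs without lxml
    import xml.etree.ElementTree as etree

# Hand-written declaration with double quotes.
DECLARATION = b'<?xml version="1.0" encoding="UTF-8"?>\n'


def serialize_with_double_quoted_declaration(root):
    """Serialize root as UTF-8 bytes behind a double-quoted XML declaration."""
    # Suppress the generated declaration so only the custom one appears.
    body = etree.tostring(root, encoding="UTF-8", xml_declaration=False)
    return DECLARATION + body


root = etree.fromstring("<config><item>value</item></config>")
data = serialize_with_double_quoted_declaration(root)
print(data.decode("utf-8"))
```

The resulting bytes can be written to the output file opened in binary mode.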
I'm using XPath from lxml in Python to search in an HTML document. How can I get the path to a certain element? Here is an example from ruby nokogiri: page.xpath('//text()').each do |textnode| ...
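In lxml, an ElementTree object has a getpath method that returns an XPath expression for a given element, and text results from xpath('//text()') expose getparent() and is_tail. The following is a minimal sketch that uses both; the sample markup is made up, and this one requires lxml since the standard library has no equivalent.

```python
from lxml import etree

# Small well-formed stand-in for the HTML document in the question.
html = (
    "<html><body><div>"
    "<p>first</p><p>second <b>bold</b> tail</p>"
    "</div></body></html>"
)
root = etree.fromstring(html)
tree = root.getroottree()

for text_node in root.xpath("//text()"):
    # For tail text, getparent() is the element the text follows.
    owner = text_node.getparent()
    kind = "tail" if text_node.is_tail else "text"
    print(tree.getpath(owner), kind, repr(str(text_node)))
```

For real-world HTML that is not well-formed, lxml.html.fromstring can replace etree.fromstring; the rest of the sketch stays the same.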
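I've been trying to install the cython and lxml packages under Python 3.8 on Cygwin using pip install. However, this repeatedly fails with hard-to-understand errors ranging from python errors to gcc errors
How to force indentation of nested Python lxml XML elements when writing iteratively?
I'm using lxml to write an XML file that is a dump of a database. Given the size of the data, I have to write the XML file iteratively. When dumping the etree to a file, the server runs out of memory
import requests from bs4 import BeautifulSoup import pandas as pd headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari...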