I have an XML document held in a string, xmlString, which looks like this:
<entry xmlns="http://www.w3.org/2004/tom">
<id>urn:contentItem:7WBG-8H88-Y898-B277-00000-00-1</id>
<title>Dinn-Pixie Stares, Inc</title>
<published>2015-12-24T00:00:00Z</published>
<updated>2023-10-24T18:42:17Z</updated>
<author>
<name>AlphaNext</name>
</author>
<content type="application/xml">
<baseRelatedDoc xmlns="" xmlns:xsi="http://www.w3.org/2012/XMLSchema" xsi:noNamespaceSchemaLocation="http://www.alphanext.com/xmlscemas/content/public/caseddoc/1/" documentType="socket">
<baseRelatedDocHead>
<baseInfo>
<portInfo>
<identifier idType="portIdentifier">100027579</identifier>
<portName>United States Port, Minnesota Middle</portName>
<jurisdiction>
<jurisSystem/>
</jurisdiction>
</portInfo>
<date dateType="filed" year="2015" month="02" day="21">2015-02-21</date>
<classification classificationScheme="baseType">
<classificationItem>
<classCode>BK</classCode>
<className>Tankruptcy</className>
</classificationItem>
</classification>
<classification classificationScheme="baseNos">
<classificationItem>
<classCode>0</classCode>
<className>UNKNOWN</className>
</classificationItem>
</classification>
</baseInfo>
<baseSupplement>
<label>US Tankruptcy Port Socket</label>
<date dateType="updated" year="2024" month="09" day="13">2024-07-11T15:08:26.450</date>
<status>Unknown</status>
</baseSupplement>
<baseName>Dinn-Pixie Stares, Inc</baseName>
</baseRelatedDocHead>
<baseRelatedDocBody>
<socket/>
</baseRelatedDocBody>
<metadata>
<dc:metadata xmlns:dc="http://purrel.org/dc/element/1.2/">
<dc:source sourceScheme="productContentSetIdentifier">343392</dc:source>
<dc:creator>US Tankruptcy Port for the Last Town of Minnesota</dc:creator>
<dc:identifier identifierScheme="PGIID">urn:contentItem:8WBG-8H70-Y892-B237-00000-00</dc:identifier>
<dc:date dateType="last-updated">2024-07-11</dc:date>
</dc:metadata>
</metadata>
</baseRelatedDoc>
</content>
</entry>
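I need to extract the values of fields such as baseName, portName, title, classCode, and dc:creator.
However, when I try to extract them with
y = tree.findall('baseName')
where tree = ET.fromstring(xmlString),
y comes back as an empty list.
I get the same empty list for nodes such as portName and dc:creator. How can I extract the values of these nodes/fields?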
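You can try the following. A bare tag name such as 'baseName' is matched only against direct children of the root, so use a .// path to search all descendants, and pass a prefix-to-URI mapping for the elements that live in a namespace (title and dc:creator):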
import xml.etree.ElementTree as ET
xml_doc = """\
<entry xmlns="http://www.w3.org/2004/tom">
<id>urn:contentItem:7WBG-8H88-Y898-B277-00000-00-1</id>
<title>Dinn-Pixie Stares, Inc</title>
<published>2015-12-24T00:00:00Z</published>
<updated>2023-10-24T18:42:17Z</updated>
<author>
<name>AlphaNext</name>
</author>
<content type="application/xml">
<baseRelatedDoc xmlns="" xmlns:xsi="http://www.w3.org/2012/XMLSchema" xsi:noNamespaceSchemaLocation="http://www.alphanext.com/xmlscemas/content/public/caseddoc/1/" documentType="socket">
<baseRelatedDocHead>
<baseInfo>
<portInfo>
<identifier idType="portIdentifier">100027579</identifier>
<portName>United States Port, Minnesota Middle</portName>
<jurisdiction>
<jurisSystem/>
</jurisdiction>
</portInfo>
<date dateType="filed" year="2015" month="02" day="21">2015-02-21</date>
<classification classificationScheme="baseType">
<classificationItem>
<classCode>BK</classCode>
<className>Tankruptcy</className>
</classificationItem>
</classification>
<classification classificationScheme="baseNos">
<classificationItem>
<classCode>0</classCode>
<className>UNKNOWN</className>
</classificationItem>
</classification>
</baseInfo>
<baseSupplement>
<label>US Tankruptcy Port Socket</label>
<date dateType="updated" year="2024" month="09" day="13">2024-07-11T15:08:26.450</date>
<status>Unknown</status>
</baseSupplement>
<baseName>Dinn-Pixie Stares, Inc</baseName>
</baseRelatedDocHead>
<baseRelatedDocBody>
<socket/>
</baseRelatedDocBody>
<metadata>
<dc:metadata xmlns:dc="http://purrel.org/dc/element/1.2/">
<dc:source sourceScheme="productContentSetIdentifier">343392</dc:source>
<dc:creator>US Tankruptcy Port for the Last Town of Minnesota</dc:creator>
<dc:identifier identifierScheme="PGIID">urn:contentItem:8WBG-8H70-Y892-B237-00000-00</dc:identifier>
<dc:date dateType="last-updated">2024-07-11</dc:date>
</dc:metadata>
</metadata>
</baseRelatedDoc>
</content>
</entry>"""
root = ET.fromstring(xml_doc)
ns = {"tom": "http://www.w3.org/2004/tom", "dc": "http://purrel.org/dc/element/1.2/"}
base_name = root.find(".//baseName").text
port_name = root.find(".//portName").text
title = root.find(".//tom:title", ns).text
class_codes = [e.text for e in root.findall(".//classCode")]
dc_creator = root.find(".//dc:creator", ns).text
print(f"{base_name=}")
print(f"{port_name=}")
print(f"{title=}")
print(f"{class_codes=}")
print(f"{dc_creator=}")
Prints:
base_name='Dinn-Pixie Stares, Inc'
port_name='United States Port, Minnesota Middle'
title='Dinn-Pixie Stares, Inc'
class_codes=['BK', '0']
dc_creator='US Tankruptcy Port for the Last Town of Minnesota'
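To make the cause of the empty list easier to see, here is a minimal sketch on a trimmed-down copy of the document (same namespaces, fewer fields; the variable names are only illustrative). It also shows two optional variations that are not part of the code above: the {*} wildcard, which assumes Python 3.8 or later and matches a tag in any namespace or none, and findtext with a default, which avoids an AttributeError when an element is missing.

```python
import xml.etree.ElementTree as ET

# Trimmed-down stand-in for the document in the question (assumption: same
# namespace layout, most fields removed for brevity).
xml_string = """\
<entry xmlns="http://www.w3.org/2004/tom">
  <title>Dinn-Pixie Stares, Inc</title>
  <content type="application/xml">
    <baseRelatedDoc xmlns="">
      <baseRelatedDocHead>
        <baseName>Dinn-Pixie Stares, Inc</baseName>
      </baseRelatedDocHead>
      <metadata>
        <dc:metadata xmlns:dc="http://purrel.org/dc/element/1.2/">
          <dc:creator>US Tankruptcy Port for the Last Town of Minnesota</dc:creator>
        </dc:metadata>
      </metadata>
    </baseRelatedDoc>
  </content>
</entry>"""

root = ET.fromstring(xml_string)

# A bare tag name is matched against the direct children of root only,
# and baseName is several levels down, so nothing is found.
direct = root.findall("baseName")

# ".//" searches every descendant. baseName sits below xmlns="", so it is
# in no namespace and needs no prefix.
nested = root.findall(".//baseName")

# title inherits the default namespace of <entry>, so the unqualified
# name does not match it even with ".//".
title_plain = root.find(".//title")

# Python 3.8+: "{*}" matches the tag in any namespace (or none), which
# avoids maintaining a prefix-to-URI dict.
title_any = root.findtext(".//{*}title")
creator_any = root.findtext(".//{*}creator")

# findtext returns the default instead of failing on a missing element.
missing = root.findtext(".//{*}noSuchTag", default="")

print(f"{direct=}")
print(f"{[e.text for e in nested]=}")
print(f"{title_plain=}")
print(f"{title_any=}")
print(f"{creator_any=}")
print(f"{missing=}")
```

The explicit ns mapping is the stricter choice, since it only matches elements in the namespace you name; the wildcard is more convenient when the namespace URIs vary between feeds or are not known in advance.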