Accessing nested children in an XML file parsed with ElementTree

Question Votes: 11 Answers: 2

I am new to XML parsing. This XML file has the following tree:

FHRSEstablishment
 |--> Header
 |    |--> ...
 |--> EstablishmentCollection
 |    |--> EstablishmentDetail
 |    |    |-->...
 |    |--> Scores
 |    |    |-->...
 |--> EstablishmentCollection
 |    |--> EstablishmentDetail
 |    |    |-->...
 |    |--> Scores
 |    |    |-->...

But when I access it with ElementTree and look at the child tags and attributes,

import xml.etree.ElementTree as ET
import urllib2
tree = ET.parse(
   urllib2.urlopen('http://ratings.food.gov.uk/OpenDataFiles/FHRS408en-GB.xml'))
root = tree.getroot()
for child in root:
   print child.tag, child.attrib

I only get:

Header {}
EstablishmentCollection {}

I assume this means their attributes are empty. Why is that, and how can I access the children nested inside EstablishmentDetail and Scores?

Edit

Thanks to the answer below I can get inside the tree, but retrieving values such as those in Scores still fails:

for node in root.find('.//EstablishmentDetail/Scores'):
    rating = node.attrib.get('Hygiene')
    print rating 

which produces

None
None
None

Why?

python xml tree xml-parsing elementtree
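2 Answers
13
votes

You have to iterate over all descendants of the root element, not just its direct children.

root.iter() will do the trick!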

import xml.etree.ElementTree as ET
import urllib2
tree = ET.parse(urllib2.urlopen('http://ratings.food.gov.uk/OpenDataFiles/FHRS408en-GB.xml'))
root = tree.getroot()
for child in root.iter():
   print child.tag, child.attrib

Output:

FHRSEstablishment {}
Header {}
ExtractDate {}
ItemCount {}
ReturnCode {}
EstablishmentCollection {}
EstablishmentDetail {}
FHRSID {}
LocalAuthorityBusinessID {}
...
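
The difference between looping over an element and calling iter() on it can be sketched without downloading the real file. This is a minimal Python 3 example; the SAMPLE string is a hypothetical, heavily trimmed stand-in for the real data, shaped like the tree in the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like the tree in the question (not the real file).
SAMPLE = """
<FHRSEstablishment>
  <Header><ItemCount>1</ItemCount></Header>
  <EstablishmentCollection>
    <EstablishmentDetail>
      <FHRSID>1</FHRSID>
      <Scores><Hygiene>5</Hygiene></Scores>
    </EstablishmentDetail>
  </EstablishmentCollection>
</FHRSEstablishment>
"""

root = ET.fromstring(SAMPLE)

# Looping over an element yields only its direct children.
direct_tags = [child.tag for child in root]

# iter() yields the element itself and every descendant, in document order.
all_tags = [elem.tag for elem in root.iter()]

print(direct_tags)
print(all_tags)
```

Here direct_tags stops at Header and EstablishmentCollection, which matches what the question observed, while all_tags also reaches FHRSID, Scores and Hygiene.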
  • To get all the tags inside EstablishmentDetail, you need to find that tag and then iterate over its children!

For example:

for child in root.find('.//EstablishmentDetail'):
    print child.tag, child.attrib

Output:

FHRSID {}
LocalAuthorityBusinessID {}
BusinessName {}
BusinessType {}
BusinessTypeID {}
RatingValue {}
RatingKey {}
RatingDate {}
LocalAuthorityCode {}
LocalAuthorityName {}
LocalAuthorityWebSite {}
LocalAuthorityEmailAddress {}
Scores {}
SchemeType {}
NewRatingPending {}
Geocode {}
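
Note that find() returns only the first matching element, which is why the listing above covers a single establishment. A minimal Python 3 sketch of find() versus findall() follows; the SAMPLE string and its values are hypothetical, and only the element names come from the output above.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-establishment sample; values are made up for illustration.
SAMPLE = """
<FHRSEstablishment>
  <EstablishmentCollection>
    <EstablishmentDetail><FHRSID>1</FHRSID><RatingValue>5</RatingValue></EstablishmentDetail>
    <EstablishmentDetail><FHRSID>2</FHRSID><RatingValue>4</RatingValue></EstablishmentDetail>
  </EstablishmentCollection>
</FHRSEstablishment>
"""

root = ET.fromstring(SAMPLE)

# find() returns the first match only; looping over it walks its children.
first = root.find('.//EstablishmentDetail')
first_children = [child.tag for child in first]

# findall() returns a list with every match.
details = root.findall('.//EstablishmentDetail')
ids = [detail.findtext('FHRSID') for detail in details]

print(first_children)
print(ids)
```

To visit every establishment rather than just the first, loop over the findall() result and inspect each element's children inside that loop.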
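  • To get the Hygiene score you mentioned in the comments: root.find('.//Scores') returns only the first Scores tag, and that tag has Hygiene, ConfidenceInManagement and Structural as child tags. When you run for each in root.find('.//Scores'): rating = each.get('Hygiene'), you loop over those three children and ask each one for an attribute named Hygiene. None of them has such an attribute, so you get None three times. The value you want is the text of the Hygiene element, not an attribute.

You need to:
  • First, find all the Scores tags.
  • Then, in each tag found, find Hygiene.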

for each in root.findall('.//Scores'):
    rating = each.find('.//Hygiene')
    print '' if rating is None else rating.text

Output:

5
5
5
0
5
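
The same idea can be written with findtext(), which returns an element's text directly and falls back to a default when the element is missing. This is a minimal Python 3 sketch rather than the answerer's code; the SAMPLE string, the business names and the helper hygiene_by_business are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the second establishment has no Scores element.
SAMPLE = """
<FHRSEstablishment>
  <EstablishmentCollection>
    <EstablishmentDetail>
      <BusinessName>Cafe A</BusinessName>
      <Scores><Hygiene>5</Hygiene><ConfidenceInManagement>0</ConfidenceInManagement></Scores>
    </EstablishmentDetail>
    <EstablishmentDetail>
      <BusinessName>Cafe B</BusinessName>
    </EstablishmentDetail>
  </EstablishmentCollection>
</FHRSEstablishment>
"""

root = ET.fromstring(SAMPLE)


def hygiene_by_business(root):
    """Map each BusinessName to its Hygiene text, or None if there is no score."""
    result = {}
    for detail in root.iter('EstablishmentDetail'):
        name = detail.findtext('BusinessName')
        # findtext() returns the default when no element matches the path.
        result[name] = detail.findtext('Scores/Hygiene', default=None)
    return result


scores = hygiene_by_business(root)
print(scores)

# The score lives in the element's text; its attribute dict is empty.
hygiene = root.find('.//Hygiene')
print(hygiene.attrib, hygiene.text)
```

The last two lines show why attrib.get('Hygiene') returned None in the question: the Hygiene element carries its value as text, and its attribute dictionary is empty.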

1
vote

Hope this might be useful:
