I am new to XML parsing. This XML file has the following tree:
FHRSEstablishment
|--> Header
| |--> ...
|--> EstablishmentCollection
| |--> EstablishmentDetail
| | |-->...
| |--> Scores
| | |-->...
|--> EstablishmentCollection
| |--> EstablishmentDetail
| | |-->...
| |--> Scores
| | |-->...
But when I access it with ElementTree and look for the child tags and attributes,
import xml.etree.ElementTree as ET
import urllib2
# ET.parse() accepts a file-like object as its first positional argument
tree = ET.parse(
    urllib2.urlopen('http://ratings.food.gov.uk/OpenDataFiles/FHRS408en-GB.xml'))
root = tree.getroot()
for child in root:
    print child.tag, child.attrib
I only get:
Header {}
EstablishmentCollection {}
I assume this means their attributes are empty. Why is that, and how can I access the children nested inside EstablishmentDetail and Scores?
Edit
Thanks to the answer below I can get inside the tree, but if I want to retrieve values such as those in Scores, it fails:
for node in root.find('.//EstablishmentDetail/Scores'):
    rating = node.attrib.get('Hygiene')
    print rating
and produces
None
None
None
Why?
You have to iterate over your root, i.e. root.iter() will do the trick!
import xml.etree.ElementTree as ET
import urllib2
tree = ET.parse(urllib2.urlopen('http://ratings.food.gov.uk/OpenDataFiles/FHRS408en-GB.xml'))
root = tree.getroot()
# iter() walks every element in the tree, not just the direct children
for child in root.iter():
    print child.tag, child.attrib
Output:
FHRSEstablishment {}
Header {}
ExtractDate {}
ItemCount {}
ReturnCode {}
EstablishmentCollection {}
EstablishmentDetail {}
FHRSID {}
LocalAuthorityBusinessID {}
...
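To make the difference between looping over root and calling root.iter() concrete, here is a minimal sketch. It is written for Python 3 (unlike the Python 2 snippets in this thread) and parses a small invented sample instead of downloading the real file, so the SAMPLE string and its values are assumptions made only for illustration.

```python
import xml.etree.ElementTree as ET

# Invented sample that mimics the nesting of the real file (values are made up).
SAMPLE = """<FHRSEstablishment>
  <Header><ItemCount>1</ItemCount></Header>
  <EstablishmentCollection>
    <EstablishmentDetail>
      <BusinessName>Example Cafe</BusinessName>
      <Scores><Hygiene>5</Hygiene></Scores>
    </EstablishmentDetail>
  </EstablishmentCollection>
</FHRSEstablishment>"""

root = ET.fromstring(SAMPLE)

# Iterating an element directly yields only its immediate children.
direct_children = [child.tag for child in root]

# iter() walks the whole subtree in document order, starting with the element itself.
all_tags = [elem.tag for elem in root.iter()]

# Passing a tag name to iter() keeps only the matching elements.
score_tags = [elem.tag for elem in root.iter('Scores')]

print(direct_children)
print(all_tags)
print(score_tags)
```

The empty {} printed next to every tag in the output above is expected for this kind of file: the data sits in the element text, not in attributes.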
To get all the tags inside EstablishmentDetail, you need to find that tag and then loop through its children! For example:
for child in root.find('.//EstablishmentDetail'):
    print child.tag, child.attrib
Output:
FHRSID {}
LocalAuthorityBusinessID {}
BusinessName {}
BusinessType {}
BusinessTypeID {}
RatingValue {}
RatingKey {}
RatingDate {}
LocalAuthorityCode {}
LocalAuthorityName {}
LocalAuthorityWebSite {}
LocalAuthorityEmailAddress {}
Scores {}
SchemeType {}
NewRatingPending {}
Geocode {}
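As for getting the Hygiene score: root.find('.//Scores') returns only the first Scores tag, so when you write for each in root.find('.//Scores'): rating = each.get('Hygiene') you are looping over its children, the Hygiene, ConfidenceInManagement and Structural tags. Obviously none of those three children has an attribute called Hygiene, which is why you get None three times!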
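The three None values can be reproduced on a tiny fragment. This is a Python 3 sketch; the fragment, its child names and its numbers are invented for illustration and are not taken from the real file.

```python
import xml.etree.ElementTree as ET

# Invented fragment; the numbers are placeholders, not real ratings.
scores = ET.fromstring(
    "<Scores><Hygiene>5</Hygiene><Structural>10</Structural>"
    "<ConfidenceInManagement>0</ConfidenceInManagement></Scores>"
)

# Looping over the Scores element visits its three children, and none of
# them carries an attribute named 'Hygiene'.
attribute_lookups = [child.attrib.get('Hygiene') for child in scores]

# The values live in each child's text, not in its attributes.
text_values = {child.tag: child.text for child in scores}

print(attribute_lookups)
print(text_values)
```

In other words, attrib only covers what is written inside the opening tag (such as <Hygiene unit="x">), while the number between the tags is available as .text.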
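You need to:
- first find all the Scores tags;
- then find Hygiene inside each tag found!
for each in root.findall('.//Scores'):
    rating = each.find('.//Hygiene')
    # print an empty line when a Scores tag has no Hygiene child
    print '' if rating is None else rating.text
Output: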
5
5
5
0
5
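If you also want to know which establishment each score belongs to, one possible variation is to loop over the EstablishmentDetail elements and use findtext(). This is only a sketch under assumptions: it is Python 3, it assumes Scores sits inside EstablishmentDetail (as the output listing above suggests), and the sample data and the helper name hygiene_by_business are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Invented sample; the second establishment deliberately has no Hygiene score.
SAMPLE = """<FHRSEstablishment>
  <EstablishmentCollection>
    <EstablishmentDetail>
      <BusinessName>Example Cafe</BusinessName>
      <Scores><Hygiene>5</Hygiene></Scores>
    </EstablishmentDetail>
    <EstablishmentDetail>
      <BusinessName>Example Kiosk</BusinessName>
      <Scores />
    </EstablishmentDetail>
  </EstablishmentCollection>
</FHRSEstablishment>"""


def hygiene_by_business(root):
    """Return (business name, hygiene score) pairs; the score is '' when missing."""
    results = []
    for detail in root.iter('EstablishmentDetail'):
        # findtext() returns the text of the first match, or the default if none exists.
        name = detail.findtext('BusinessName', default='')
        hygiene = detail.findtext('Scores/Hygiene', default='')
        results.append((name, hygiene))
    return results


root = ET.fromstring(SAMPLE)
pairs = hygiene_by_business(root)
print(pairs)
```

Using findtext() with a default avoids the explicit "is None" check, at the cost of not distinguishing a missing Hygiene tag from an empty one.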
Hope this helps!