I am trying to parse the following XML with Python. I am using:
thumbnail_tag = dom.getElementsByTagName('media:thumbnail')[0].toxml()
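This selects the first one. I know I can change [0] to [1] to get the tag with yt:name="mqdefault", but is there another way to do it by changing the argument in the statement above (adding something to 'media:thumbnail')?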
<entry>
<media:thumbnail url="http://i.ytimg.com/vi/k8J-72MmTGg/default.jpg" height="90" width="120" time="00:01:48.500" yt:name="default" />
<media:thumbnail url="http://i.ytimg.com/vi/k8J-72MmTGg/mqdefault.jpg" height="180" width="320" yt:name="mqdefault" />
<media:thumbnail url="http://i.ytimg.com/vi/k8J-72MmTGg/hqdefault.jpg" height="360" width="480" yt:name="hqdefault" />
</entry>
<entry>
<media:thumbnail url="http://i.ytimg.com/vi/k8J-72MmTGg/default.jpg" height="90" width="120" time="00:01:48.500" yt:name="default" />
<media:thumbnail url="http://i.ytimg.com/vi/k8J-72MmTGg/mqdefault.jpg" height="180" width="320" yt:name="mqdefault" />
<media:thumbnail url="http://i.ytimg.com/vi/k8J-72MmTGg/hqdefault.jpg" height="360" width="480" yt:name="hqdefault" />
</entry>
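To create a DOM object from this XML string, you must declare the XML namespaces, either on the root tag or on the tag that uses the prefix. (The fragment also needs a single root element around the two entry elements, because a well-formed XML document has exactly one root.)
A namespace is declared with an xmlns attribute in the start tag of an element.
A namespace declaration has the following syntax:
xmlns:prefix="URI"
For example: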
<root>
<h:table xmlns:h="http://bluejson.com/W3C/">
<h:tr>
<h:td>JSON</h:td>
<h:td>JavaScript</h:td>
<h:td>Python</h:td>
</h:tr>
</h:table>
<f:table xmlns:f="http://bluejson.com/W3C/">
<f:name>My Study Room</f:name>
<f:width>800</f:width>
<f:height>420</f:height>
<f:length>1120</f:length>
</f:table>
</root>
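In the example above, the xmlns attribute in each start tag binds the h: and f: prefixes to a namespace.
Once a namespace is declared on an element, all child elements with the same prefix are associated with that namespace.
Namespaces can be declared either on the elements where they are used or on the XML root element: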
<root xmlns:h="http://bluejson.com/W3C/" xmlns:f="http://bluejson.com/W3C/">
<h:table>
<h:tr>
<h:td>JSON</h:td>
<h:td>JavaScript</h:td>
<h:td>Python</h:td>
</h:tr>
</h:table>
<f:table>
<f:name>My Study Room</f:name>
<f:width>800</f:width>
<f:height>420</f:height>
<f:length>1120</f:length>
</f:table>
</root>
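To see what the declaration buys you, here is a minimal sketch that parses the same thumbnail twice with xml.dom.minidom: once with no xmlns declarations, which the expat parser rejects as an unbound prefix, and once with both prefixes declared on the root. The URIs http://media/ and http://media/yt/ are placeholders (the same ones used in the code that follows), not the real feed namespaces.

```python
import xml.dom.minidom
from xml.parsers.expat import ExpatError

# Placeholder namespace URIs, chosen only for this illustration.
MEDIA_NS = "http://media/"
YT_NS = "http://media/yt/"

THUMB = '<media:thumbnail width="320" yt:name="mqdefault" />'

# 1) Prefixes used without any xmlns declaration: expat refuses the document.
try:
    xml.dom.minidom.parseString("<root>" + THUMB + "</root>")
    undeclared_ok = True
except ExpatError as exc:
    undeclared_ok = False
    print("without xmlns:", exc)

# 2) The same element with both prefixes declared on the root element.
declared = '<root xmlns:media="%s" xmlns:yt="%s">%s</root>' % (MEDIA_NS, YT_NS, THUMB)
dom = xml.dom.minidom.parseString(declared)
thumb = dom.getElementsByTagNameNS(MEDIA_NS, "thumbnail")[0]

# The child's prefix resolves to the URI declared on its ancestor <root>.
print(thumb.prefix, thumb.localName, thumb.namespaceURI)
print(thumb.getAttributeNS(YT_NS, "name"))
```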
Now, the Python code that creates the XML DOM object and reads the attributes:
import xml.dom.minidom
# The xmlns:media and xmlns:yt declarations on <root> make the prefixes legal.
dom = xml.dom.minidom.parseString("""
<root xmlns:media="http://media/" xmlns:yt="http://media/yt/">
<media:thumbnail url="http://i.ytimg.com/vi/k8J-72MmTGg/default.jpg" height="90" width="120" time="00:01:48.500" yt:name="default" />
<media:thumbnail url="http://i.ytimg.com/vi/k8J-72MmTGg/mqdefault.jpg" height="180" width="320" yt:name="mqdefault" />
<media:thumbnail url="http://i.ytimg.com/vi/k8J-72MmTGg/hqdefault.jpg" height="360" width="480" yt:name="hqdefault" />
</root>""")
# Look up elements by (namespace URI, local name) rather than by prefix.
media_thumbnail = dom.getElementsByTagNameNS("http://media/","thumbnail")
print(media_thumbnail[0].getAttribute("height"))
print(media_thumbnail[0].getAttribute("width"))
print(media_thumbnail[0].getAttribute("time"))
print(media_thumbnail[0].getAttributeNS("http://media/yt/","name"))
# Add a plain attribute, then a namespaced one. The qualified name "value"
# has no prefix, so toxml() writes value="1"; pass "yt:value" for a prefixed one.
media_thumbnail[0].setAttribute("unit","px")
media_thumbnail[0].setAttributeNS("http://media/yt/","value","1")
print(dom.toxml())
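Output (from Python 2; since Python 3.8, toxml() writes attributes in the order they were set instead of sorting them):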
90
120
00:01:48.500
default
<?xml version="1.0" ?><root xmlns:media="http://media/" xmlns:yt="http://media/yt/">
<media:thumbnail height="90" time="00:01:48.500" unit="px" url="http://i.ytimg.com/vi/k8J-72MmTGg/default.jpg" value="1" width="120" yt:name="default"/>
<media:thumbnail height="180" url="http://i.ytimg.com/vi/k8J-72MmTGg/mqdefault.jpg" width="320" yt:name="mqdefault"/>
<media:thumbnail height="360" url="http://i.ytimg.com/vi/k8J-72MmTGg/hqdefault.jpg" width="480" yt:name="hqdefault"/>
</root>
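Applied to the original question, the namespace-aware calls let you pick a thumbnail by its yt:name value instead of by list index. This is a minimal sketch reusing the placeholder URIs from the example above; find_thumbnail is a hypothetical helper name, and for a real feed you would substitute the URIs that feed actually declares.

```python
import xml.dom.minidom

# Placeholder namespace URIs, matching the example above (not the real feed's).
MEDIA_NS = "http://media/"
YT_NS = "http://media/yt/"

XML = """<root xmlns:media="http://media/" xmlns:yt="http://media/yt/">
<media:thumbnail url="http://i.ytimg.com/vi/k8J-72MmTGg/default.jpg" height="90" width="120" yt:name="default" />
<media:thumbnail url="http://i.ytimg.com/vi/k8J-72MmTGg/mqdefault.jpg" height="180" width="320" yt:name="mqdefault" />
<media:thumbnail url="http://i.ytimg.com/vi/k8J-72MmTGg/hqdefault.jpg" height="360" width="480" yt:name="hqdefault" />
</root>"""


def find_thumbnail(dom, name):
    """Return the first media:thumbnail whose yt:name equals `name`, or None."""
    for element in dom.getElementsByTagNameNS(MEDIA_NS, "thumbnail"):
        # Compare on (namespace URI, local name), so the check does not
        # depend on which prefix the document happens to use.
        if element.getAttributeNS(YT_NS, "name") == name:
            return element
    return None


dom = xml.dom.minidom.parseString(XML)
mq = find_thumbnail(dom, "mqdefault")
print(mq.getAttribute("url"))
print(mq.getAttribute("width"), mq.getAttribute("height"))
```

Returning None for a missing name keeps the caller in control, rather than raising an IndexError the way a hard-coded [1] would if the feed changes.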
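For your implementation, note that thumbnail_tag in your code is the string returned by toxml(), so iterating over it yields single characters. Iterate over the element list instead:
thumbnail_tags = dom.getElementsByTagName('media:thumbnail')
for element in thumbnail_tags:
    attr = element.getAttribute('yt:name')
To change the value of an attribute:
for element in thumbnail_tags:
    attr = element.getAttribute('yt:name')
    if attr == 'mqdefault':
        element.setAttribute('yt:name', 'new_value')
        break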
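If you need more than one of the thumbnails, it may be simpler to index them once by yt:name and then look them up directly. A sketch assuming the mrss and gdata namespace URIs that appear in the sample feed further down this thread; thumbs_by_name is an illustrative name. Like the loop above, it matches on the prefixed names, so it only works while the document keeps using the media: and yt: prefixes.

```python
import xml.dom.minidom

XML = """<entry xmlns:media="http://search.yahoo.com/mrss/"
       xmlns:yt="http://gdata.youtube.com/schemas/2007">
<media:thumbnail url="http://i.ytimg.com/vi/k8J-72MmTGg/default.jpg" height="90" width="120" yt:name="default" />
<media:thumbnail url="http://i.ytimg.com/vi/k8J-72MmTGg/mqdefault.jpg" height="180" width="320" yt:name="mqdefault" />
<media:thumbnail url="http://i.ytimg.com/vi/k8J-72MmTGg/hqdefault.jpg" height="360" width="480" yt:name="hqdefault" />
</entry>"""

dom = xml.dom.minidom.parseString(XML)

# Index every thumbnail by its yt:name; getElementsByTagName and
# getAttribute compare against the prefixed (qualified) names here.
thumbs_by_name = {
    element.getAttribute("yt:name"): element
    for element in dom.getElementsByTagName("media:thumbnail")
}

mq = thumbs_by_name["mqdefault"]
print(mq.toxml())

# The dict holds references into the DOM, so this edits the tree itself.
mq.setAttribute("yt:name", "new_value")
```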
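I suggest using the standard xml.etree.ElementTree instead of DOM. DOM is more traditional, but it is also uglier and harder to use. See Dive Into Python 3, Chapter 12. XML.
The standard module supports a subset of the XPath language, which may be useful in your case.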
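As a minimal sketch of that subset applied to this kind of feed (assuming Python 3): find() understands an attribute predicate of the form [@name='value'], and if you pass a prefix-to-URI mapping as the namespaces argument, you can write media:thumbnail and yt:name in the path instead of the full {URI} form. NS, thumbnail_xpath, and the inline FEED string (an abbreviated copy of the sample feed) are illustrative names.

```python
import xml.etree.ElementTree as et

# Prefix -> namespace URI map; the URIs are the ones the sample feed declares.
NS = {
    "media": "http://search.yahoo.com/mrss/",
    "yt": "http://gdata.youtube.com/schemas/2007",
}

# An abbreviated inline feed, so the example does not need a file on disk.
FEED = """<feed xmlns='http://www.w3.org/2005/Atom'
      xmlns:media='http://search.yahoo.com/mrss/'
      xmlns:yt='http://gdata.youtube.com/schemas/2007'>
  <entry><media:group>
    <media:thumbnail url='http://img.youtube.com/vi/jXE6G9CYcJs/default.jpg'
        height='90' width='120' time='00:01:41' yt:name='default'/>
    <media:thumbnail url='http://img.youtube.com/vi/jXE6G9CYcJs/mqdefault.jpg'
        height='180' width='320' yt:name='mqdefault'/>
  </media:group></entry>
</feed>"""


def thumbnail_xpath(name):
    """Build the XPath for a thumbnail with the given yt:name on the fly."""
    return ".//media:thumbnail[@yt:name='%s']" % name


root = et.fromstring(FEED)
for name in ("default", "mqdefault", "hqdefault"):
    found = root.find(thumbnail_xpath(name), NS)
    # find() returns None when no element matches (no hqdefault here).
    print(name, found.get("url") if found is not None else None)
```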
Here is sample code that extracts the wanted elements from sample.xml:
import xml.etree.ElementTree as et
tree = et.parse('sample.xml')
root = tree.getroot()   # the root element of the tree
##et.dump(root)         # here is how the input file looks inside
print('===========================================')
print('Iterate through all media:thumbnail:')
# XPath expressions that describe the wanted elements. Here we have three;
# however, they are just strings and can be constructed on the fly.
xp_default = ".//{http://search.yahoo.com/mrss/}thumbnail[" \
             "@{http://gdata.youtube.com/schemas/2007}name='default']"
xp_mqdefault = ".//{http://search.yahoo.com/mrss/}thumbnail[" \
               "@{http://gdata.youtube.com/schemas/2007}name='mqdefault']"
xp_hqdefault = ".//{http://search.yahoo.com/mrss/}thumbnail[" \
               "@{http://gdata.youtube.com/schemas/2007}name='hqdefault']"
for e in root.iterfind(xp_default):
    et.dump(e)
print('-------------------------------------------')
for e in root.iterfind(xp_mqdefault):
    et.dump(e)
print('-------------------------------------------')
for e in root.iterfind(xp_hqdefault):
    et.dump(e)
print('-------------------------------------------')
# e still refers to the last element matched above (the hqdefault thumbnail).
print('The e.attrib is a dictionary of attributes:')
print(e.attrib)
It prints the following...
c:\tmp\___python\sharataka\so12776774>py a.py
===========================================
Iterate through all media:thumbnail:
<ns0:thumbnail xmlns:ns0="http://search.yahoo.com/mrss/" xmlns:ns1="http://gdata.youtube.com/schemas/2007" height="90" time="00:01:41" url="http://img.youtube.com/vi/jXE6G9CYcJs/default.jpg" width="120" ns1:name="default" />
-------------------------------------------
<ns0:thumbnail xmlns:ns0="http://search.yahoo.com/mrss/" xmlns:ns1="http://gdata.youtube.com/schemas/2007" height="180" url="http://img.youtube.com/vi/jXE6G9CYcJs/mqdefault.jpg" width="320" ns1:name="mqdefault" />
-------------------------------------------
<ns0:thumbnail xmlns:ns0="http://search.yahoo.com/mrss/" xmlns:ns1="http://gdata.youtube.com/schemas/2007" height="360" url="http://img.youtube.com/vi/jXE6G9CYcJs/hqdefault.jpg" width="480" ns1:name="hqdefault" />
-------------------------------------------
The e.attrib is a dictionary of attributes:
{'url': 'http://img.youtube.com/vi/jXE6G9CYcJs/hqdefault.jpg', 'width': '480', 'height': '360', '{http://gdata.youtube.com/schemas/2007}name': 'hqdefault'}
...for the sample.xml file (found somewhere and shortened), which contains:
<?xml version='1.0' encoding='UTF-8'?>
<feed xmlns='http://www.w3.org/2005/Atom'
xmlns:media='http://search.yahoo.com/mrss/'
xmlns:yt='http://gdata.youtube.com/schemas/2007'>
<entry>
<media:group>
<media:title type='plain'>Learning the ABCs</media:title>
<media:description type='plain'>
A great method for teaching kids the alphabet.
</media:description>
<media:keywords>alphabet, teaching, children</media:keywords>
<yt:duration seconds='202'/>
<yt:videoid>jXE6G9CYcJs</yt:videoid>
<media:credit role='uploader' scheme='urn:youtube'
yt:display='GoogleDeveloperssFriend'>GoogleDeveloperssFriend</media:credit>
<media:category label='Education'
scheme='http://gdata.youtube.com/schemas/2007/categories.cat'>
Education</media:category>
<media:content url='http://www.youtube.com/v/jXE6G9CYcJs'
type='application/x-shockwave-flash' medium='video' isDefault='true'
expression='full' duration='202' yt:format='5'/>
<media:content
url='rtsp://rtsp2.youtube.com/ChoLENySANFEgGDA==/0/0/0/video.3gp'
type='video/3gpp' medium='video' expression='full'
duration='202' yt:format='1'/>
<media:content
url='rtsp://rtsp2.youtube.com/ChoLENySARFEgGDA==/0/0/0/video.3gp'
type='video/3gpp' medium='video' expression='full'
duration='202' yt:format='6'/>
<media:player url='https://www.youtube.com/watch?v=jXE6G9CYcJs'/>
<media:thumbnail url='http://img.youtube.com/vi/jXE6G9CYcJs/default.jpg'
height='90' width='120' time='00:01:41' yt:name='default'/>
<media:thumbnail url='http://img.youtube.com/vi/jXE6G9CYcJs/hqdefault.jpg'
height='360' width='480' yt:name='hqdefault'/>
<media:thumbnail url='http://img.youtube.com/vi/jXE6G9CYcJs/mqdefault.jpg'
height='180' width='320' yt:name='mqdefault'/>
<media:thumbnail url='http://img.youtube.com/vi/jXE6G9CYcJs/1.jpg'
height='90' width='120' time='00:00:50.500' yt:name='start'/>
<media:thumbnail url='http://img.youtube.com/vi/jXE6G9CYcJs/2.jpg'
height='90' width='120' time='00:01:41' yt:name='end'/>
<media:thumbnail url='http://img.youtube.com/vi/jXE6G9CYcJs/3.jpg'
height='90' width='120' time='00:02:31.500' yt:name='middle'/>
</media:group>
<yt:statistics viewCount='286355' favoriteCount='201'/>
</entry>
</feed>
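As the printed e.attrib shows, a namespaced attribute such as yt:name is stored under its expanded {URI}name key. Because a real feed, like the question's XML, holds several entry elements, one way to keep each entry's thumbnails together is to loop over the entries and index the thumbnails per entry. A sketch under those assumptions: the two-entry FEED string is an abbreviated illustration built from URLs that appear in this thread, and thumbnails_per_entry is a hypothetical helper name.

```python
import xml.etree.ElementTree as et

# Namespace URIs declared in sample.xml, in ElementTree's {uri}local form.
ATOM = "{http://www.w3.org/2005/Atom}"
MEDIA = "{http://search.yahoo.com/mrss/}"
YT = "{http://gdata.youtube.com/schemas/2007}"

# Abbreviated two-entry feed for illustration only.
FEED = """<feed xmlns='http://www.w3.org/2005/Atom'
      xmlns:media='http://search.yahoo.com/mrss/'
      xmlns:yt='http://gdata.youtube.com/schemas/2007'>
  <entry><media:group>
    <media:thumbnail url='http://img.youtube.com/vi/jXE6G9CYcJs/mqdefault.jpg'
        height='180' width='320' yt:name='mqdefault'/>
    <media:thumbnail url='http://img.youtube.com/vi/jXE6G9CYcJs/hqdefault.jpg'
        height='360' width='480' yt:name='hqdefault'/>
  </media:group></entry>
  <entry><media:group>
    <media:thumbnail url='http://i.ytimg.com/vi/k8J-72MmTGg/mqdefault.jpg'
        height='180' width='320' yt:name='mqdefault'/>
  </media:group></entry>
</feed>"""


def thumbnails_per_entry(root):
    """Yield one {yt:name: attribute dict} mapping per Atom <entry>."""
    for entry in root.iter(ATOM + "entry"):
        yield {
            thumb.get(YT + "name"): thumb.attrib
            for thumb in entry.iter(MEDIA + "thumbnail")
        }


root = et.fromstring(FEED)
for thumbs in thumbnails_per_entry(root):
    # The mqdefault URL of each entry, or None if the entry lacks one.
    mq = thumbs.get("mqdefault")
    print(mq["url"] if mq else None)
```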