I have a dataset that looks like this:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Reviews>
<Review rid="1004293">
<sentences>
<sentence id="1004293:0">
<text>Judging from previous posts this used to be a good place, but not any longer.</text>
<Opinions>
<Opinion target="place" category="RESTAURANT#GENERAL" polarity="negative" from="51" to="56"/>
</Opinions>
</sentence>
<sentence id="1004293:1">
<text>The food here is rather good, but only if you like to wait for it.</text>
<Opinions>
<Opinion target="food" category="FOOD#QUALITY" polarity="positive" from="4" to="8"/>
<Opinion target="NULL" category="SERVICE#GENERAL" polarity="negative" from="0" to="0"/>
</Opinions>
</sentence>
...
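How can I parse the data from the .xml file into a .tsv file in the following format:
["negative", "Judging from previous posts this used to be a good place, but not any longer.", "RESTAURANT#GENERAL"]
["positive", "The food here is rather good, but only if you like to wait for it.", "FOOD#QUALITY"]
["negative", "The food here is rather good, but only if you like to wait for it.", "SERVICE#GENERAL"]
Thanks!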
The sample.xml file must be in the same directory as this code.
from xml.etree import ElementTree
file = 'sample.xml'
tree = ElementTree.parse(file)
root = tree.getroot()
for sentence in root.iter('sentence'):
    # Loop over every sentence in the XML.
    for opinion in sentence.iter('Opinion'):
        # Loop over every Opinion of this particular sentence.
        print([opinion.attrib['polarity'], sentence.find('text').text, opinion.attrib['category']])
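Output (for the two sentences shown in the question):
['negative', 'Judging from previous posts this used to be a good place, but not any longer.', 'RESTAURANT#GENERAL']
['positive', 'The food here is rather good, but only if you like to wait for it.', 'FOOD#QUALITY']
['negative', 'The food here is rather good, but only if you like to wait for it.', 'SERVICE#GENERAL']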
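The loop above only prints the rows, while the question asks for a .tsv file. Below is a minimal sketch of one way to write them out with the standard csv module and a tab delimiter. The function name xml_to_tsv and the output path are assumptions made for illustration; they are not part of the original answer.

```python
import csv
from xml.etree import ElementTree


def xml_to_tsv(xml_path, tsv_path):
    """Write one (polarity, text, category) row per Opinion and return the row count.

    xml_to_tsv is a hypothetical helper name chosen for this sketch.
    """
    root = ElementTree.parse(xml_path).getroot()
    count = 0
    # newline="" lets the csv module control line endings itself.
    with open(tsv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter="\t")
        for sentence in root.iter("sentence"):
            # findtext returns the default if a sentence has no <text> child.
            text = sentence.findtext("text", default="")
            for opinion in sentence.iter("Opinion"):
                writer.writerow(
                    [opinion.get("polarity"), text, opinion.get("category")]
                )
                count += 1
    return count
```

Calling xml_to_tsv('sample.xml', 'sample.tsv') should produce one tab-separated line per Opinion element. The csv writer quotes any field that itself contains a tab or a quote character, so the file stays parseable even if a review text includes them.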