I have an XML file that looks like this:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Reviews>
<Review rid="1004293">
<sentences>
<sentence id="1004293:0">
<text>Judging from previous posts this used to be a good place, but not any longer.</text>
<Opinions>
<Opinion target="place" category="RESTAURANT#GENERAL" polarity="negative" from="51" to="56"/>
</Opinions>
</sentence>
<sentence id="1004293:1">
<text>The food here is rather good, but only if you like to wait for it.</text>
<Opinions>
<Opinion target="food" category="FOOD#QUALITY" polarity="positive" from="4" to="8"/>
<Opinion target="NULL" category="SERVICE#GENERAL" polarity="negative" from="0" to="0"/>
</Opinions>
</sentence>
...
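I use this script:
import csv
from xml.etree import ElementTree
file = 'sample.xml'
tree = ElementTree.parse(file)
root = tree.getroot()
with open('output.tsv', 'w', newline='') as tsvfile:
    # Assumption: the snippet as posted never defines `writer`;
    # a tab-delimited csv.writer is assumed here.
    writer = csv.writer(tsvfile, delimiter='\t')
    for sentence in root.iter('sentence'):
        # One output row per Opinion, repeating the sentence text.
        text = sentence.find('text').text
        for opinion in sentence.iter('Opinion'):
            writer.writerow([opinion.attrib['polarity'], text, opinion.attrib['category']])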
to get an output file, records.tsv, that looks like this:
['negative', 'Judging from previous posts this used to be a good place, but not any longer.', 'RESTAURANT#GENERAL']
['negative', 'We, there were four of us, arrived at noon - the place was empty - and the staff acted like we were imposing on them and they were very rude.', 'SERVICE#GENERAL']
['negative', 'They never brought us complimentary noodles, ignored repeated requests for sugar, and threw our dishes on the table.', 'SERVICE#GENERAL']
['negative', 'The food was lousy - too sweet or too salty and the portions tiny.', 'FOOD#QUALITY']
['negative', 'The food was lousy - too sweet or too salty and the portions tiny.', 'FOOD#STYLE_OPTIONS']
When I open the file everything looks fine, but when I try to load it into Pandas I get the following error:
ParserError: Error tokenizing data. C error: Expected 5 fields in line 8, saw 6
How can I fix this? Thanks.
You can read the csv file into a pandas DataFrame like this:
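>>> import pandas as pd
>>> from io import StringIO
>>> # Assumption: `data` holds the file contents as one string, with the surrounding [ ] removed from each line.
>>> df = pd.read_csv(StringIO(data), header=None, quotechar="'", quoting=1, delimiter=',', skipinitialspace=True)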
>>> df
0 1 2
0 negative Judging from previous posts this used to be a ... RESTAURANT#GENERAL
1 negative We, there were four of us, arrived at noon - t... SERVICE#GENERAL
2 negative They never brought us complimentary noodles, i... SERVICE#GENERAL
3 negative The food was lousy - too sweet or too salty an... FOOD#QUALITY
4 negative The food was lousy - too sweet or too salty an... FOOD#STYLE_OPTIONS
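If you would rather repair the file itself than work around it at read time, here is a minimal sketch using only the standard library. It assumes every line of the file is a valid Python list literal, as in the sample output in the question. The names `raw`, `parse_rows` and `to_tsv` are made up for illustration and are not part of any library.

```python
import ast
import csv
import io

# Two sample lines in the list-like format shown in the question.
raw = """\
['negative', 'Judging from previous posts this used to be a good place, but not any longer.', 'RESTAURANT#GENERAL']
['negative', 'The food was lousy - too sweet or too salty and the portions tiny.', 'FOOD#QUALITY']
"""


def parse_rows(text):
    """Turn each non-empty line (a Python list literal) into a list of strings."""
    return [ast.literal_eval(line) for line in text.splitlines() if line.strip()]


def to_tsv(rows):
    """Serialize rows as tab-separated text; csv.writer quotes a field only if it needs to."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


rows = parse_rows(raw)
tsv_text = to_tsv(rows)
print(tsv_text)
```

Because the fields are now separated by tabs, the commas inside the sentences no longer count as delimiters. Once this text is saved to a file, it should load with something like `pd.read_csv(path, sep='\t', header=None)`.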