xml.parsers.expat.ExpatError: not well-formed (invalid token)

Question

When I load the XML file below with xmltodict, I get an error: xml.parsers.expat.ExpatError: not well-formed (invalid token): line 1, column 1

Here is my file:

<?xml version="1.0" encoding="utf-8"?>
<mydocument has="an attribute">
  <and>
    <many>elements</many>
    <many>more elements</many>
  </and>
  <plus a="complex">
    element as well
  </plus>
</mydocument>

Code:

import xmltodict
with open('fileTEST.xml') as fd:
   xmltodict.parse(fd.read())

I'm on Windows 10, using Python 3.6 and xmltodict 0.11.0.

If I use ElementTree, it works:

import xml.etree.ElementTree as ET

tree = ET.ElementTree(file='fileTEST.xml')
# print every element's tag and attributes
for elem in tree.iter():
    print(elem.tag, elem.attrib)

mydocument {'has': 'an attribute'}
and {}
many {}
many {}
plus {'a': 'complex'}

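Note: I may have a line-ending problem.
Note 2: I compared two different files with Beyond Compare.
It crashes on the file encoded as UTF-8 with BOM and works on the plain UTF-8 file.
The UTF-8 BOM is a byte sequence (EF BB BF) that lets readers identify a file as UTF-8 encoded.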
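Since the only difference between the two files is the BOM, a quick way to tell them apart is to look at the first three raw bytes. A minimal sketch using only the standard library; the function names are hypothetical:

```python
import codecs


def starts_with_utf8_bom(data: bytes) -> bool:
    """True if `data` begins with the UTF-8 BOM, i.e. the bytes EF BB BF."""
    return data.startswith(codecs.BOM_UTF8)


def file_has_utf8_bom(path: str) -> bool:
    """Check the first three bytes of a file without decoding it."""
    # Binary mode: no text decoding happens, so the BOM bytes are seen as-is.
    with open(path, "rb") as f:
        return starts_with_utf8_bom(f.read(len(codecs.BOM_UTF8)))


print(codecs.BOM_UTF8)                             # b'\xef\xbb\xbf'
print(starts_with_utf8_bom(b"\xef\xbb\xbf<a/>"))  # True
print(starts_with_utf8_bom(b"<a/>"))              # False
```

With the two files from the question, `file_has_utf8_bom('fileTEST.xml')` should return True for the one that crashes and False for the one that works.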

Answer 1

I think you forgot to specify the encoding. I suggest you try loading the XML file and turning it into a string variable first:

import xml.etree.ElementTree as ET
import xmltodict
import json


tree = ET.parse('your_data.xml')
xml_data = tree.getroot()
#here you can change the encoding type to be able to set it to the one you need
xmlstr = ET.tostring(xml_data, encoding='utf-8', method='xml')

data_dict = dict(xmltodict.parse(xmlstr))
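One likely reason this works (an assumption, not stated in the answer): ET.parse() reads the file as raw bytes, expat recognises and skips a UTF-8 BOM in raw bytes, and ET.tostring() then writes fresh bytes that contain no BOM. A small sketch with an in-memory sample standing in for your_data.xml:

```python
import codecs
import xml.etree.ElementTree as ET

# Hypothetical sample: UTF-8 bytes that start with a BOM, like the failing file.
raw = codecs.BOM_UTF8 + b'<mydocument has="an attribute"><many>elements</many></mydocument>'

# Given bytes, expat detects and skips the BOM itself.
root = ET.fromstring(raw)
print(root.tag, root.attrib)  # mydocument {'has': 'an attribute'}

# Re-serialising produces new bytes without a BOM; xmltodict.parse() accepts bytes.
xmlstr = ET.tostring(root, encoding="utf-8", method="xml")
print(xmlstr.startswith(codecs.BOM_UTF8))  # False
```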

Answer 2

I had the same problem and solved it simply by specifying the encoding in the open function.

In this case, it would look like this:

import xmltodict
with open('fileTEST.xml', encoding='utf8') as fd:
   xmltodict.parse(fd.read())
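A related option is the "utf-8-sig" codec. With "utf-8" the BOM survives as the character U+FEFF at the start of the text, while "utf-8-sig" removes it (and still reads files without a BOM normally). This self-contained sketch writes a temporary file with a BOM and reads it both ways, using ElementTree in place of xmltodict so it needs no third-party package:

```python
import codecs
import os
import tempfile
import xml.etree.ElementTree as ET

# Write a temporary file that starts with a UTF-8 BOM (stand-in for fileTEST.xml).
fd, path = tempfile.mkstemp(suffix=".xml")
with os.fdopen(fd, "wb") as f:
    f.write(codecs.BOM_UTF8 + b'<mydocument has="an attribute"/>')

# "utf-8": the BOM is decoded into the character U+FEFF.
with open(path, encoding="utf-8") as f:
    text_utf8 = f.read()

# "utf-8-sig": a leading BOM is stripped during decoding.
with open(path, encoding="utf-8-sig") as f:
    text_sig = f.read()

os.remove(path)

print(repr(text_utf8[0]))  # '\ufeff'
print(repr(text_sig[0]))   # '<'

root = ET.fromstring(text_sig)
print(root.tag, root.attrib)  # mydocument {'has': 'an attribute'}
```

In the question's code that would be `open('fileTEST.xml', encoding='utf-8-sig')`.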

Answer 3

In my case, the file had been saved with a byte order mark, which is the default setting in Notepad++.

I re-saved the file as plain utf8 without BOM.
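If there are many files, the same re-save can be scripted. A minimal sketch (function names are hypothetical) that rewrites a file without its leading UTF-8 BOM and leaves every other byte untouched:

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return `data` without a leading UTF-8 BOM (unchanged if there is none)."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def resave_without_bom(path: str) -> bool:
    """Rewrite the file at `path` without its BOM. Returns True if one was removed."""
    with open(path, "rb") as f:
        data = f.read()
    cleaned = strip_utf8_bom(data)
    if cleaned == data:
        return False  # no BOM found; leave the file alone
    with open(path, "wb") as f:
        f.write(cleaned)
    return True
```

Working on bytes rather than text means the rest of the file is written back exactly as it was, whatever its line endings.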

Answer 4

Python 3

One-liner

from xml.etree import ElementTree
import xmltodict

# path: path to the XML file
data: dict = xmltodict.parse(ElementTree.tostring(ElementTree.parse(path).getroot()))

Helper for .json and .xml

I wrote a small helper function that loads .json and .xml files from a given path. I think it might come in handy for some people here:

import json
from xml.etree import ElementTree

import xmltodict

def load_json(path: str) -> dict:
    if path.endswith(".json"):
        print(f"> Loading JSON from '{path}'")
        with open(path, mode="r") as open_file:
            content = open_file.read()

        return json.loads(content)
    elif path.endswith(".xml"):
        print(f"> Loading XML as JSON from '{path}'")
        # ElementTree reads the file as bytes, so a UTF-8 BOM does not break parsing
        xml = ElementTree.tostring(ElementTree.parse(path).getroot())
        return xmltodict.parse(xml, attr_prefix="@", cdata_key="#text", dict_constructor=dict)

    print(f"> Loading failed for '{path}'")
    return {}

Notes

If you want to remove the `@` and `#text` markers from the JSON output, use the parameters `attr_prefix=""` and `cdata_key=""`.

Normally `xmltodict.parse()` returns an `OrderedDict`, but you can get a plain dict with the parameter `dict_constructor=dict`.

Usage

path = "my_data.xml"
data = load_json(path)
print(json.dumps(data, indent=2))

# OUTPUT
#
# > Loading XML as JSON from 'my_data.xml' 
# {
#   "mydocument": {
#     "@has": "an attribute",
#     "and": {
#       "many": [
#         "elements",
#         "more elements"
#       ]
#     },
#     "plus": {
#       "@a": "complex",
#       "#text": "element as well"
#     }
#   }
# }

Answer 5

In my case, the problem was the first 3 characters, so removing them worked:

import xmltodict
from xml.parsers.expat import ExpatError

with open('your_data.xml') as f:
    data = f.read()
    try:
        doc = xmltodict.parse(data)
    except ExpatError:
        doc = xmltodict.parse(data[3:])
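The three characters being removed are most likely the UTF-8 BOM decoded with a non-UTF-8 default encoding (assuming a Western-locale Windows system, where cp1252 turns EF BB BF into "ï»¿"). Slicing [3:] also runs after any other ExpatError, which hides the real error. A sketch of a narrower fix, with a hypothetical function name, that removes only a BOM in either decoded form:

```python
# A UTF-8 BOM can show up in decoded text in two forms:
#   "\ufeff"        if the file was read as UTF-8
#   "\xef\xbb\xbf"  i.e. "ï»¿", if it was read as cp1252 / Latin-1
BOM_FORMS = ("\ufeff", "\xef\xbb\xbf")


def strip_bom_text(text: str) -> str:
    """Remove one leading BOM (in either form); return anything else unchanged."""
    for bom in BOM_FORMS:
        if text.startswith(bom):
            return text[len(bom):]
    return text


print(strip_bom_text("\xef\xbb\xbf<a/>"))  # <a/>
print(strip_bom_text("\ufeff<a/>"))        # <a/>
print(strip_bom_text("<a/>"))              # <a/>
```

Then `doc = xmltodict.parse(strip_bom_text(data))` handles this case without the try/except, and other XML errors still surface unchanged.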

Answer 6

xmltodict doesn't seem to be able to parse:

<?xml version="1.0" encoding="utf-8"?>

If I remove this line, it works.
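The declaration itself is normally valid input. A plausible explanation, if the file has a BOM as in the question, is that deleting the first line also deletes the BOM sitting in front of the declaration. A sketch with ElementTree, which uses the same expat parser as xmltodict, on hypothetical sample bytes (cp1252 decoding is an assumption about the reader's default encoding):

```python
import codecs
import xml.etree.ElementTree as ET

DECL = b'<?xml version="1.0" encoding="utf-8"?>\n'
BODY = b'<mydocument has="an attribute"/>'


def parses(data) -> bool:
    """True if expat (via ElementTree) accepts `data`, given as str or bytes."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


# The declaration on its own is fine.
print(parses(DECL + BODY))  # True

# BOM + declaration, decoded with cp1252: "ï»¿" now precedes "<?xml", which fails.
text = (codecs.BOM_UTF8 + DECL + BODY).decode("cp1252")
print(parses(text))  # False

# Dropping the first line removes the stray characters together with the declaration.
print(parses(text.split("\n", 1)[1]))  # True
```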

Answer 7

Not specific to the original post, but for anyone who hits the same error on a different line: I fixed it by correcting errors in the XML/XHTML.

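In my case, the document I was working with had a text description containing a bare ampersand "&" instead of "&amp;", so to solve my problem I had to edit the file before running the parser.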
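If the file cannot be fixed at the source, a pre-processing step can escape ampersands that do not start an entity or character reference. This is a heuristic sketch with a hypothetical function name, not a general XML repair tool: it assumes bare "&" only appears in text or attribute values, and it would also rewrite "&" inside CDATA sections or comments.

```python
import re
import xml.etree.ElementTree as ET

# '&' not followed by a named entity (&amp;), decimal (&#38;) or hex (&#x26;) reference + ';'
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z_][\w.-]*|#\d+|#x[0-9A-Fa-f]+);)")


def escape_bare_ampersands(xml_text: str) -> str:
    """Replace stray '&' characters with '&amp;', leaving existing references alone."""
    return BARE_AMP.sub("&amp;", xml_text)


broken = "<item>Fish & Chips &amp; more &#38; &lt;</item>"
fixed = escape_bare_ampersands(broken)
print(fixed)                      # <item>Fish &amp; Chips &amp; more &#38; &lt;</item>
print(ET.fromstring(fixed).text)  # Fish & Chips & more & <
```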

Answer 8

My error wasn't in my code but on the server side, specifically when sending XML-RPC requests to Odoo.

The fix was to encode the payload to bytes as UTF-8, like this:

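import xmlrpc.client

# args and endpoint are defined elsewhere; request is the HTTP call used to send the payload
payload = xmlrpc.client.dumps(args, 'execute_kw') # type is str
payload = payload.encode('utf-8')                 # type is bytes

request('POST', endpoint, body=payload, headers={'Content-Type': 'application/xml'})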
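A self-contained sketch of the same idea with the standard library's xmlrpc.client. The argument values are placeholders in the shape of an Odoo execute_kw call, and the HTTP request itself is left out. One plausible reason the str payload failed (an assumption; the answer does not name the HTTP client): Python's http.client encodes a str body as ISO-8859-1, so a character such as "é" would reach a UTF-8 XML parser as an invalid byte.

```python
import xmlrpc.client

# Placeholder arguments: (db, uid, password, model, method, args)
args = ("mydb", 2, "secret", "res.partner", "search", [[["name", "=", "Café"]]])

payload_str = xmlrpc.client.dumps(args, "execute_kw")  # str
payload = payload_str.encode("utf-8")                 # bytes for the request body

print(type(payload_str).__name__, type(payload).__name__)  # str bytes
print("Café".encode("utf-8") in payload)                   # True

# Round trip: the UTF-8 bytes parse back into the same call.
params, method = xmlrpc.client.loads(payload)
print(method, params == args)  # execute_kw True
```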
