Reading and writing special characters in Python

Question

I have a directory full of .gz text archives. To scan them, I use the following Python code:

    with gzip.open(logDir+"\\"+fileName, mode="rb") as archive:
        for filename in archive:
            print(filename.decode().strip())

This used to work, but a new system has started adding lines like this:

:§f Press [§bJ§f]

Python gives me this error:

File "C:\Users\Me\Documents\Python\ConvertLog.py", line 16, in readZIP
    print(filename.decode().strip())
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa7 in position 49: invalid start byte

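Does anyone know how to handle the odd characters that have suddenly appeared? I can't ignore this line. It happens to be one of the few lines I need to strip out and write to a condensed report.

I have tried modes other than "rb". I really don't know what else to try.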

Answer 1

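You can control how decoding errors are handled through the errors option of decode(); the documentation covers it in more detail.

In decode you can specify errors='strict', errors='ignore', or errors='replace'. If you specify nothing, strict is the default, and it raises an error when it hits a situation like yours. ignore simply drops the bytes that cannot be decoded. replace substitutes a "suitable replacement character" for each of them.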
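To make the difference between the three modes concrete, here is a small sketch. It assumes, purely for illustration, that the "§" in the sample line is stored as the single byte 0xA7 (as the error message suggests); the names raw and decode_with are made up for this example.

```python
# Hypothetical sample: the line from the question, encoded so that
# every "§" becomes the single byte 0xA7 (an assumption for this demo).
raw = ":§f Press [§bJ§f]".encode("latin-1")


def decode_with(errors):
    """Decode the sample bytes as UTF-8 using the given error handler."""
    return raw.decode("utf-8", errors=errors)


# strict (the default) raises, exactly like the traceback in the question.
try:
    decode_with("strict")
    strict_error = None
except UnicodeDecodeError as exc:
    strict_error = exc

# ignore drops each undecodable byte; replace swaps it for U+FFFD.
ignored = decode_with("ignore")
replaced = decode_with("replace")
```

Note that with ignore the "§" markers disappear from the text, and with replace they all become the same placeholder character, so neither mode recovers the original character.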

So one way to do this might be:

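    import gzip

    # logDir and fileName come from the surrounding script in the question.
    with gzip.open(logDir + "\\" + fileName, mode="rb") as archive:
        for line in archive:
            # Skip any bytes that are not valid UTF-8 instead of raising.
            decoded_line = line.decode('utf-8', errors='ignore').strip()
            print(decoded_line)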
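If the "§" characters themselves need to survive, another option may be to decode with a single-byte encoding instead of UTF-8. The sketch below assumes the logs are written in Latin-1 (byte 0xA7 is "§" in both Latin-1 and Windows-1252); that is a guess based on the error message, not something the question confirms. The helper read_log_lines and the sample file are hypothetical.

```python
import gzip
import os
import tempfile


def read_log_lines(path, encoding="latin-1"):
    """Return the stripped text lines of a gzip file, decoded with `encoding`.

    The default of latin-1 is an assumption about how the logs were written.
    """
    # mode="rt" makes gzip decode the bytes for us, so no .decode() is needed.
    with gzip.open(path, mode="rt", encoding=encoding) as archive:
        return [line.strip() for line in archive]


# Build a throwaway archive containing the problematic bytes, then read it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample.log.gz")
    with gzip.open(sample_path, mode="wb") as archive:
        archive.write(b"plain line\n:\xa7f Press [\xa7bJ\xa7f]\n")
    lines = read_log_lines(sample_path)
```

Latin-1 maps every byte to a character, so this never raises a decode error; the trade-off is that a wrong guess about the encoding produces wrong characters silently rather than an exception.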
