UTF-8 decoding error when reading a CSV into pandas, despite the file being UTF-8 encoded

Question

I am trying to convert CSV files to TSV files by loading each one into a pandas DataFrame and writing it back out.

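import os
import pandas as pd

# input_dir and output_dir are assumed to be defined earlier in the script
for csv_file in os.listdir(input_dir):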
    if csv_file.endswith('.csv'):
        print("working on " + csv_file)
        # Full path to the current CSV file
        csv_path = os.path.join(input_dir, csv_file)

        # Read the CSV file
        df = pd.read_csv(csv_path, encoding='utf-8')

        # Create corresponding TSV file name
        base_name = os.path.splitext(csv_file)[0]
        tsv_file = os.path.join(output_dir, f'{base_name}.tsv')

        # Convert and save as TSV file
        df.to_csv(tsv_file, sep='\t', index=False)

        print(f"File {csv_file} successfully converted to {tsv_file}")

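I am fairly confident that all of the CSV files are encoded as "UTF-8 with BOM". However, some of them fail with this error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe3 in position 119: unexpected end of data

I then tried decoding with Latin-1, which leaves "ï»¿" as the first three characters of the TSV file. As I understand it, that indicates the file is encoded as UTF-8 with BOM. So why can't a UTF-8 encoded file be read correctly as UTF-8?

Note: utf-8-sig produces the same error as utf-8.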
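The reason "unexpected end of data" suggests that a multi-byte sequence starting with 0xe3 is cut short, so the file may be mostly valid UTF-8 with a few damaged bytes. A minimal sketch for checking this on the raw bytes, using only the standard library; the helper name `find_bad_utf8` is hypothetical and the whole file is assumed to fit in memory:

```python
def find_bad_utf8(data: bytes, context: int = 10):
    """Return (position, reason, nearby bytes) for the first invalid
    UTF-8 sequence in data, or None if everything decodes cleanly."""
    try:
        # Plain "utf-8" is used so that positions refer to the raw bytes;
        # a leading BOM (EF BB BF) is itself valid UTF-8 and decodes fine.
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        start = max(err.start - context, 0)
        return err.start, err.reason, data[start:err.end + context]
    return None
```

Calling it on the contents of a failing file, for example `find_bad_utf8(open(csv_path, "rb").read())`, shows where decoding stops and which bytes surround that position.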

Answer 1

Open the CSV in Data.olllo, then convert its encoding through Advanced Save.

This kind of task is easy to handle in that software, and no code is required.
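For readers who would rather stay in Python, a comparable re-encoding step can be sketched with the standard library. This is only one possible approach, not what the tool above does internally: it assumes the file is mostly valid UTF-8 and that substituting U+FFFD for undecodable bytes is acceptable. The function name `write_clean_utf8_copy` is hypothetical.

```python
from pathlib import Path


def write_clean_utf8_copy(src, dst):
    """Write a BOM-free UTF-8 copy of src to dst.

    Undecodable bytes are replaced with U+FFFD. Returns the number of
    U+FFFD characters in the result (this also counts any that were
    already present in the source text).
    """
    raw = Path(src).read_bytes()
    # "utf-8-sig" drops a leading BOM if there is one
    text = raw.decode("utf-8-sig", errors="replace")
    # Writing bytes keeps the original line endings untouched
    Path(dst).write_bytes(text.encode("utf-8"))
    return text.count("\ufffd")
```

The cleaned copy can then be passed to `pd.read_csv(dst, encoding='utf-8')`. pandas 1.3 and later also accept `encoding_errors='replace'` in `read_csv`, which skips the intermediate file.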
