Removing extra words from multiple CSV files in Python


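I'm using Python to count the rows in multiple CSV files. The table below shows my CSV source file. Some files contain "\t\t" and some contain "\n\n", but my script for removing the blank rows doesn't work. Please help. Thanks.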

Customer No.   ISMB
Cell1          Cell2
Cell3          Cell4
Cell3          Cell4
Cell3          Cell4
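The two chained replace() calls in the script below only collapse pairs of newlines. They also turn a tab-only row into an extra line break, so blank rows can survive or even multiply. Here is a minimal sketch of another approach, assuming a "blank row" means a line that is empty or contains only whitespace such as tabs. It drops every such line instead. The function name remove_blank_rows is hypothetical.

```python
def remove_blank_rows(text):
    """Return text without lines that are empty or hold only whitespace (e.g. tabs)."""
    # splitlines() treats '\n', '\r\n' and '\r' line endings the same way
    kept = [line for line in text.splitlines() if line.strip()]
    # Keep a single trailing newline if any rows are left
    return ("\n".join(kept) + "\n") if kept else ""


sample = "Cell1\tCell2\n\n\n\t\t\nCell3\tCell4\n"

# The replace() chain from the question still leaves blank lines in this sample
print(repr(sample.replace('\n\n', '\n').replace('\t\t', '\n')))
# The line filter removes all of them
print(repr(remove_blank_rows(sample)))
```

To use it, read each file, pass its contents through remove_blank_rows(), and write the result back to the same path. This operates on raw lines, so a quoted CSV field that contains an embedded line break would be split. Rows made only of commas (",,") are not treated as blank.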

My full script is:

import glob
import os
from time import gmtime, strftime

files = glob.glob('C:/Users/Downloads/*.csv')

# Remove the first row of each file
for file in files:
    with open(file, encoding="utf-16") as src:
        lines = src.readlines()
    with open(file, 'w', encoding="utf-16") as dst:
        dst.writelines(lines[1:])

# Remove blank rows
for f in files:
    with open(f, 'r', encoding="utf-16") as src:
        data = src.read()
    data = data.replace('\n\n', '\n')
    data = data.replace('\t\t', '\n')
    # Write the cleaned text back to f (previously this wrote to the leftover
    # loop variable `file` and called write() with no argument)
    with open(f, 'w', encoding="utf-16") as dst:
        dst.write(data)

# Count rows
a = {}
total = 0

for f in files:
    with open(f, 'r', encoding="utf-16") as src:
        nb_row = len(src.readlines())
        a[f.replace(".csv", "")] = nb_row
        total += nb_row

# Convert to str: one "name: count" line per file
# (the former json.dumps/replace chain ended with replace('\n', ''), which undid its own line breaks)
a = '\n'.join(os.path.basename(name) + ': ' + str(count) for name, count in a.items())

# Write a log
with open('C:/Users/sandytong/Downloads' + '/' + 'testinglog.txt', "w") as log:
    log.write('Start Time   :' + strftime("%Y-%m-%d", gmtime()) + "\n")
    log.write("*" * 46 + '\n')
    log.write(a + '\n')
    log.write("*" * 46 + '\n')
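The row counting can also be done without rewriting the files first. Here is a minimal sketch using the standard csv module, which counts only rows that have at least one non-empty cell. It assumes the files are tab-separated and UTF-16 encoded, as the "\t\t" rows and the script's encoding suggest. The helper names count_rows and count_all are hypothetical.

```python
import csv
import os
import tempfile


def count_rows(path, encoding="utf-16", delimiter="\t"):
    """Count rows that contain at least one non-empty cell (assumed tab-separated)."""
    # newline="" is how the csv module expects files to be opened
    with open(path, newline="", encoding=encoding) as fh:
        reader = csv.reader(fh, delimiter=delimiter)
        return sum(1 for row in reader if any(cell.strip() for cell in row))


def count_all(paths, encoding="utf-16"):
    """Return ({file name without extension: row count}, total over all files)."""
    counts = {}
    for path in paths:
        name = os.path.splitext(os.path.basename(path))[0]
        counts[name] = count_rows(path, encoding=encoding)
    return counts, sum(counts.values())


if __name__ == "__main__":
    # Self-contained demo on a throwaway file; for real use pass glob.glob('C:/Users/Downloads/*.csv')
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.csv")
        with open(path, "w", encoding="utf-16", newline="") as fh:
            fh.write("Cell1\tCell2\n\n\t\nCell3\tCell4\n")
        counts, total = count_all([path])
        print(counts, total)
```

A completely empty line comes back from csv.reader as an empty list, and a tab-only line as a list of empty strings. Both are skipped by the any(...) check. Header rows are still counted, so drop the first row separately if it should not be included.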

I tried the script below, but it doesn't work. Please help. Thanks.

# Remove blank rows
for f in files:
    with open(f, 'r', encoding="utf-16") as src:
        data = src.read()
    data = data.replace('\n\n', '\n')
    data = data.replace('\t\t', '\n')
    # Write the cleaned text back to f and actually pass data to write()
    with open(f, 'w', encoding="utf-16") as dst:
        dst.write(data)

It returns:

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.3568.0_x64__qbz5n2kfra8p0\lib\codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.3568.0_x64__qbz5n2kfra8p0\lib\encodings\utf_16.py", line 67, in _buffer_decode
    raise UnicodeError("UTF-16 stream does not start with BOM")
UnicodeError: UTF-16 stream does not start with BOM
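The following sketch shows one common way to get exactly this traceback. It assumes, for illustration only, that a file is UTF-16 little-endian without a byte order mark (BOM). When open() is given encoding="utf-16", Python needs a BOM at the start of the data to decide the byte order, and raises this UnicodeError when there is none. Naming the byte order explicitly ("utf-16-le") decodes the same bytes. The real files might instead be UTF-8 or another encoding entirely. The function name demo_missing_bom is hypothetical.

```python
import os
import tempfile


def demo_missing_bom():
    """Return (error raised when reading with 'utf-16', text read with 'utf-16-le')."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "no_bom.csv")
        # Write the file as UTF-16 little-endian WITHOUT a byte order mark
        with open(path, "w", encoding="utf-16-le") as fh:
            fh.write("Cell1\tCell2\n")

        error = None
        try:
            # The "utf-16" codec expects a BOM to pick the byte order, so this fails
            with open(path, encoding="utf-16") as fh:
                fh.read()
        except UnicodeError as exc:
            error = exc

        # Stating the byte order explicitly decodes the same bytes
        with open(path, encoding="utf-16-le") as fh:
            text = fh.read()
    return error, text


error, text = demo_missing_bom()
print("utf-16    ->", error)
print("utf-16-le ->", repr(text))
```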
Answer

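At first glance, the "UTF-16 stream does not start with BOM" exception indicates that you are opening the files with the wrong encoding. Since I don't have the files at hand, I can't tell you which encoding your CSV files use. I suggest removing the encoding argument from your open calls, or determining the correct encoding of the files.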

import glob

files = glob.glob('C:/Users/Downloads/*.csv')
for file in files:
    # No encoding argument: open() falls back to the platform's default text encoding
    with open(file) as f:
        lines = f.readlines()
    print(lines)
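To act on the "determine the correct encoding" suggestion, a first check is whether each file starts with a byte order mark (BOM). Here is a minimal sketch of that check. The function name guess_encoding and its "utf-8" fallback are assumptions. A file without a BOM (BOM-less UTF-16, or a legacy code page such as cp1252) is not really detected and simply gets the fallback.

```python
import codecs
import os
import tempfile


def guess_encoding(path, default="utf-8"):
    """Guess a file's encoding from its byte order mark (BOM).

    Files without a BOM get `default`, which is only an assumption.
    """
    with open(path, "rb") as fh:
        head = fh.read(4)
    # Test the UTF-32 BOMs first: the UTF-32-LE BOM starts with the UTF-16-LE BOM
    if head.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return "utf-32"
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    return default


if __name__ == "__main__":
    # Demo on throwaway files; for real use, loop over glob.glob('C:/Users/Downloads/*.csv')
    with tempfile.TemporaryDirectory() as tmp:
        for enc in ("utf-8", "utf-8-sig", "utf-16"):
            path = os.path.join(tmp, enc + ".csv")
            with open(path, "w", encoding=enc) as fh:
                fh.write("Cell1\tCell2\n")
            print(enc, "->", guess_encoding(path))
```

The returned name can then be passed straight to open(file, encoding=...). The "utf-8-sig" and "utf-16" codecs both consume the BOM while decoding.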

A useful reference for the error you are seeing: UnicodeError: UTF-16 stream does not start with BOM
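Dropping the encoding argument altogether means the platform default is used, which may or may not match the files. Another option is to try a short list of candidate encodings and keep the first one that decodes the whole file. Here is a sketch of that idea. The function name read_lines_any and the candidate list are hypothetical. A clean decode only shows the bytes were valid in that encoding, not that it is the encoding the file was written in.

```python
import os
import tempfile


def read_lines_any(path, candidates=("utf-16", "utf-8-sig", "cp1252")):
    """Return (lines, encoding) for the first candidate that decodes the file."""
    for enc in candidates:
        try:
            with open(path, encoding=enc) as fh:
                return fh.readlines(), enc
        except UnicodeError:
            # Catches both the missing-BOM UnicodeError and UnicodeDecodeError
            continue
    raise ValueError(f"none of {candidates} could decode {path}")


if __name__ == "__main__":
    # Demo on a throwaway UTF-8 file: "utf-16" fails (no BOM), "utf-8-sig" succeeds
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "plain.csv")
        with open(path, "w", encoding="utf-8") as fh:
            fh.write("Cell1\tCell2\nCell3\tCell4\n")
        lines, enc = read_lines_any(path)
        print(enc, lines)
```

The order matters. "utf-16" goes first because it rejects data without a BOM, and UTF-8 text can never begin with one. "cp1252" goes last because it accepts almost any byte sequence.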

© www.soinside.com 2019 - 2024. All rights reserved.