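I am using Python to count the rows in multiple CSV files. The table below shows my CSV source file. Some files have "/t/t" and some have "/n/n", but my script for removing blank rows does not work. Please help. Thanks.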
Customer No. | ISMB |
---|---|
Cell1 | Cell2 |
Cell3 | Cell4 |
Cell3 | Cell4 |
Cell3 | Cell4 |
My full script is:

    import glob
    import json

    files = glob.glob('C:/Users/Downloads/*.csv')

    # Remove first row
    for file in files:
        lines = open(file, encoding="utf-16").readlines()
        open(file, 'w', encoding="utf-16").writelines(lines[1:])

    # Remove blank rows
    for f in files:
        with open(f, 'r', encoding="utf-16") as out:
            data = out.read()
            data = data.replace('\n\n', '\n')
            data = data.replace('\t\t', '\n')
        open(file, 'w', encoding="utf-16").write()

    # Count rows
    a = {}
    total = 0
    for f in files:
        with open(f, 'r', encoding="utf-16") as out:
            nb_row = len(out.readlines())
            a[f.replace(".csv", "")] = nb_row
            total += nb_row

    # Convert to str
    a = json.dumps(a)
    a = a.replace('C:/Users/Downloads\\\\', '')
    a = a.replace(',', '\n')
    a = a.replace('{', '\n')
    a = a.replace('}', '\n')
    a = a.replace('\n', "")

    # Write a log
    log = open('C:/Users/sandytong/Downloads' + '/' + 'testinglog.txt', "w")
    from time import gmtime, strftime
    log.write('Start Time :' + strftime("%Y-%m-%d", gmtime()) + "\n")
    log.write("*" * 46 + '\n')
    log.write(a + '\n')
    log.write("*" * 46 + '\n')
I tried the script below, but it does not work. Please help. Thanks.

    # Remove blank rows
    for f in files:
        with open(f, 'r', encoding="utf-16") as out:
            data = out.read()
            data = data.replace('\n\n', '\n')
            data = data.replace('\t\t', '\n')
        open(file, 'w', encoding="utf-16").write()
It returns:

    Traceback (most recent call last):
      File "<stdin>", line 2, in <module>
      File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.3568.0_x64__qbz5n2kfra8p0\lib\codecs.py", line 322, in decode
        (result, consumed) = self._buffer_decode(data, self.errors, final)
      File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.3568.0_x64__qbz5n2kfra8p0\lib\encodings\utf_16.py", line 67, in _buffer_decode
        raise UnicodeError("UTF-16 stream does not start with BOM")
    UnicodeError: UTF-16 stream does not start with BOM
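At first glance, the "UTF-16 stream does not start with BOM" exception indicates that you are opening the file with the wrong encoding. Since I do not have the file at hand, I cannot tell you which encoding your CSV files use. I suggest either removing the encoding argument from your open() call or determining the file's correct encoding.

    import glob

    files = glob.glob('C:/Users/Downloads/*.csv')
    for file in files:
        lines = open(file).readlines()
        print(lines)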
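If you would rather determine the encoding than drop the argument, one option is to look at the first bytes of each file for a byte order mark. The following is only a minimal sketch: `detect_bom_encoding` is a hypothetical helper name, not a library function, and it can only recognise files that actually start with a BOM. A file with no BOM (for example plain UTF-8, or UTF-16 written without one) yields the `default` value, and you would still have to find out its encoding some other way.

```python
import codecs

# BOM signatures to check. The UTF-32 entries come first because the
# UTF-32 LE BOM begins with the same two bytes as the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def detect_bom_encoding(path, default=None):
    """Return a codec name based on the file's BOM, or `default` if none is found.

    Hypothetical helper for illustration only.
    """
    with open(path, "rb") as fh:
        head = fh.read(4)  # the longest BOM checked here is 4 bytes
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    return default
```

You could then call `open(file, encoding=detect_bom_encoding(file))` inside the loop. When the helper returns `None`, `open()` falls back to the platform's default encoding, which is the same behaviour as leaving the argument out.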
A useful reference for the error you are seeing: UnicodeError: UTF-16 stream does not start with BOM
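Separately from the encoding problem, the "Remove blank rows" loop in the question has two issues of its own. It calls write() with no argument, which raises a TypeError, and it writes to file (left over from the earlier loop) instead of f. The sketch below is one possible way to drop the first row, remove blank rows, and count what remains in a single pass. It assumes the encoding is already known. `clean_and_count` is a hypothetical name, and treating a row that contains only tabs or spaces as blank is my assumption based on the '\t\t' replacement in the question.

```python
def clean_and_count(path, encoding):
    """Rewrite `path` without its first row and without blank rows.

    Returns the number of rows left in the file.
    Hypothetical helper for illustration only.
    """
    with open(path, "r", encoding=encoding) as src:
        lines = src.readlines()

    # Skip the first row, then keep only rows that contain something
    # other than whitespace (so "\n" and "\t\t\n" are both dropped).
    rows = [line for line in lines[1:] if line.strip()]

    with open(path, "w", encoding=encoding) as dst:
        dst.writelines(rows)

    return len(rows)
```

In the question's script this would replace the three separate loops, for example `a[f.replace(".csv", "")] = clean_and_count(f, "utf-16")` for each f in files, provided "utf-16" really is the encoding of those files.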