If I could parse the CSV in reverse (from the end of each line), I would get the correct values regardless of the errors.
df1 = pd.read_csv('MyData.csv', error_bad_lines=False)
With this, I can see that all the columns before the one with the extra comma are displayed correctly.
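import pandas as pd
import csv

# Copy 'Myfile' to 'Newfile', splitting each line on the first two commas only
with open('Myfile', 'r', newline='') as f, \
        open('Newfile', 'w', newline='') as g:
    writer = csv.writer(g, delimiter=',')
    for line in f:
        # maxsplit=2 keeps everything after the second comma in a single field
        row = line.rstrip('\r\n').split(',', 2)
        writer.writerow(row)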
I want to do this in Python pandas.
Sample CSV:
id,name,place,address,age,type,dob,date
1,Murtaza,someplace,Street,MA,22,B,somedate,somedate,
2,Murtaza,someplace,somestreet,45,C,somedate,somedate,
3,Murtaza,someplace,somestreet,MA,44,V,somedate,somedate
Excel output:
id  name     place      address     age  type  dob       date      newcolumn9
1   Murtaza  someplace  somestreet  MA   22    B         somedate  somedate
2   Murtaza  someplace  somestreet  45   C     somedate  somedate
3   Murtaza  someplace  somestreet  MA   44    V         somedate  somedate
I want the age column. I cannot post the original CSV or its output; please understand.
Without pandas, you could just use re.split():
import re

# Read the whole file; the with-block closes it afterwards
with open('your_csv_file.csv', 'r') as f:
    your_csv_file = f.read()

i_column = 2  # index of the desired column, counted from the back
# Split into lines and drop empty ones (such as the trailing empty line)
lines = [line for line in re.split('\n', your_csv_file) if line]
your_column = []
for line in lines:
    # the negative index counts from the end of the row
    your_column.append(re.split(',', line)[-i_column])
print(your_column)
Run on a .csv file like this:
4rth,askj,fpou,ABC,aekert
kjgf,poiuf,pejhh,,oeiu,DEF,akdhg
iuzrit,fslgk,gth,,rhf,,rhe,GHI,ozug
pwiuto,,,,eflgjkhrlguiazg,JKL,rgj
it returns:
['ABC', 'DEF', 'GHI', 'JKL']
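The same idea of indexing from the end can be sketched with the standard csv module instead of re.split(), which also copes with quoted fields. This is only a minimal sketch: the helper name column_from_end is hypothetical, and it assumes every non-empty row has at least i_column fields (otherwise the negative index raises IndexError).

```python
import csv
import io

def column_from_end(csv_text, i_column):
    """Return the i_column-th field, counted from the end, of every non-empty row."""
    values = []
    for row in csv.reader(io.StringIO(csv_text)):
        if row:  # csv.reader yields [] for blank lines; skip them
            values.append(row[-i_column])
    return values

# The sample rows from above, with a varying number of fields per row
sample = (
    "4rth,askj,fpou,ABC,aekert\n"
    "kjgf,poiuf,pejhh,,oeiu,DEF,akdhg\n"
    "iuzrit,fslgk,gth,,rhf,,rhe,GHI,ozug\n"
    "pwiuto,,,,eflgjkhrlguiazg,JKL,rgj\n"
)
print(column_from_end(sample, 2))
```

Because the index is negative, extra commas earlier in a row do not shift the result, as long as the columns after the broken one are intact.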
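I think the best approach might be to write a separate script that removes the erroneous commas. But if you want to ignore the bad lines, you can do that by reading each line into a StringIO and skipping the lines with the wrong number of commas. So, if you are expecting 4 columns: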
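from io import StringIO  # cStringIO exists only in Python 2
import pandas

s = StringIO()
correct_columns = 4
with open('MyData.csv') as file:
    for line in file:
        # keep only the lines that split into the expected number of fields
        if len(line.split(',')) == correct_columns:
            s.write(line)
s.seek(0)  # rewind the buffer so read_csv starts at the beginning
pandas.read_csv(s)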
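For the other option, a separate script that removes the erroneous commas, here is a rough sketch. It rests on assumptions not stated in the question: that the surplus commas always fall inside one known column (address, index 3 in the sample), that trailing commas only produce empty fields, and that there are 8 real columns. The helper name repair_row is hypothetical.

```python
def repair_row(line, n_columns, merge_at):
    """Merge surplus fields back into the column at index merge_at."""
    fields = line.rstrip('\n').split(',')
    # Drop empty fields caused by trailing commas, but never go below n_columns
    while len(fields) > n_columns and fields[-1] == '':
        fields.pop()
    extra = len(fields) - n_columns
    if extra > 0:
        # Assumption: all surplus commas belong to the column at merge_at
        merged = ','.join(fields[merge_at:merge_at + extra + 1])
        fields[merge_at:merge_at + extra + 1] = [merged]
    return fields

# Data rows from the sample CSV in the question
rows = [
    "1,Murtaza,someplace,Street,MA,22,B,somedate,somedate,",
    "2,Murtaza,someplace,somestreet,45,C,somedate,somedate,",
    "3,Murtaza,someplace,somestreet,MA,44,V,somedate,somedate",
]
repaired = [repair_row(r, n_columns=8, merge_at=3) for r in rows]
ages = [r[4] for r in repaired]  # age is the fifth column of the header
print(ages)
```

The repaired rows all have 8 fields, so they could then be passed to pandas.DataFrame together with the header names. If the extra commas can appear in more than one column, this approach cannot tell which field to merge, and the source data would need quoting instead.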