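I have a CSV file with about 10 columns of data, and I want to compute statistics on it using Python. I am currently using the csv module to open the file and read its contents. I also want to look at 2 specific columns, compare their values, and get a percentage accuracy from the result. I can open the file and iterate over the rows, but I cannot figure out how to compare, for example:
Row[i] Column[8] and Row[i] Column[10]
My pseudocode would be something like this: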
category = Row[i] Column[8]
label = Row[i] Column[10]
if (category != label):
    difference += 1
    totalChecked += 1
else:
    correct += 1
    totalChecked += 1
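The only thing I can do so far is read an entire row. What I want is to get the exact row and column for my 2 variables, category and label, and compare them.
How do I work with specific rows/columns across the entire Excel sheet?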
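For reference, the pseudocode above can be made concrete with the csv module alone. This is only a minimal sketch: the helper name compare_columns, the sample data and the column indices are made up for illustration. Row indices in Python are zero-based, so the 8 and 10 from the question would have to be checked against the real file.

```python
import csv
import io


def compare_columns(rows, cat_idx, label_idx):
    """Count rows where two columns agree; return (correct, difference, accuracy %)."""
    correct = 0
    difference = 0
    for row in rows:
        # Skip rows that are too short to hold both columns
        if len(row) <= max(cat_idx, label_idx):
            continue
        if row[cat_idx] != row[label_idx]:
            difference += 1
        else:
            correct += 1
    total_checked = correct + difference
    accuracy = 100.0 * correct / total_checked if total_checked else 0.0
    return correct, difference, accuracy


# Made-up sample data standing in for the real file. For a file on disk,
# replace io.StringIO(...) with open(path, newline='').
sample = io.StringIO(
    "id,category,label\n"
    "1,cat,cat\n"
    "2,dog,cat\n"
    "3,bird,bird\n"
    "4,cat,cat\n"
)
reader = csv.reader(sample)
next(reader)  # skip the header row
correct, difference, accuracy = compare_columns(reader, cat_idx=1, label_idx=2)
print(correct, difference, accuracy)
```

Each row from csv.reader is a plain list of strings, so a single cell is just row[index]. Values are compared as text here, so "1" and "1.0" would count as different.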
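Convert them both to pandas DataFrames and compare them in a similar way to the example here.
I have put a lot of time and effort into working through this, because it will be useful to me going forward. In his example the columns do not have to be the same length at all, which is good. I have tested the code below (Python 3.8) and it works well.
With minor modifications it can be adapted to your specific data columns, objects and purposes.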
import pandas as pd
A = pd.read_csv(r'C:\Users\User\Documents\query_sequences.csv') #dropped the S from _sequences
B = pd.read_csv(r'C:\Users\User\Documents\Sequence_reference.csv')
print(A.columns)
print(B.columns)
my_unknown_id = A['Unknown_sample_no'].tolist() #Unknown_sample_no
my_unknown_seq = A['Unknown_sample_seq'].tolist() #Unknown_sample_seq
Reference_Species1 = B['Reference_sequences_ID'].tolist()
Reference_Sequences1 = B['Reference_Sequences'].tolist() #it was Reference_sequences
Ref_dict = dict(zip(Reference_Species1, Reference_Sequences1)) #it was Reference_sequences
Unknown_dict = dict(zip(my_unknown_id, my_unknown_seq))
print(Ref_dict)
print(Unknown_dict)
import re

filename = 'seq_match_compare2.csv'
# Opened in append mode (his example used 'w'), so the header row
# is written again on every run.
with open(filename, 'a') as f:
    headers = 'Query_ID, Query_Seq, Ref_species, Ref_seq, Match, Match start Position\n'
    f.write(headers)
    for ID, seq in Unknown_dict.items():
        for species, seq1 in Ref_dict.items():
            # seq is used as a regular-expression pattern here
            m = re.search(seq, seq1)
            if m:
                match = m.group()
                pos = m.start() + 1  # 1-based start position of the match
                f.write(str(ID) + ',' + seq + ',' + species + ',' + seq1 + ',' + match + ',' + str(pos) + '\n')
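To see what the matching loop does without needing the two CSV files, here is a small self-contained sketch with made-up sequences. It is not the tested code above: the helper find_matches and the sample data are hypothetical, it uses re.escape so each query is matched as literal text rather than as a regular expression, and it writes through csv.writer so that values containing commas would be quoted.

```python
import csv
import io
import re

# Made-up sequences standing in for the two CSV files
unknown = {"U1": "GATTACA", "U2": "CCCGGG"}
reference = {"species_a": "TTGATTACATT", "species_b": "AAAAAA"}


def find_matches(unknown, reference):
    """Return one row per query sequence found inside a reference sequence."""
    rows = []
    for query_id, query_seq in unknown.items():
        # Escape the query so it is searched for as literal text
        pattern = re.escape(query_seq)
        for species, ref_seq in reference.items():
            m = re.search(pattern, ref_seq)
            if m:
                # m.start() is zero-based; add 1 for a 1-based position
                rows.append([query_id, query_seq, species, ref_seq,
                             m.group(), m.start() + 1])
    return rows


matches = find_matches(unknown, reference)

# Write to an in-memory buffer; for a real file, use
# open('some_file.csv', 'w', newline='') instead.
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["Query_ID", "Query_Seq", "Ref_species", "Ref_seq",
                 "Match", "Match_start_position"])
writer.writerows(matches)
print(out.getvalue())
```

Only U1 is found here, starting at position 3 of the species_a sequence. U2 does not occur in either reference, so it produces no row.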
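I worked this one out myself too, assuming your columns contain integers and following your requirements as best I can so far. This is my first attempt (I am new to this as well, so go easy on me). You can use my code below as a baseline to see how to take your problem further. Basically it does what you want (it gives you a skeleton) and does the following: "import a CSV in Python using the pandas module, convert it to a DataFrame, work only on specific columns of that df, create a new column (the result), print the result together with the original data in the terminal, and save it to a new CSV". It is as messy as my Python, but it works! For me personally (and professionally) it is a milestone, and I hope to improve its readability, scope, functionality and capability over time (starting next weekend).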
# This is a work in progress (although it does work and does the job).
# There may still be redundant lines in it, because I am a self-taught
# newbie working on this at weekends. I was just finishing up getting the
# results written to a new CSV file (done too). You can see how to convert
# your columns and rows into lists with pandas DataFrames, start doing
# calculations with them in Python, and get your results back out to a new
# CSV. It is a start on how you can answer your question going forward.
import pandas as pd

A = pd.read_csv(r'C:\Users\User\Documents\book6 category labels.csv')
# Replace missing values in the Category column with a placeholder
A["Category"] = A["Category"].fillna("empty data - missing value")
#A["Blank1"] = A["Blank1"].fillna("empty data - missing value")
# ...etc
print(A.columns)

# Columns as plain Python lists
MyCat = A['Category'].tolist()
MyLab = A['Label'].tolist()
My_Cats = A['Category1'].tolist()
My_Labs = A['Label1'].tolist()

# Map each Category1 value to its Label1 value
# (a repeated Category1 value keeps only its last Label1)
Ref_dict = dict(zip(My_Cats, My_Labs))
print(Ref_dict)

print("Given Dataframe :\n", A)
# New result column: Category1 minus Label1, row by row
# (assumes both columns hold numbers)
A['Lab-Cat_diff'] = A['Category1'].sub(A['Label1'], axis=0)
print("\nDifference of Category1 and Label1 :\n", A)
# YOU CAN DO OTHER MATCHES, COMPARISONS AND CALCULATIONS YOURSELF HERE AND ADD THEM TO THE OUTPUT

# Save the original columns plus the new result column to a new CSV
A.to_csv('some_name5523.csv')
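Yes, I know, it is far from perfect, and it repeats a lot of results and output (in the terminal). It is time-pressed beginner code after all, but it works and it answers your question (hopefully you will get help from others too). Now please excuse me while I get back to verifying [insert proprietary] US state capitol Twitter handles.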
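For the accuracy percentage the question asks about, a pandas comparison of two columns might look like the sketch below. The column names Category1 and Label1 follow the code above, but the data is made up, and selecting columns by position with iloc is shown only as an option for when the names are not known.

```python
import pandas as pd

# Made-up data; in practice this would come from pd.read_csv(...)
df = pd.DataFrame({
    "Category1": [1, 2, 3, 4],
    "Label1": [1, 2, 0, 4],
})

# Boolean column: True where the two columns agree on that row
df["match"] = df["Category1"] == df["Label1"]

total_checked = len(df)
correct = int(df["match"].sum())
difference = total_checked - correct
accuracy = 100.0 * correct / total_checked

print(df)
print(f"correct={correct} difference={difference} accuracy={accuracy:.1f}%")

# The same comparison by column position (zero-based) instead of by name
by_position = float((df.iloc[:, 0] == df.iloc[:, 1]).mean() * 100)
print(by_position)
```

Comparing whole columns at once avoids the explicit loop and the counters from the pseudocode. A single cell can still be read with df.iloc[row, column] if an individual value is needed.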