How do I work with rows/columns in a CSV file?

Question (votes: 1, answers: 1)

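I have a CSV file with about 10 columns of data that I want to compute statistics on using Python. I am currently using the csv module (import csv) to open the file and read its contents. I also want to look at 2 specific columns, compare their values, and get an accuracy percentage from the comparison. Although I can open the file and go through the rows, I cannot figure out how to compare, for example:

Row[i] Column[8] and Row[i] Column[10]

My pseudocode would be something like this: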

category = Row[i] Column[8]
label = Row[i] Column[10]

if (category != label):
    difference += 1
    totalChecked += 1
else:
    correct += 1
    totalChecked += 1

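The only thing I can do so far is read an entire row. What I want is to get the exact row and column for my 2 variables, category and label, and compare them.

How do I work with specific rows/columns across the entire Excel sheet?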

Tags: python, csv, rows

1 answer (0 votes)

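Convert both of them to pandas DataFrames and compare them in a similar way to the example here.

I have put a lot of time and effort into this problem, because it will be useful to me going forward. In that example the columns do not have to be the same length at all, which is good. I have tested the code below (Python 3.8) and it works well.

With only slight modifications it can be adapted to your specific data columns, objects, and purpose.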

import pandas as pd
A = pd.read_csv(r'C:\Users\User\Documents\query_sequences.csv')  # dropped the S from _sequences
B = pd.read_csv(r'C:\Users\User\Documents\Sequence_reference.csv')

print(A.columns)
print(B.columns)

my_unknown_id = A['Unknown_sample_no'].tolist() #Unknown_sample_no
my_unknown_seq = A['Unknown_sample_seq'].tolist() #Unknown_sample_seq

Reference_Species1 = B['Reference_sequences_ID'].tolist()
Reference_Sequences1 = B['Reference_Sequences'].tolist() #it was Reference_sequences

Ref_dict = dict(zip(Reference_Species1, Reference_Sequences1)) #it was Reference_sequences
Unknown_dict = dict(zip(my_unknown_id, my_unknown_seq))

print(Ref_dict)
print(Unknown_dict)

import re

filename = 'seq_match_compare2.csv'
headers = 'Query_ID, Query_Seq, Ref_species, Ref_seq, Match, Match start Position\n'

# Append mode ('a'; the example this was adapted from used 'w'), so rerunning
# the script adds more rows, including another header line, to the same file.
with open(filename, 'a') as f:
    f.write(headers)

    for ID, seq in Unknown_dict.items():
        for species, seq1 in Ref_dict.items():
            # seq is used as a regular-expression pattern here
            m = re.search(seq, seq1)
            if m:
                match = m.group()
                pos = m.start() + 1  # 1-based start position of the match
                f.write(str(ID) + ',' + seq + ',' + species + ',' + seq1 + ',' + match + ',' + str(pos) + '\n')
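
One caveat about the loop above: re.search(seq, seq1) treats the query sequence as a regular expression, and building each output row by joining strings with commas breaks if a field ever contains a comma. If plain substring matching is what is wanted, a sketch along the following lines may be safer. It uses str.find and csv.writer from the standard library. The function name find_matches and the sample dictionaries are invented for illustration and are not part of the original answer.

```python
import csv
import io


def find_matches(unknown, reference):
    """Return one row per query sequence found inside a reference sequence.

    unknown, reference: dicts mapping an ID to a sequence string.
    The start position is 1-based, as in the loop above.
    """
    rows = []
    for query_id, seq in unknown.items():
        for species, ref_seq in reference.items():
            pos = ref_seq.find(seq)  # literal search; -1 means "not found"
            if pos != -1:
                rows.append([query_id, seq, species, ref_seq, seq, pos + 1])
    return rows


# Made-up sample data standing in for the two dictionaries.
unknown = {"U1": "GATT", "U2": "CCCC"}
reference = {"sp_a": "AAGATTACA", "sp_b": "TTTT"}

rows = find_matches(unknown, reference)

# csv.writer quotes fields when needed; StringIO stands in for a real file.
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["Query_ID", "Query_Seq", "Ref_species", "Ref_seq",
                 "Match", "Match_start_position"])
writer.writerows(rows)
print(out.getvalue())
```

To write to disk instead, the StringIO object can be replaced with a file opened with newline='' as the csv module documentation recommends.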

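I also wrote something myself, assuming your columns contain integers and following your requirements as far as I can so far. This is my first attempt (I am new to this too, so go easy on me). You can use my code below as a baseline to see how to take your problem further. Basically, it does what you want (it gives you a skeleton) and performs the following: "import a CSV in Python using the pandas module, convert it to a DataFrame, work on only specific columns of that DataFrame, create a new column (the result), print the result together with the original data in the terminal, and save it to a new CSV". It is as messy as my Python, but it works! For me personally (and professionally) this is a milestone, and I hope to improve its readability, scope, functionality, and capability over time (starting next weekend).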

# This is a work in progress (although it does run and does the job). There are
# redundant lines of code in it, even among the lines that are not commented out,
# because I'm a self-teaching newbie working on my weekends. I was just finishing
# up getting the results written to a new CSV file (done too). You can see how to
# convert your columns and rows into lists with pandas DataFrames, start doing
# calculations with them in Python, and get your results back out to a new CSV.
# It's a start on how you can answer your question going forward.

import pandas as pd
from pandas import DataFrame
import csv
import itertools  # redundant now?

A = pd.read_csv(r'C:\Users\User\Documents\book6 category labels.csv')

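# Assign the result back: calling fillna(..., inplace=True) on a selected column
# is chained assignment, which recent pandas versions warn about.
A["Category"] = A["Category"].fillna("empty data - missing value")
#A["Blank1"] = A["Blank1"].fillna("empty data - missing value")
# ...etc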

print(A.columns)

MyCat=A['Category'].tolist()
MyLab=A['Label'].tolist()

My_Cats = A['Category1'].tolist()

My_Labs = A['Label1'].tolist()
#Ref_dict0 = zip(My_Labs, My_Cats)
# Good for comparing whole columns as a block (enumerate/zip). Note from
# 19:06 01/06/2020: forget this for now. It was part of a later attempt to compare
# text and missing text with integer fields, and it doesn't affect the program.

Ref_dict = dict(zip(My_Labs, My_Cats))

Compareprep = dict(zip(My_Cats, My_Labs))

Ref_dict = dict(zip(My_Cats, My_Labs))

print(Ref_dict)

print("Given Dataframe :\n", A)

# New result column: element-wise difference between the two integer columns
A['Lab-Cat_diff'] = A['Category1'].sub(A['Label1'], axis=0)
print("\nDifference of Category1 and Label1 :\n", A)

# YOU CAN DO OTHER MATCHES, COMPARISONS AND CALCULATIONS YOURSELF HERE AND ADD THEM TO THE OUTPUT

# print() returns None, so keep the DataFrame itself as the result
# rather than the return value of print().
output_df = A
print(output_df)

# Save the original columns plus the new result column to a new CSV
output_df.to_csv('some_name5523.csv')

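Yes, I know, it is far from perfect, and it duplicates a lot of the results and output in the terminal. It is, after all, beginner code written under time pressure, but it works and it answers your question (hopefully you will get help from others too). Now please excuse me while I get back to verifying [insert proprietary] US state capitol Twitter handles.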
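
For the comparison the question itself asks about, a minimal sketch using only the standard csv module might look like the following. It assumes the file has a header row, that the two columns are addressed by zero-based position (so index 10 requires at least 11 columns), and that the values are compared as exact strings after stripping whitespace. The function name column_agreement and the sample data are made up for illustration.

```python
import csv
import io


def column_agreement(lines, col_a, col_b, has_header=True):
    """Count rows where two columns hold the same value.

    lines: any iterable of CSV text lines (an open file works).
    col_a, col_b: zero-based column positions (hypothetical parameters).
    Returns (correct, difference, total_checked, percent_correct).
    """
    reader = csv.reader(lines)
    if has_header:
        next(reader, None)  # skip the header row if there is one

    correct = 0
    difference = 0
    for row in reader:
        # Skip rows too short to contain both columns (e.g. blank lines).
        if len(row) <= max(col_a, col_b):
            continue
        if row[col_a].strip() == row[col_b].strip():
            correct += 1
        else:
            difference += 1

    total_checked = correct + difference
    percent = 100.0 * correct / total_checked if total_checked else 0.0
    return correct, difference, total_checked, percent


# Made-up sample data standing in for the real file.
sample = io.StringIO(
    "id,category,label\n"
    "1,cat,cat\n"
    "2,dog,cat\n"
    "3,bird,bird\n"
    "4,dog,dog\n"
)
result = column_agreement(sample, col_a=1, col_b=2)
print(result)
```

With a real file, the open file object would be passed instead, for example with open('data.csv', newline='') as f: column_agreement(f, 8, 10), where the file name is a placeholder. If the data is already in a pandas DataFrame df, (df.iloc[:, 8] == df.iloc[:, 10]).mean() * 100 should give roughly the same percentage, apart from the whitespace stripping.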
