How do I work with rows/columns in a CSV file?

Question

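I have a CSV file with about 10 columns of data that I want to gather statistics on using Python. I am currently using the csv module to open the file and read its contents. I also want to look at 2 specific columns, compare their data, and get an accuracy percentage from that comparison. Although I can open the file and parse the rows, I cannot figure out how to compare, for example:

Row[i] Column[8] and Row[i] Column[10]

My pseudocode would be something like this: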

category = Row[i] Column[8]
label = Row[i] Column[10]

if category != label:
    difference += 1
    totalChecked += 1
else:
    correct += 1
    totalChecked += 1

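The only thing I am able to do is read an entire row. But I want to get the exact row and column for my 2 variables, category and label, and compare them.

How do I work with specific rows/columns across an entire Excel sheet?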

Answer 1

Convert both of them to pandas DataFrames and compare them in a similar way to the example linked here.

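I have put quite a bit of time and effort into investigating this, because it will be useful to me going forward. In that example the columns do not have to be the same length at all, which is good. I have tested the code below (Python 3.8) and it works well.

With only slight modifications it can be used for your specific data columns, objects and purposes.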

import re

import pandas as pd

# Load the query sequences and the reference sequences.
A = pd.read_csv(r'C:\Users\User\Documents\query_sequences.csv')  # dropped the S from _sequences
B = pd.read_csv(r'C:\Users\User\Documents\Sequence_reference.csv')

print(A.columns)
print(B.columns)

# Pull the columns of interest out as plain Python lists.
my_unknown_id = A['Unknown_sample_no'].tolist()
my_unknown_seq = A['Unknown_sample_seq'].tolist()

Reference_Species1 = B['Reference_sequences_ID'].tolist()
Reference_Sequences1 = B['Reference_Sequences'].tolist()  # it was Reference_sequences

# Map each ID to its sequence.
Ref_dict = dict(zip(Reference_Species1, Reference_Sequences1))
Unknown_dict = dict(zip(my_unknown_id, my_unknown_seq))

print(Ref_dict)
print(Unknown_dict)

filename = 'seq_match_compare2.csv'
headers = 'Query_ID,Query_Seq,Ref_species,Ref_seq,Match,Match start Position\n'

# 'a' appends, so the header is written again on every run (in his example it was 'w').
with open(filename, 'a') as f:
    f.write(headers)
    for ID, seq in Unknown_dict.items():
        for species, seq1 in Ref_dict.items():
            # re.escape makes the query match literally rather than as a regex pattern.
            m = re.search(re.escape(seq), seq1)
            if m:
                match = m.group()
                pos = m.start() + 1  # 1-based start position
                f.write(str(ID) + ',' + seq + ',' + species + ',' + seq1 + ',' + match + ',' + str(pos) + '\n')
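
The loop above builds each output line by joining fields with commas, which breaks if a field (a species name, for example) itself contains a comma. Below is a minimal sketch of the same kind of write using the standard library's csv.writer, which quotes such fields automatically. The sample row and the in-memory buffer are made up for illustration; in practice the writer would wrap a file opened with newline=''.

```python
import csv
import io

# Hypothetical match record: (query ID, query sequence, reference species,
# reference sequence, matched text, 1-based start position).
rows = [
    ("Q1", "ACG", "Species A, strain 2", "TTACGT", "ACG", 3),
]

# In-memory stand-in for open(filename, "a", newline="").
buffer = io.StringIO()
writer = csv.writer(buffer)

# Header row, then one row per match; csv.writer adds quotes where needed.
writer.writerow(["Query_ID", "Query_Seq", "Ref_species", "Ref_seq",
                 "Match", "Match_start_position"])
writer.writerows(rows)

print(buffer.getvalue())
```

Reading the result back with csv.reader returns the species name as a single field, comma included.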

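I also had a go at this myself, assuming your columns contain integers and following your requirements as best I can for now. This is my first attempt (I am a newbie at this too, so go easy). You can use my code below as a baseline for how to move forward with your question. Basically, it does what you want (it gives you a skeleton) and does the following: "imports a CSV in Python using the pandas module, converts it to a DataFrame, works only on specific columns of that DataFrame, creates a new column (the results), prints the results alongside the original data in the terminal, and saves to a new CSV". It is as messy as my Python is, but it works! Personally (and professionally) speaking, this is a milestone for me, and I hope to improve its readability, scope, functionality and capabilities over time (starting next weekend).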
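
The workflow described above can be condensed considerably. The following is a minimal sketch rather than the answerer's code. It assumes hypothetical columns named Category and Label that hold text values, and it reads from and writes to in-memory strings so that it runs without any files; a real path can be passed to read_csv and to_csv instead.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the real CSV file.
csv_text = """Category,Label
cat,cat
dog,cat
bird,bird
fish,fish
"""

df = pd.read_csv(io.StringIO(csv_text))

# New result column: True where the two columns agree on that row.
df["Match"] = df["Category"] == df["Label"]

correct = int(df["Match"].sum())
total_checked = len(df)
accuracy = 100 * correct / total_checked

print(df)
print(f"{correct} of {total_checked} rows match ({accuracy:.1f}%)")

# Save the original columns plus the result column.
out = io.StringIO()
df.to_csv(out, index=False)
```

Comparing whole columns at once avoids an explicit loop over rows, and the boolean result column doubles as the per-row record of which entries disagreed.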

# This is a work in progress (although it does work and does the job). I am a
# self-teaching newbie working on this at weekends. It shows how you can convert
# your columns and rows into lists with pandas DataFrames, start doing calculations
# with them in Python, and get your results back out to a new CSV. It is a start on
# how you can answer your question going forward.

import pandas as pd

A = pd.read_csv(r'C:\Users\User\Documents\book6 category labels.csv')

# Replace missing values in the text column with a placeholder.
A["Category"] = A["Category"].fillna("empty data - missing value")
# A["Blank1"] = A["Blank1"].fillna("empty data - missing value")
# ...etc

print(A.columns)

# Convert the columns of interest to plain Python lists.
MyCat = A['Category'].tolist()
MyLab = A['Label'].tolist()

My_Cats = A['Category1'].tolist()
My_Labs = A['Label1'].tolist()

# Pair each Category1 value with the Label1 value from the same row.
# Note: a dict keeps only the last label for any repeated category value.
Ref_dict = dict(zip(My_Cats, My_Labs))
print(Ref_dict)

print("Given Dataframe :\n", A)

# New result column: Category1 minus Label1, row by row (both assumed numeric).
A['Lab-Cat_diff'] = A['Category1'].sub(A['Label1'])
print("\nDifference of Category1 and Label1 :\n", A)

# You can do other matches, comparisons and calculations yourself here and add them to the output.

# Save the original columns together with the new result column.
A.to_csv('some_name5523.csv')

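Yes, I know, it is by no means perfect, and it repeats some of the results and output in the terminal. It is a time-challenged beginner's code after all, but it works and it answers your question (hopefully you will get help from others too). Now please excuse me while I get back to validating [insert proprietary] US state capitol Twitter handles.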
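
For completeness, the comparison in the question can also be done with the csv module alone, since each row that csv.reader yields is a list that can be indexed by column position. This is a minimal sketch under stated assumptions: the data is made up, it has 11 columns so that indices 8 and 10 both exist, and the indices are treated as zero-based (if the question meant the 8th and 10th columns, they would be 7 and 9).

```python
import csv
import io

# Hypothetical data with 11 columns so that indices 8 and 10 both exist.
csv_text = (
    "c0,c1,c2,c3,c4,c5,c6,c7,category,c9,label\n"
    "0,0,0,0,0,0,0,0,spam,0,spam\n"
    "0,0,0,0,0,0,0,0,ham,0,spam\n"
    "0,0,0,0,0,0,0,0,ham,0,ham\n"
    "0,0,0,0,0,0,0,0,spam,0,spam\n"
)

CATEGORY_COL = 8   # zero-based, i.e. the ninth column
LABEL_COL = 10     # zero-based, i.e. the eleventh column

correct = 0
difference = 0

# For a real file, wrap open(path, newline="") instead of the StringIO.
reader = csv.reader(io.StringIO(csv_text))
next(reader)  # skip the header row
for row in reader:
    category = row[CATEGORY_COL]
    label = row[LABEL_COL]
    if category != label:
        difference += 1
    else:
        correct += 1

total_checked = correct + difference
accuracy = 100 * correct / total_checked if total_checked else 0.0
print(f"{correct}/{total_checked} correct ({accuracy:.1f}%)")
```

Values read this way are always strings, so numeric columns would need an explicit int() or float() conversion before any arithmetic.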
