Looping over Pandas columns while counting rows that contain two specified values

Question

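I am trying to count every row in a Pandas DataFrame that contains two specified values, but the catch is that the values can be in any column. How can I change this part of the loop: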

for col in con.columns:
    counts += len(con[(con[col]==name1)])

...so that it has two conditions, requiring name1 and name2 to be present in the same row? I would use a crosstab, but the dataset is too large.

import pandas as pd

con = pd.read_csv("keyword connections.txt", sep="\t")
key = pd.read_csv("keyword short list.txt", sep="\t")
text_file = open("Output.txt", "w")
    
i = 0
j = 0
k = 0
for i in range(0,key.size):
    name1 = key.iloc[i]["Keyword"]
    for j in range(0,key.size):
        name2 = key.iloc[j]["Keyword"]
        counts = 0
        for col in con.columns:
            counts += len(con[(con[col]==name1)])
        text_file.write(name1+"\t"+name2+"\t"+str(counts)+"\n")
        j += 1
        k += 1
    i += 1
text_file.close()
print(k)

Answer 1

Assuming this example:

    col1   col2   col3
0  name1  other  name2
1  other  name2  other
2  name2  name1  other
3  other  other  other

You can build one mask per value with eq and any, then combine the masks with & to perform boolean indexing:

m1 = df.eq('name1').any(axis=1)
m2 = df.eq('name2').any(axis=1)

out = df[m1 & m2]
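
Since the question asks for a count rather than the rows themselves, the combined mask can be summed directly instead of being used for indexing. A minimal sketch, assuming the example frame above is named df (the variable count is introduced here only for illustration):

```python
import pandas as pd

# Rebuild the example frame from the answer.
df = pd.DataFrame({
    "col1": ["name1", "other", "name2", "other"],
    "col2": ["other", "name2", "name1", "other"],
    "col3": ["name2", "other", "other", "other"],
})

# One boolean per row: does the value appear in any column?
m1 = df.eq("name1").any(axis=1)
m2 = df.eq("name2").any(axis=1)

# True counts as 1, so summing the combined mask gives the row count.
count = int((m1 & m2).sum())
print(count)
```

This avoids materialising the filtered frame, which may matter when the dataset is large.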

Alternatively, you can use a set of the desired values together with agg:

S = {'name1', 'name2'}

out = df[df.agg(S.issubset, axis=1)]

Output:

    col1   col2   col3
0  name1  other  name2
2  name2  name1  other
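
To cover every keyword pair from the question without rescanning the frame for each pair, one option is to build each keyword's row mask once and then combine the masks pairwise. The sketch below is an illustration under stated assumptions, not the asker's code: count_pairs is a hypothetical helper, and the small con frame and keywords list stand in for the real files. It uses itertools.combinations, so each unordered pair is counted once, whereas the nested loop in the question visits ordered pairs, including a keyword paired with itself.

```python
from itertools import combinations

import pandas as pd


def count_pairs(con, keywords):
    """Count rows of `con` containing both keywords, for every unordered pair.

    Hypothetical helper: scans the frame once per keyword, not once per pair.
    """
    # One boolean Series per keyword: True where the keyword is in any column.
    masks = {kw: con.eq(kw).any(axis=1) for kw in keywords}
    return {
        (a, b): int((masks[a] & masks[b]).sum())
        for a, b in combinations(keywords, 2)
    }


# Stand-in data (assumption): same layout as the example frame above.
con = pd.DataFrame({
    "col1": ["name1", "other", "name2", "other"],
    "col2": ["other", "name2", "name1", "other"],
    "col3": ["name2", "other", "other", "other"],
})
keywords = ["name1", "name2", "other"]

pair_counts = count_pairs(con, keywords)
for (a, b), n in pair_counts.items():
    print(f"{a}\t{b}\t{n}")
```

Each mask holds one boolean per row, so memory grows with the number of keywords times the number of rows. If that is too much for the real data, the masks could be computed in batches instead.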
