How to slice a cluster from a specific column

Question

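I have a data frame in which df['clusters'] contains 4 clusters: 0, 1, 2, 3. I read the data from a CSV file into a pandas data frame, ran k-means clustering on it, and got 4 clusters. The cluster labels are stored in df['clusters']. Assuming the 4 clusters are labelled 0, 1, 2, 3, how do I slice a column and get the values of that column that belong to cluster 1?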

Answer 1

I don't see exactly what the problem is; a plain df[df['clusters'] == 3] works fine (use 1 instead of 3 to get cluster 1):

import pandas as pd

# dummy data:
df = pd.DataFrame({'a': [1, 2, 3, 8, 9], 'b': [3, 4, 5, 11, 2], 'clusters':[0,2,3,3,1]})

print(df)
# result:
   a   b  clusters
0  1   3         0
1  2   4         2
2  3   5         3
3  8  11         3
4  9   2         1

print(df[df['clusters'] == 3])  
# result:
   a   b  clusters
2  3   5         3
3  8  11         3

Want to drop the (now unnecessary) clusters column?

df_3 = df[df['clusters'] == 3].drop(['clusters'], axis=1) # cluster #3
print(df_3)
# result
   a   b
2  3   5
3  8  11

Update (after comments): slicing column a from df_3:

df_3_a = df_3.loc[:, ['a']]
print(df_3_a)
# result:
   a
2  3
3  8

So, starting from the original df and selecting a for cluster == 3:

df_3_a = df[df['clusters'] == 3].drop(['clusters'], axis=1).loc[:,['a']]
print(df_3_a)
# result:
   a
2  3
3  8
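
The chained calls above can also be collapsed into a single .loc lookup that takes the boolean mask and the column label together. A minimal sketch, reusing the dummy data from this answer and picking cluster 1 as the question asks; the names mask, a_cluster_1 and a_cluster_1_df are illustrative only:

```python
import pandas as pd

# Same dummy data as above
df = pd.DataFrame({'a': [1, 2, 3, 8, 9],
                   'b': [3, 4, 5, 11, 2],
                   'clusters': [0, 2, 3, 3, 1]})

# Boolean mask: True for rows whose cluster label is 1
mask = df['clusters'] == 1

# A scalar column label returns a Series
a_cluster_1 = df.loc[mask, 'a']

# A list of labels returns a DataFrame (as in the answer above)
a_cluster_1_df = df.loc[mask, ['a']]

print(a_cluster_1.tolist())  # [9]
```

Because only column a is requested, there is no need to drop the clusters column first.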

Answer 2

Without access to your data frame, I would suggest converting your data to a numpy array:

df_array = df.to_numpy(copy=True)

Then:

df_clustered = df_array[df_array[:,cluster_data_col]==cluster_type]

where cluster_data_col is the index of the column that stores the clustering results, and cluster_type is any one of the four cluster labels.
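
To make this concrete, here is a small sketch of the numpy approach. The data frame is an assumption (the dummy data from Answer 1), as is the choice of cluster 3; cluster_data_col is looked up with df.columns.get_loc rather than hard-coded:

```python
import pandas as pd

# Assumed example data (borrowed from Answer 1)
df = pd.DataFrame({'a': [1, 2, 3, 8, 9],
                   'b': [3, 4, 5, 11, 2],
                   'clusters': [0, 2, 3, 3, 1]})

df_array = df.to_numpy(copy=True)

# Positional index of the column holding the cluster labels
cluster_data_col = df.columns.get_loc('clusters')
cluster_type = 3  # which cluster to keep

# Keep only the rows whose cluster label equals cluster_type
df_clustered = df_array[df_array[:, cluster_data_col] == cluster_type]

print(df_clustered.tolist())  # [[3, 5, 3], [8, 11, 3]]
```

Note that the array no longer carries column labels or the original index, so columns have to be addressed by position from here on.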
