Identifying stocks with identical feature values using np.array_equal() in a nested loop

Question

I'd like to know whether my code is working correctly.

The DataFrame df2 is a vertically stacked time series of stock features.

stock_id	log_target_vol_corr_32_clusters_stnd
1	0.4
1	0.8
1	0.7
2	0.3
2	0.4
2	0.0
3	0.4
3	0.8
3	0.7
4	0.9
4	0.9
4	0.1
5	0.9
5	0.9
5	0.1

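Note that stocks (1 and 3) and (4 and 5) have identical feature values, so I want to group each pair into one cluster. Ultimately, I want to find all the stock IDs that belong to each cluster.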
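For a reproducible example, the sample table above can be rebuilt as a DataFrame. This is a sketch: the column names come from the question, but the construction is an assumption about what df2 looks like, and the helper feature_values is hypothetical. np.array_equal compares two sequences element by element in row order, so two stocks only match if their values appear in the same order.

```python
import numpy as np
import pandas as pd

column = 'log_target_vol_corr_32_clusters_stnd'

# Rebuild the sample data from the table: three rows per stock_id (assumed layout)
df2 = pd.DataFrame({
    'stock_id': [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5],
    column: [0.4, 0.8, 0.7, 0.3, 0.4, 0.0, 0.4, 0.8, 0.7,
             0.9, 0.9, 0.1, 0.9, 0.9, 0.1],
})

def feature_values(stock_id):
    """Hypothetical helper: one stock's feature values as a NumPy array, in row order."""
    return df2.loc[df2['stock_id'] == stock_id, column].to_numpy()

same_1_3 = np.array_equal(feature_values(1), feature_values(3))
same_1_2 = np.array_equal(feature_values(1), feature_values(2))
print(same_1_3, same_1_2)  # True False
```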

import numpy as np

## find stock ids of clusters having same feature values
column = 'log_target_vol_corr_32_clusters_stnd'
remaining_stocks = df2['stock_id'].unique().astype(int)
clusters = {}
for s in remaining_stocks:
    print(s)
    clusters[s] = []
    a1 = df2[df2['stock_id'] == s ][column]
    remaining_stocks = np.delete(remaining_stocks,np.where(remaining_stocks==s))
    for s1 in remaining_stocks:
        a2 = df2[df2['stock_id'] == s1 ][column]
        if np.array_equal(a1,a2):
            print(s1)
            remaining_stocks = np.delete(remaining_stocks,np.where(remaining_stocks==s1))
            clusters[s].append(s1)
            print(remaining_stocks)

Can you explain what is wrong with this code?

The code above seems to produce more clusters than the DataFrame actually contains.
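Running the question's loop on the sample data reproduces the symptom. The sketch below is the same loop with the print calls removed. The DataFrame is rebuilt from the sample table, which is an assumption about df2. Stocks 3 and 5 have already been assigned to the clusters of stocks 1 and 4, yet the outer loop still visits them and creates an empty entry for each. The result therefore has five keys instead of three clusters.

```python
import numpy as np
import pandas as pd

column = 'log_target_vol_corr_32_clusters_stnd'
# Sample data rebuilt from the table (assumed layout of df2)
df2 = pd.DataFrame({
    'stock_id': [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5],
    column: [0.4, 0.8, 0.7, 0.3, 0.4, 0.0, 0.4, 0.8, 0.7,
             0.9, 0.9, 0.1, 0.9, 0.9, 0.1],
})

# Same logic as the question's loop, without the print calls
remaining_stocks = df2['stock_id'].unique().astype(int)
clusters = {}
for s in remaining_stocks:
    clusters[s] = []
    a1 = df2[df2['stock_id'] == s][column]
    remaining_stocks = np.delete(remaining_stocks, np.where(remaining_stocks == s))
    for s1 in remaining_stocks:
        a2 = df2[df2['stock_id'] == s1][column]
        if np.array_equal(a1, a2):
            remaining_stocks = np.delete(remaining_stocks, np.where(remaining_stocks == s1))
            clusters[s].append(s1)

# Convert NumPy integers to plain ints for readable output
result = {int(k): [int(v) for v in vals] for k, vals in clusters.items()}
print(result)  # {1: [3], 2: [], 3: [], 4: [5], 5: []}
```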

Answer 1

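The problem is that you modify remaining_stocks while iterating over it! np.delete returns a new array, so rebinding remaining_stocks does not change what the outer for loop iterates over. The loop still visits every original stock ID, including stocks already assigned to an earlier cluster (3 and 5 in the sample data), and creates an empty entry in clusters for each of them.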
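If you want to keep the nested-loop np.array_equal approach, one minimal fix is to iterate over a fixed list of stock IDs and record which stocks have already been assigned, skipping them in both loops. The sketch below assumes df2 and column as in the question, with the DataFrame rebuilt from the sample table. Unlike the original, it stores each cluster as the full list of its member IDs, including the first stock found.

```python
import numpy as np
import pandas as pd

column = 'log_target_vol_corr_32_clusters_stnd'
# Sample data rebuilt from the table (assumed layout of df2)
df2 = pd.DataFrame({
    'stock_id': [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5],
    column: [0.4, 0.8, 0.7, 0.3, 0.4, 0.0, 0.4, 0.8, 0.7,
             0.9, 0.9, 0.1, 0.9, 0.9, 0.1],
})

stock_ids = [int(s) for s in df2['stock_id'].unique()]
# Extract each stock's values once instead of filtering df2 inside the inner loop
values = {s: df2.loc[df2['stock_id'] == s, column].to_numpy() for s in stock_ids}

assigned = set()  # stocks already placed in some cluster
clusters = {}
for i, s in enumerate(stock_ids):
    if s in assigned:
        continue  # s already belongs to an earlier cluster
    clusters[s] = [s]
    assigned.add(s)
    # Only compare with stocks that come later in the list
    for s1 in stock_ids[i + 1:]:
        if s1 not in assigned and np.array_equal(values[s], values[s1]):
            clusters[s].append(s1)
            assigned.add(s1)

print(clusters)  # {1: [1, 3], 2: [2], 4: [4, 5]}
```

Each pair of stocks is still compared at most once, so this remains quadratic in the number of stocks. The grouping approach in the answer avoids that.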

Try this:

import pandas as pd
import numpy as np


column = 'log_target_vol_corr_32_clusters_stnd'

# Hash each stock's feature sequence (converted to a tuple, which is hashable)
# so that stocks with identical sequences share the same key
df2['features_hash'] = df2.groupby('stock_id')[column].transform(lambda x: hash(tuple(x)))

# Now, group by this new hash and list stock_ids in each group
clustered_stocks = df2.groupby('features_hash')['stock_id'].unique()

# Convert the grouped object into a dictionary for easier handling
clusters = clustered_stocks.to_dict()

# If you need to, invert the dictionary so that stock_id is the key and cluster identifiers are the values
# This step might need adjustments based on how you want to use the clusters
clusters_by_stock_id = {}
for cluster_hash, stocks in clusters.items():
    for stock in stocks:
        clusters_by_stock_id[stock] = cluster_hash
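Different tuples can in principle produce the same hash() value. A variant of the same idea avoids this by keying the groups on the tuple of values itself, so only exactly equal sequences end up together. Here is a sketch, again rebuilding df2 from the sample table as an assumption:

```python
from collections import defaultdict

import pandas as pd

column = 'log_target_vol_corr_32_clusters_stnd'
# Sample data rebuilt from the table (assumed layout of df2)
df2 = pd.DataFrame({
    'stock_id': [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5],
    column: [0.4, 0.8, 0.7, 0.3, 0.4, 0.0, 0.4, 0.8, 0.7,
             0.9, 0.9, 0.1, 0.9, 0.9, 0.1],
})

# Map each exact sequence of feature values (in row order) to the stocks that have it
groups = defaultdict(list)
for stock_id, values in df2.groupby('stock_id')[column]:
    groups[tuple(values)].append(int(stock_id))

clusters = list(groups.values())
print(clusters)  # [[1, 3], [2], [4, 5]]
```

As with np.array_equal, this is an exact comparison. Values that differ only by floating-point rounding will land in different clusters.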
