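I'm trying to update this old code snippet to work with newer versions of gensim. I was able to convert model.wv.vocab to model.wv.key_to_index, but I'm stuck on model[model.wv.vocab] and how to convert it.
Here is the code: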
model = Word2Vec(corpus, min_count = 1, vector_size = 5 )
#pass the embeddings to PCA
X = model[model.wv.vocab]
pca = PCA(n_components=2)
result = pca.fit_transform(X)
#create df from the pca results
pca_df = pd.DataFrame(result, columns = ['x','y'])
I tried this:
#pass the embeddings to PCA
X = model.wv.key_to_index
pca = PCA(n_components=2)
result = pca.fit_transform(X)
#create df from the pca results
pca_df = pd.DataFrame(result, columns = ['x','y'])
But I keep getting errors. This is what model.wv.key_to_index looks like:
{'the': 0,
'in': 1,
'of': 2,
'on': 3,
'': 4,
'and': 5,
'a': 6,
'to': 7,
'were': 8,
'forces': 9,
'by': 10,
'was': 11,
'at': 12,
'against': 13,
'for': 14,
'protest': 15,
'with': 16,
'an': 17,
'as': 18,
'police': 19,
'killed': 20,
'district': 21,
'city': 22,
'people': 23,
'al': 24,
'came': 996,
'donbass': 997,
'resulting': 998,
'financial': 999}
This code ended up working for me:
word_vectors = model.wv
# Accessing word vectors using the updated syntax
vectors = word_vectors.vectors
vocab = word_vectors.index_to_key
# Retrieving vectors for specific words (for instance, for the first 10 words)
selected_words = vocab[:10]
selected_vectors = [word_vectors[word] for word in selected_words]