Selecting words based on transition matrix weights in Python

Question

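I'm trying to choose a likely next word from the current word, using how often word pairs occur in the text as "weights". I'm having trouble using np.random.choice() for the actual selection of the next word.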

import pandas as pd
import numpy as np

texty = ("won't you celebrate with me what i have shaped into a kind of life "
         "i had no model born in babylon both nonwhite and woman what did i see "
         "to be except myself i made it up here on this bridge between starshine "
         "and clay my one hand holding tight my other hand come celebrate with me "
         "that everyday something has tried to kill me and has failed.")

# https://www.poetryfoundation.org/poems/50974/wont-you-celebrate-with-me

words = texty.split()

# Creating the text-based transition matrix

x = pd.crosstab(pd.Series(words[1:], name='next'),
                pd.Series(words[:-1], name='word'), normalize=1)

print(x)

# Selecting the next word based on the current word.
# https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.random.choice.html

current = np.random.choice(x.columns)  # added to select a current word (a column of x)

# next = np.random.choice(current, 1, current)  # was "y"; this attempt does not work

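I don't know how to reference the transition matrix from here. I want the selection to be based on how often each word followed the current word before. For example, "clay" follows "and" with a probability of 33%.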
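As a quick check of that 33% figure, here is a minimal sketch that rebuilds the same crosstab and reads one entry. It assumes the layout produced by the code above: rows (`next`) are following words, columns (`word`) are current words, so the lookup is `x.loc[next_word, current_word]`. "and" occurs three times in the poem, followed by "woman", "clay" and "has", so each gets 1/3.

```python
import pandas as pd

texty = ("won't you celebrate with me what i have shaped into a kind of life "
         "i had no model born in babylon both nonwhite and woman what did i see "
         "to be except myself i made it up here on this bridge between starshine "
         "and clay my one hand holding tight my other hand come celebrate with me "
         "that everyday something has tried to kill me and has failed.")
words = texty.split()

# Rows are the following word, columns are the current word;
# normalize=1 divides each column by its total so every column sums to 1.
x = pd.crosstab(pd.Series(words[1:], name='next'),
                pd.Series(words[:-1], name='word'), normalize=1)

# P(next = 'clay' | current = 'and'): row label first, column label second.
print(round(x.loc['clay', 'and'], 2))  # 0.33

# Every word that ever follows 'and', with its probability.
print(x['and'][x['and'] > 0])
```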

Answer 1

About having trouble using np.random.choice():

x is a pandas DataFrame.

You can access any column of that DataFrame as if the column names were keys in a dictionary.

> print(x['won\'t'])
next
a            0.0
and          0.0
babylon      0.0
...
with         0.0
woman        0.0
you          1.0
Name: won't, dtype: float64
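To make the dictionary analogy concrete, here is a sketch (a swapped-in plain-Python technique using `collections.Counter`, not what pandas does internally) that rebuilds the same probabilities as a dict of dicts. `probs[current]` plays the role of the column `x[current]`, except that it only stores the nonzero entries.

```python
from collections import Counter, defaultdict

texty = ("won't you celebrate with me what i have shaped into a kind of life "
         "i had no model born in babylon both nonwhite and woman what did i see "
         "to be except myself i made it up here on this bridge between starshine "
         "and clay my one hand holding tight my other hand come celebrate with me "
         "that everyday something has tried to kill me and has failed.")
words = texty.split()

# counts[current][nxt] = how many times nxt directly follows current.
counts = defaultdict(Counter)
for current, nxt in zip(words[:-1], words[1:]):
    counts[current][nxt] += 1

# Divide by each current word's total so its probabilities sum to 1,
# which is what crosstab(..., normalize=1) does column by column.
probs = {
    current: {nxt: n / sum(c.values()) for nxt, n in c.items()}
    for current, c in counts.items()
}

print(probs["won't"])  # {'you': 1.0}
print(probs['and'])
```

Note that "failed." never appears as a key, because it is only ever the last word; the same is true of the columns of `x`.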

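The column is returned as a pandas Series. If you select a column from the DataFrame (your transition matrix x), the index of that Series contains the words available in the text and its values are the associated probabilities. You can pass these to np.random.choice to get the next word, weighted by the probabilities from the transition matrix.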

> current_word = 'won\'t'
> current_column = x[current_word]
> next_word = np.random.choice(current_column.index,
                 p=current_column.values)
> print(next_word)
you
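Repeating that step, with each chosen word becoming the next current word, generates a whole sequence. The sketch below is one way to do it, assuming a hypothetical helper `generate` and using NumPy's newer `np.random.default_rng` Generator, whose `choice` takes the same `p` weights. It stops early at a dead end: a word such as "failed." that only occurs last in the text has no column in `x`.

```python
import numpy as np
import pandas as pd

texty = ("won't you celebrate with me what i have shaped into a kind of life "
         "i had no model born in babylon both nonwhite and woman what did i see "
         "to be except myself i made it up here on this bridge between starshine "
         "and clay my one hand holding tight my other hand come celebrate with me "
         "that everyday something has tried to kill me and has failed.")
words = texty.split()
x = pd.crosstab(pd.Series(words[1:], name='next'),
                pd.Series(words[:-1], name='word'), normalize=1)


def generate(x, start, length, rng):
    """Hypothetical helper: sample up to `length` words, starting from `start`."""
    sequence = [start]
    current = start
    for _ in range(length - 1):
        if current not in x.columns:
            # Dead end: this word is never followed by anything in the text.
            break
        column = x[current]
        # Weighted draw: candidate words are the index, weights are the values.
        current = rng.choice(column.index.to_numpy(), p=column.to_numpy())
        sequence.append(current)
    return sequence


rng = np.random.default_rng(0)  # seeded so runs are repeatable
print(' '.join(generate(x, "won't", 12, rng)))
```

Seeding the generator is optional; drop the seed to get a different sequence on each run.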
