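I am trying to choose a likely next word given the current word, using the observed word-pair counts as "weights". I am having trouble using np.random.choice() for the actual selection of the next word.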
import pandas as pd
import numpy as np
texty = "won't you celebrate with me what i have shaped into a kind of life i had no model born in babylon both nonwhite and woman what did i see to be except myself i made it up here on this bridge between starshine and clay my one hand holding tight my other hand come celebrate with me that everyday something has tried to kill me and has failed."
# https://www.poetryfoundation.org/poems/50974/wont-you-celebrate-with-me
words = texty.split()
# Creating the text-based transition matrix
x = pd.crosstab(pd.Series(words[1:],name='next'),
pd.Series(words[:-1],name='word'),normalize=1)
print(x)
# Selecting the next word based on the current word.
# https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.random.choice.html
current = np.random.choice(set(texty)) # added to select a current word
next = np.random.choice(current,1,current) # was "y"
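I don't know how to reference the transition matrix from here. I want the selection to be based on the previously observed probabilities. For example, the probability of "clay" following "and" is 33%.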
x is a pandas DataFrame.
You can access any column of that DataFrame as if the column name were a key in a dictionary.
> print(x['won\'t'])
next
a 0.0
and 0.0
babylon 0.0
...
with 0.0
woman 0.0
you 1.0
Name: won't, dtype: float64
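The column is returned as a pandas Series. If you select a column from the DataFrame (your transition matrix x), the index of the resulting Series holds the words available in the text, and its values hold the associated probabilities. You can pass both to np.random.choice to draw the next word, weighted by the probabilities from the transition matrix.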
> current_word = 'won\'t'
> current_column = x[current_word]
> next_word = np.random.choice(current_column.index,
p=current_column.values)
> print(next_word)
you
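To tie this back to the two lines in the question, here is a minimal end-to-end sketch. It assumes the same text and transition matrix as above. The helper names next_word and generate are hypothetical (they do not come from pandas or NumPy), and the seeded np.random.default_rng generator is used in place of np.random.choice only so that a run is repeatable. The starting word is drawn from x.columns rather than set(texty), because texty is a string and its set contains single characters, not words.

```python
import numpy as np
import pandas as pd

texty = ("won't you celebrate with me what i have shaped into a kind of life "
         "i had no model born in babylon both nonwhite and woman what did i "
         "see to be except myself i made it up here on this bridge between "
         "starshine and clay my one hand holding tight my other hand come "
         "celebrate with me that everyday something has tried to kill me "
         "and has failed.")
words = texty.split()

# Column-normalized transition matrix: x[word] holds P(next | word).
x = pd.crosstab(pd.Series(words[1:], name='next'),
                pd.Series(words[:-1], name='word'), normalize='columns')

# Assumption: a fixed seed, chosen only to make the example repeatable.
rng = np.random.default_rng(0)


def next_word(current):
    """Hypothetical helper: draw a successor of `current`, weighted by x."""
    column = x[current]
    return rng.choice(column.index.to_numpy(), p=column.to_numpy())


def generate(start, length):
    """Hypothetical helper: follow the chain for at most `length` words."""
    chain = [start]
    # Stop early at a word with no column (the final word of the text).
    while len(chain) < length and chain[-1] in x.columns:
        chain.append(next_word(chain[-1]))
    return chain


# Draw the starting word from the column labels, i.e. words that have
# at least one observed successor.
start = rng.choice(x.columns.to_numpy())
print(' '.join(generate(start, 10)))
```

The loop stops early when it reaches a word that never appears as a "current" word (here only "failed."), since that word has no column in x to sample from.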