How to use pandas to truncate a sentence to the left and right of a word

Converting the sentence to a list of words and then finding the index of the root string should do the trick:

sentence = "lack of association between the promoter polymorphism of the mtnr1a gene and adolescent idiopathic scoliosis"
root = "mtnr1a"

try:
    words = sentence.split()
    n = words.index(root)
    cutoff = ' '.join(words[n-4:n+5])
except ValueError:
    cutoff = None

print(cutoff)

Result:

promoter polymorphism of the mtnr1a gene and adolescent idiopathic

How can I use this in a pandas DataFrame?

I tried:

sentence = data['sentence'] 
root = data['rootword'] 
def cutOff(sentence,root): 
   try: 
      words = sentence.str.split() 
      n = words.index(root) 
      cutoff = ' '.join(words[n-4:n+5]) 
except ValueError: 
      cutoff = None 
      return cutoff 
data.apply(cutOff(sentence,root),axis=1)

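But it doesn't work...

Edit:

How do I cut the sentence 4 words after the root word when the root word is in the first position of the sentence, and how do I handle the root word being in the last position? For example: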

sentence = "mtnr1a lack of association between the promoter polymorphism of the gene and adolescent idiopathic scoliosis"
out if root in first position:
"mtnr1a lack of association between"
out if root in last position:
"lack of association between the promoter polymorphism of the gene and adolescent idiopathic scoliosis"
"adolescent idiopathic scoliosis mtnr1a"
python pandas
1 Answer

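Two small adjustments to your code should fix the problem:

First, calling apply() on a DataFrame with axis=1 applies the function to each row of that DataFrame, one row at a time.

You don't need to pass the columns in as inputs to the function, and calling sentence.str.split() makes no sense. Inside cutOff(), sentence is just a regular string (not a column), so the plain str.split() method is what you want.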
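To make the point about rows and plain strings concrete, here is a minimal sketch. The one-row DataFrame and the helper name inspect_row are made up for illustration; only the column names come from the question. It reports what apply(..., axis=1) actually hands to the function:

```python
import pandas as pd

# Hypothetical one-row frame using the column names from the question.
df = pd.DataFrame({
    "sentence": ["the mtnr1a gene"],
    "rootword": ["mtnr1a"],
})

def inspect_row(row):
    # With axis=1, `row` is a pandas Series holding a single row.
    # row["sentence"] is an ordinary Python str: it has .split(),
    # but no .str accessor (that only exists on a whole column).
    return f"{type(row).__name__}/{type(row['sentence']).__name__}"

seen = df.apply(inspect_row, axis=1)
print(seen.iloc[0])
```

The printed value names the row type first and the cell type second, which is why the function body should treat sentence as a string rather than as a column.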

Change your function to:

def cutOff(sentence,root): 
    try: 
        words = sentence.split()  # this is the line that was changed
        n = words.index(root) 
        cutoff = ' '.join(words[n-4:n+5]) 
    except ValueError: 
        cutoff = None 
    return cutoff

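Next, you just need to specify which columns are passed as inputs to the function. You can do this with a lambda (df here is your DataFrame, called data in the question):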

df.apply(lambda x: cutOff(x["sentence"], x["rootword"]), axis=1)
#0    promoter polymorphism of the mtnr1a gene and a...
#dtype: object
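The edit in the question asks about a root word in the first or last position. With words[n-4:n+5], a root near the start makes n-4 negative, and a negative slice start counts from the end of the list, so the slice can come back empty. One possible way to handle this, offered as a sketch rather than the definitive fix, is to clamp the start index at 0. The name cut_off, the width parameter, and the sample rows below are assumptions made for illustration:

```python
import pandas as pd

def cut_off(sentence, root, width=4):
    """Return up to `width` words on each side of `root`, or None if absent."""
    words = sentence.split()
    try:
        n = words.index(root)
    except ValueError:
        return None
    # Clamp the start at 0: a negative start would count from the end of the list.
    start = max(n - width, 0)
    # A stop index past the end of the list is harmless; slicing just stops there.
    return " ".join(words[start:n + width + 1])

# Hypothetical sample data: root in the middle, first, last, and missing.
df = pd.DataFrame({
    "sentence": [
        "lack of association between the promoter polymorphism of the mtnr1a gene and adolescent idiopathic scoliosis",
        "mtnr1a lack of association between the promoter polymorphism",
        "and adolescent idiopathic scoliosis mtnr1a",
        "no root word here",
    ],
    "rootword": ["mtnr1a"] * 4,
})

df["cutoff"] = df.apply(lambda row: cut_off(row["sentence"], row["rootword"]), axis=1)
print(df["cutoff"].tolist())
```

For the second row this yields "mtnr1a lack of association between", matching the first-position output requested in the question. For the third row it yields the root preceded by the four words before it, and the last row ends up as a missing value because the root is not found.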