Python string matching: check whether a given number of words from a string occur in the sentences of another list

Question

I have a string and a list defined as follows:

my_string = 'she said he replied'
my_list = ['This is a cool sentence', 'This is another sentence','she said hello he replied goodbye', 'she replied', 'Some more sentences in here', 'et cetera et cetera...']

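I am trying to check whether at least 3 of the words in my_string exist in any of the strings in my_list. My approach is to split my_string and match with all. However, this only works if all of the items in my_string exist in a sentence in my_list: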

if all(word in item for item in my_list for word in my_string.split()):
    print('we happy')

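1 - How can I satisfy the condition when at least 3 items of my_string are present in the sentence list?

2 - Is it possible to match only the first and last words of my_string, in the same order? That is, 'she' and 'replied' are present in 'she replied' at index 3 of my_list, so it returns True.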

Answer 1

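The words common to two strings can be computed with a set intersection. The len of the resulting set gives you the number of words the strings have in common.

First, build the set of all words in the strings in my_list using a set union: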

all_words = set.union(*[set(item.split()) for item in my_list])

Then check whether the length of the intersection is >= 3:

search_words = set(my_string.split())
if len(search_words & all_words) >= 3:
    print('we happy')
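
Note that the union pools the words of every sentence together, so the check above can pass even when the three words are spread across different sentences. If the intent is that a single sentence must contain at least three of the words, a per-sentence variant might look like the sketch below. The names min_matches and has_match are illustrative and not part of the original answer, and words are assumed to be separated by whitespace.

```python
my_string = 'she said he replied'
my_list = ['This is a cool sentence', 'This is another sentence',
           'she said hello he replied goodbye', 'she replied',
           'Some more sentences in here', 'et cetera et cetera...']

min_matches = 3  # assumed threshold, taken from the question
search_words = set(my_string.split())

# True if any single sentence shares at least min_matches whole words
has_match = any(
    len(search_words & set(item.split())) >= min_matches
    for item in my_list
)

if has_match:
    print('we happy')
```

Here any stops at the first sentence that qualifies, so the remaining sentences are not split.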

Answer 2

Use the built-in encoding of True as 1 and False as 0, and sum the results of the in tests:

if sum(word in item for item in my_list for word in my_string.split()) >= 3:
    print('we happy')

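For your given input, this prints we happy.

Re: mamun's point, we also want to make sure that whole words match. You need to split each string in my_list to get the list of available words. kaya3 has already posted what I was going to tell you to do.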
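
To illustrate the whole-word concern: the in operator on two strings tests for a substring, so 'he' is found inside 'she'. A minimal sketch of a whole-word variant, assuming words are separated by whitespace, splits each sentence before testing. The helper name count_whole_words is hypothetical.

```python
my_string = 'she said he replied'
my_list = ['This is a cool sentence', 'This is another sentence',
           'she said hello he replied goodbye', 'she replied',
           'Some more sentences in here', 'et cetera et cetera...']

def count_whole_words(words, sentence):
    """Count how many of `words` appear as whole words in `sentence`."""
    sentence_words = set(sentence.split())
    return sum(word in sentence_words for word in words)

words = my_string.split()

# Substring test: 'he' is found inside 'she', so this prints True
print('he' in 'she replied')

# Whole-word test: only 'she' and 'replied' are counted for this sentence
print(count_whole_words(words, 'she replied'))

if any(count_whole_words(words, item) >= 3 for item in my_list):
    print('we happy')
```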

Answer 3

For part 1, I think this should work, and I would suggest using a regular expression rather than string.split to find the words.

import re

num_matches = 3

def get_words(text):
    # Raw string so that \w is not treated as an invalid string escape
    return re.findall(r'\w+', text)

my_string = 'she said he replied'
my_list = ['This is a cool sentence', 'This is another sentence','she said hello he replied goodbye', 'she replied', 'Some more sentences in here', 'et cetera et cetera...']

my_string_word_set = set(get_words(my_string))
my_list_words_set = [set(get_words(x)) for x in my_list]

result = [len(my_string_word_set.intersection(x)) >= num_matches for x in my_list_words_set]
print(result)

Result:

[False, False, True, False, False, False]

For part 2, this isn't a super clean solution, but something like this should work.

words = get_words(my_string)
first_and_last = [words[0], words[-1]]
my_list_dicts = []
for sentence in my_list:
    word_dict = {}
    sentence_words = get_words(sentence)
    for i, word in enumerate(sentence_words):
        word_dict[word] = i
    my_list_dicts.append(word_dict)

result2 = []
for word_dict in my_list_dicts:
    if all(k in word_dict for k in first_and_last) and word_dict[first_and_last[0]] < word_dict[first_and_last[1]]:
        result2.append(True)
    else:
        result2.append(False)

print(result2)

Result:

[False, False, True, True, False, False]
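
The dictionary approach above keeps only the last position of each word in a sentence. As an alternative sketch, not the answer's own method, the check can be written as a single function that compares the earliest position of the first word with the latest position of the last word. The function name first_before_last is hypothetical, and the query is assumed to contain at least one word.

```python
import re

def first_before_last(query, sentence):
    """True if the first word of `query` occurs in `sentence`
    somewhere before the last word of `query`."""
    query_words = re.findall(r'\w+', query)
    sentence_words = re.findall(r'\w+', sentence)
    first, last = query_words[0], query_words[-1]
    if first not in sentence_words or last not in sentence_words:
        return False
    # Earliest position of the first word
    first_pos = sentence_words.index(first)
    # Latest position of the last word, found by searching the reversed list
    last_pos = len(sentence_words) - 1 - sentence_words[::-1].index(last)
    return first_pos < last_pos

my_string = 'she said he replied'
my_list = ['This is a cool sentence', 'This is another sentence',
           'she said hello he replied goodbye', 'she replied',
           'Some more sentences in here', 'et cetera et cetera...']

result2 = [first_before_last(my_string, sentence) for sentence in my_list]
print(result2)
```

For the sample data this gives the same list as above. The two approaches can differ when the first word is repeated after the last word in a sentence.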
