Remove an element from a list of strings if the string contains a stop word [duplicate]

Question

This question already has an answer here:

How to remove items from a list that contains words found in items in another list [duplicate] (4 answers)

I have a list like this:

lst = ['for Sam', 'Just in', 'Mark Rich']

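I am trying to remove from this list of strings every element (a string of one or more words) that contains a stop word.

Since the 1st and 2nd elements contain for and in, which are stop words, the result should be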

new_lst = ['Mark Rich']

What I tried

from nltk.corpus import stopwords

stop_words = set(stopwords.words('english'))

lst = ['for Sam', 'Just in', 'Mark Rich']
new_lst = [i.split(" ") for i in lst]
new_lst = [" ".join(i) for i in new_lst for j in i if j not in stop_words]

This gives me the following output:

['for Sam', 'Just in', 'Mark Rich', 'Mark Rich']

Answer 1

You need an if condition rather than an extra level of nesting:

new_lst = [' '.join(i) for i in new_lst if not any(j in i for j in stop_words)]
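To see why the extra nesting misbehaves, the two comprehensions can be compared side by side. This is a minimal sketch that assumes a small hand-picked stop_words set in place of the NLTK one, so it runs without downloading any corpus; the names tokens, nested and filtered are illustrative only.

```python
# Hand-picked stand-in for NLTK's English stop-word set (assumption).
stop_words = {'for', 'in'}

lst = ['for Sam', 'Just in', 'Mark Rich']
tokens = [s.split() for s in lst]

# Nested loop: emits one copy of the string per word that is NOT a stop word,
# so 'for Sam' survives (because of 'Sam') and 'Mark Rich' appears twice.
nested = [' '.join(words) for words in tokens
          for w in words if w not in stop_words]

# Single condition per string: keep it only if none of its words is a stop word.
filtered = [' '.join(words) for words in tokens
            if not any(w in stop_words for w in words)]

print(nested)
print(filtered)
```

The nested version tests each word separately and outputs the whole string every time a word passes, while the any-based version makes one keep-or-drop decision per string.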

If you want to use a set, you can use set.isdisjoint:

new_lst = [' '.join(i) for i in new_lst if stop_words.isdisjoint(i)]

Here is a demo:

stop_words = {'for', 'in'}

lst = ['for Sam', 'Just in', 'Mark Rich']
new_lst = [i.split() for i in lst]
new_lst = [' '.join(i) for i in new_lst if stop_words.isdisjoint(i)]

print(new_lst)

# ['Mark Rich']
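Note that the matching above is case-sensitive: 'Just in' is dropped because of 'in', not because of 'Just'. If capitalized words should also count, one possible approach is to lowercase both sides before comparing. The sketch below assumes that behavior is wanted; the helper name drop_strings_with_stop_words and the sample stop-word set are hypothetical and not part of the original answer.

```python
def drop_strings_with_stop_words(strings, stop_words):
    """Return the strings that share no word with stop_words, ignoring case."""
    lowered = {w.lower() for w in stop_words}
    # isdisjoint accepts any iterable, so the list of lowercased words works directly.
    return [s for s in strings if lowered.isdisjoint(s.lower().split())]


# Hypothetical stop-word set; 'just' is included to show the case-insensitive match.
stop_words = {'for', 'in', 'just'}
lst = ['for Sam', 'Just in', 'Mark Rich', 'Just Sam']

print(drop_strings_with_stop_words(lst, stop_words))
```

The original strings are returned unchanged; only the comparison is done on lowercased words.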

Answer 2

You can use a list comprehension and use sets to check whether the words of each string intersect with the stop words:

[i for i in lst if not set(stop_words) & set(i.split(' '))]
# ['Mark Rich']
