How to remove the words in a reference list from the main string body

Question

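I am working with a text column in a DataFrame and trying to count the most frequent words, but stray words such as "for", "and", "the", etc. dominate the results. I tried to write a for loop that removes these words so they don't interfere with my analysis. Below is the code I came up with: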

 lst= ["for", "of", "and", "in", "which", "the", "to", "a", "an"]


for i in papers.title_processed:
    if i in lst:
        papers.title_processed=  papers.title_processed.replace(i, "")


output: 
0    Self-Organization of Associative Database and ...
1    A Mean Field Theory of Layer IV of Visual Cort...
2    Storing Covariance by the Associative Long-Ter...
3    Bayesian Query Construction for Neural Network...
4    Neural Network Ensembles, Cross Validation, an...
Name: title, dtype: object
0    self-organization of associative database and ...
1    a mean field theory of layer iv of visual cort...
2    storing covariance by the associative long-ter...
3    bayesian query construction for neural network...
4    neural network ensembles, cross validation, an...
Name: title_processed, dtype: object

So it did nothing. Any suggestions as to what I'm doing wrong? I also tried .map(lambda x: papers.title_processed.str.replace(x, "") for x in lst) and got an error.

Answer 1

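The loop iterates over whole titles, so the check `i in lst` compares an entire title against the word list and is never true. Use a regex with word boundaries instead: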

import re

lst= ["for", "of", "and", "in", "which", "the", "to", "a", "an"]

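# Match each listed word only as a whole word; re.escape guards against regex metacharacters
regex = re.compile('|'.join([rf'\b{re.escape(w)}\b' for w in lst]))
# pandas 2.0+ defaults to regex=False in str.replace, so request regex matching explicitly
papers['title_processed'] = papers['title_processed'].str.replace(regex, '', regex=True)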

After the words in lst are removed, the title_processed Series should look like this:

# print(papers['title_processed'])

0       self-organization  associative database  ...
1        mean field theory  layer iv  visual cort...
2     storing covariance by  associative long-ter...
3     bayesian query construction  neural network...
4    neural network ensembles, cross validation, ...
Name: title_processed, dtype: object
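
The output above still has doubled spaces where words were removed. Since the end goal is a word-frequency count, here is a minimal sketch of the same word-boundary idea in plain Python, with the leftover whitespace collapsed before counting. The helper name `strip_stop_words` and the sample titles are made up for illustration; they are not from the question's data.

```python
import re
from collections import Counter

# Stop words from the question.
lst = ["for", "of", "and", "in", "which", "the", "to", "a", "an"]

# A single alternation wrapped in one pair of word boundaries;
# re.escape guards against entries containing regex metacharacters.
pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, lst)) + r")\b")


def strip_stop_words(title: str) -> str:
    """Remove whole-word matches, then collapse the leftover spaces.

    Hypothetical helper, named here only for illustration.
    """
    without = pattern.sub("", title)
    return " ".join(without.split())


# Made-up sample titles (assumption: already lower-cased, as in the question).
titles = [
    "a theory of learning in neural networks",
    "the brain and a theory for learning",
]

cleaned = [strip_stop_words(t) for t in titles]

# Count word frequencies across all cleaned titles.
counts = Counter(word for t in cleaned for word in t.split())

print(cleaned)
print(counts.most_common(2))
```

With pandas, the same function could presumably be applied with papers['title_processed'].map(strip_stop_words). One caveat: \b treats a hyphen as a word boundary, so a stop word inside a hyphenated term (for example "of" and "the" in "state-of-the-art") would be removed as well.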
