How do I get all unique words from a numpy.ndarray?

Question

I have the following ndarray, X_train, where each row is [title, description]:

array([['Boots new', 'Boots 46 size new'], ['iPhone 7 plus 128GB Red',
        '\xa0/\n/\n The price is only for Instagram subscribers'], ...],
      dtype=object)

I want to get a list of all the unique words. What is the fastest way to do this? Thanks for any help.

Answer 1

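I'm not sure whether you care about the words in the titles, the descriptions, or both, so this takes words from both, but that is easy to change.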
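As a minimal sketch of that change, assuming each row is [title, description] as in the question, you can slice a single column before splitting. The helper name `unique_title_words` is hypothetical:

```python
import numpy as np

def unique_title_words(arr):
    """Collect unique whitespace-separated words from the first column only."""
    words = set()
    # arr[:, 0] selects the titles; use arr[:, 1] for the descriptions instead
    for title in arr[:, 0]:
        words.update(title.split())
    return words

arr = np.array([['Boots new', 'Boots 46 size new'],
                ['iPhone 7 plus 128GB Red',
                 '\xa0/\n/\n The price is only for Instagram subscribers']],
               dtype=object)

print(sorted(unique_title_words(arr)))
# ['128GB', '7', 'Boots', 'Red', 'iPhone', 'new', 'plus']
```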

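If you want to keep track of unique things, a set is usually the right choice, because it doesn't allow you to add the same element more than once.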
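A quick illustration of that property: adding the same word several times to a set still leaves only one copy of it.

```python
words = set()
for word in ["Boots", "new", "Boots", "new", "size"]:
    words.add(word)  # adding an element that is already present has no effect

print(len(words))  # 3
```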

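This code builds a set of the unique words across all titles and descriptions. I added an ignore list in case you want to skip particular words. If needed, you could make it more sophisticated with regular expressions.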
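One possible regex-based variant, offered as a sketch rather than a drop-in replacement: `re.findall(r"\w+", text)` keeps only runs of word characters, so stray punctuation such as "/" is dropped without an ignore list. The pattern, the optional lowercasing, and the helper name `unique_words` are assumptions; adjust them to whatever counts as a "word" in your data.

```python
import re

import numpy as np

# Assumed definition of a word: a run of Unicode letters, digits or underscores
WORD_RE = re.compile(r"\w+")

def unique_words(arr, lowercase=False):
    """Return the set of regex-matched words in every cell of arr."""
    words = set()
    for text in arr.ravel():  # visit every cell: titles and descriptions
        tokens = WORD_RE.findall(text)
        if lowercase:
            # fold case so that e.g. "The" and "the" count as one word
            tokens = [t.lower() for t in tokens]
        words.update(tokens)
    return words

arr = np.array([['Boots new', 'Boots 46 size new'],
                ['iPhone 7 plus 128GB Red',
                 '\xa0/\n/\n The price is only for Instagram subscribers']],
               dtype=object)

print(sorted(unique_words(arr, lowercase=True)))
```

Unlike the ignore-list version, this keeps "7", since it is made of word characters; filter it out afterwards if you don't want numbers.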

import numpy as np

arr = np.array([['Boots new', 'Boots 46 size new'], ['iPhone 7 plus 128GB Red',
                '\xa0/\n/\n The price is only for Instagram subscribers']],
                dtype=object)

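words = set()            # a set stores each word only once
ignore = ["/", "7"]      # tokens to leave out of the result
for title, description in arr:
    # split() with no argument splits on any whitespace, including '\xa0' and '\n'
    words.update(word for word in title.split() if word not in ignore)
    words.update(word for word in description.split() if word not in ignore)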

print(words)

This prints (the order of a set's elements can differ between runs):

{'price', 'Boots', 'subscribers', 'size', '46', 'Instagram', '128GB', 'new', 'plus', 'iPhone', 'is', 'only', 'for', 'The', 'Red'}
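Since the question asks for the fastest way, a more compact alternative is to flatten the array, join every cell into one string, and split once. This is a sketch that assumes every cell is a `str`. It has no ignore list, and it has not been benchmarked here, so time it on your real data before assuming it is faster.

```python
import numpy as np

arr = np.array([['Boots new', 'Boots 46 size new'],
                ['iPhone 7 plus 128GB Red',
                 '\xa0/\n/\n The price is only for Instagram subscribers']],
               dtype=object)

# ravel() flattens the 2-D array into a 1-D sequence of all cells;
# joining with a space keeps words from neighbouring cells apart.
tokens = " ".join(arr.ravel()).split()

unique_set = set(tokens)           # unordered Python set
unique_sorted = np.unique(tokens)  # sorted NumPy array of the unique tokens

print(len(unique_set))  # 17 (includes '/' and '7', since nothing is ignored)
```

Use `np.unique` if you want a sorted NumPy array. Use the plain `set` if order doesn't matter.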
