How to tokenize in a numpy.ndarray?

Question

I have the following ndarray, X_train, where each row is [title, description]:

array([['Boots new', 'Boots 46 size new'], ['iPhone 7 plus 128GB Red',
        '\xa0/\n/\n The price is only for Instagram subscribers'], ...],
      dtype=object)

Now I need to tokenize the titles and descriptions. I wrote the following function:

from nltk.tokenize import WordPunctTokenizer


tokenizer = WordPunctTokenizer()
def preprocess(text: str) -> str:
    return ' '.join(tokenizer.tokenize(text.lower()))

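The question is: how can I process the data in the ndarray faster and more efficiently? I don't want to use nested loops. Maybe this can be done quickly with numpy.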

I tried:

for row in X_train:
    row = [preprocess(x) for x in row]

However, nothing changes, and I still get:

array([['Boots new', 'Boots 46 size new'], ['iPhone 7 plus 128GB Red',
            '\xa0/\n/\n The price is only for Instagram subscribers'], ...],
          dtype=object)

But I want this:

array([['boots new', 'boots 46 size new'], ['iphone 7 plus 128gb red',
                '/ / the price is only for instagram subscribers'], ...],
              dtype=object)

Any help is much appreciated.

Answer 1

I'm not sure about the speed, but I would use the map function for this.

# A tuple of [title, description] pairs
sentences = ['Boots new', 'Boots 46 size new'], ['iPhone 7 plus 128GB Red',
        '\xa0/\n/\n The price is only for Instagram subscribers']

def lowr(s):
    # Lowercase both the title and the description
    return [s[0].lower(), s[1].lower()]

result = list(map(lowr, sentences))

print(result)

[['boots new', 'boots 46 size new'], ['iphone 7 plus 128gb red', '\xa0/\n/\n the price is only for instagram subscribers']]
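
Note that lowr only lowercases, so the '\xa0' and '\n' characters are still in the output. To get the tokenized form asked for in the question, the same map idea can be applied with the tokenizing function instead. Below is a minimal sketch; to keep it runnable without nltk, it assumes a regex stand-in (`_TOKEN_RE`, a name made up here) that splits text into runs of word characters or runs of punctuation, which is how WordPunctTokenizer is described. Swap in the question's own preprocess if nltk is installed.

```python
import re

# Assumption: a regex stand-in for nltk's WordPunctTokenizer
# (runs of word characters, or runs of non-space punctuation).
_TOKEN_RE = re.compile(r"\w+|[^\w\s]+")


def preprocess(text: str) -> str:
    """Lowercase the text, tokenize it, and re-join tokens with single spaces."""
    return " ".join(_TOKEN_RE.findall(text.lower()))


sentences = [
    ["Boots new", "Boots 46 size new"],
    ["iPhone 7 plus 128GB Red",
     "\xa0/\n/\n The price is only for Instagram subscribers"],
]

# Apply preprocess to every cell of every row.
result = [list(map(preprocess, row)) for row in sentences]
print(result)
```

Because the tokens are re-joined with single spaces, the non-breaking space and the newlines disappear from the description.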

As @hpaulj mentioned in the comments, a lambda is just another syntax for a function like lowr. So if you want to do the same thing with a lambda, it would be:

result = list(map(lambda s: [s[0].lower(), s[1].lower()], sentences))
print(result)

The output is the same:

[['boots new', 'boots 46 size new'], ['iphone 7 plus 128gb red', '\xa0/\n/\n the price is only for instagram subscribers']]
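
For the ndarray itself, here is a sketch of two options, not a benchmark. The loop in the question leaves X_train unchanged because `row = [...]` only rebinds the loop variable to a new list; writing back through an index (`X_train[i] = ...`) modifies the array. Alternatively, np.vectorize hides the loops, although it still calls the Python function once per element, so it is a convenience rather than a real speed-up. As an assumption, preprocess below uses a regex stand-in for WordPunctTokenizer so the example runs without nltk.

```python
import re

import numpy as np

# Assumption: regex stand-in for nltk's WordPunctTokenizer.
_TOKEN_RE = re.compile(r"\w+|[^\w\s]+")


def preprocess(text: str) -> str:
    """Lowercase the text, tokenize it, and re-join tokens with single spaces."""
    return " ".join(_TOKEN_RE.findall(text.lower()))


X_train = np.array(
    [["Boots new", "Boots 46 size new"],
     ["iPhone 7 plus 128GB Red",
      "\xa0/\n/\n The price is only for Instagram subscribers"]],
    dtype=object,
)

# Option 1: no explicit loops; returns a new object array of the same shape.
vectorized_preprocess = np.vectorize(preprocess, otypes=[object])
X_clean = vectorized_preprocess(X_train)

# Option 2: fix the original loop by assigning back through the index.
X_inplace = X_train.copy()
for i, row in enumerate(X_inplace):
    X_inplace[i] = [preprocess(x) for x in row]

print(X_clean)
```

Both options give the same result here; which one is faster on real data would need to be measured.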
