Selecting elements from a matrix using numpy

Question

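I'm trying to use numpy for fast text analysis, collocation analysis to be precise. Suppose I have the following string, which I have converted to a numpy array: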

text = np.array(['a', 'b', 'c', 'd', 'e', 'b', 'f', 'g'])

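Suppose I want to get the left and right context of the letter 'b' from this array, say 1 element to the left and 2 elements to the right. So I would like something like this: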

['a', 'c', 'd'] +  ['e', 'f', 'g']

Is it possible to do this in NumPy by broadcasting all the operations? Right now I simply loop over the text, but that is very time-consuming.

I tried np.select, np.where, and np.mask.

Thanks for your help :)

Answer 1

One possible approach is to find the indices of the value (using np.where(arr == 'b')) and then use them to index the neighboring values:

import numpy as np

arr = np.array(['a', 'b', 'c', 'd', 'e', 'b', 'f', 'g'])
# For each position i of 'b', pick 1 element to the left and 2 to the right.
lr_contexts = [arr[[i-1, i+1, i+2]] for i in np.where(arr == 'b')[0]]
print(lr_contexts)

[array(['a', 'c', 'd'], dtype='<U1'), array(['e', 'f', 'g'], dtype='<U1')]
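The list comprehension above still runs a Python-level loop over the matches, and it assumes every 'b' has enough neighbors on both sides (a 'b' at index 0 would wrap around to the end of the array, and one near the end would raise an IndexError). Since the question asks about broadcasting, here is a minimal sketch of a fully vectorized variant. The helper name context_windows and its parameters n_left and n_right are assumptions introduced for illustration and are not part of the original answer; dropping occurrences whose window does not fit entirely inside the array is also a design choice of this sketch.

```python
import numpy as np

def context_windows(arr, target, n_left, n_right):
    """Hypothetical helper: one row of neighbors per occurrence of target."""
    # Positions where the target occurs.
    hits = np.flatnonzero(arr == target)
    # Relative offsets of the context, skipping 0 (the target itself).
    offsets = np.concatenate([np.arange(-n_left, 0), np.arange(1, n_right + 1)])
    # Broadcasting: (n_hits, 1) + (n_offsets,) -> (n_hits, n_offsets) index matrix.
    idx = hits[:, None] + offsets
    # Keep only occurrences whose whole window lies inside the array.
    inside = ((idx >= 0) & (idx < len(arr))).all(axis=1)
    # Fancy indexing with the 2-D index matrix gathers all windows at once.
    return arr[idx[inside]]

text = np.array(['a', 'b', 'c', 'd', 'e', 'b', 'f', 'g'])
print(context_windows(text, 'b', 1, 2))
```

For the example array this returns a 2x3 array whose rows are ['a', 'c', 'd'] and ['e', 'f', 'g']. If truncated windows at the edges should be kept instead of dropped, the array would need padding first, which this sketch does not attempt.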

Answer 2

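I believe the previous answer is the right choice if you really want to use numpy. Where applicable, though, I would suggest trying regular expressions for text-pattern tasks. For this task, the following function solves it using the re package.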

import re

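def get_text_around_char(text, char, n_left, n_right):
    matches = []
    # Escape char so regex metacharacters such as '.' are matched literally.
    for match in re.finditer(re.escape(char), text):
        s, e = match.start(), match.end()
        # Clamp the left bound at 0 so a match near the start does not wrap around.
        matches.append(text[max(s - n_left, 0):s] + text[e:e + n_right])
    return matches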

print(get_text_around_char("abcdebfg", "b", 1, 2))

['acd', 'efg']
