通过值将数组索引到最接近的非整数步长？

Question

如果我没有正确表达这一点，我深表歉意。我有一个 ~10^7 单调递增元素的数组：

example_array = np.array([10, 10.5, 13, 15, 20, 35, 37.1, 40, 50])

我想找到索引，以便索引原始数组产生最接近元素之间的线性间距（无需显式插值）。换句话说，我希望索引能够以最接近元素之间的线性间距对

example_array

进行重新采样。因此，对于

example_array

和

10.5

的间距：

output_index = np.array([1, 4, 5, 7, 8])
print(example_array[output_index]) # [10.5, 20. , 35. , 40. , 50. ]

如何通过 numpy 运算快速实现这一目标？

我本以为取模是第一步：

In [1]: import numpy as np
In [2]: example_array = np.array([10, 10.5, 13, 15, 20, 35, 37.1, 40, 50])
In [3]: spacing = 10.5
In [4]: example_array/spacing
Out[4]: array([0.95238095, 1.        , 1.23809524, 1.42857143, 1.9047619 ,
       3.33333333, 3.53333333, 3.80952381, 4.76190476])
In [7]: example_array%spacing
Out[7]: array([10. ,  0. ,  2.5,  4.5,  9.5,  3.5,  5.6,  8.5,  8. ])

但我似乎无法在这里做出区分。实际上，我想要 Out[5] 最接近每个整数的索引一次；即 (1, 1.9047619, 3.33333333, 3.80952381, 4.76190476)。

::编辑:: 正如所指出的，我错过了一个，现在插入它。在某种程度上也类似于在两个 numpy 数组中查找最接近的值

Answer 1

我不确定矢量化解决方案在这里是否更好。您试图最小化数组中元素与

spacing

参数的整数倍之间的距离，但每个整数都需要一个对应的解决方案，因此您无法设置全局阈值来查找“接近”条目和唯一的条目一次最小化多个点的方法是创建一个大小为

(N,n)

的二维数组，其中

是原始数组中的条目数，

是您要最小化的点的数量，然后最小化二维数组的每一行。这会很快破坏你的记忆，不推荐。该方法看起来像这样：

def nearest_integers_vec(arr, spacing):
    nmin = int(arr[0]//spacing)
    nmax = int(arr[-1]//spacing)+1
    arr2 = np.tile(arr, (nmax-nmin, 1))
    arr3 = abs(arr2 - np.array([np.arange(nmin+1, nmax+1)*spacing]).T)
    return np.argmin(arr3, axis = 1)

In [1]: idx = nearest_integers_vec(example_array, spacing)
In [2]: example_array[idx]
Out[2]: array([10.5, 20. , 35. , 40. , 50. ])

首选方法是放弃矢量化，以实现 Python 更快的循环机制之一，例如

map()

，然后转换为索引列表（如果需要，然后将输出转换为数组）：

def nearest_integers(arr, spacing):
    nmin = int(arr[0]//spacing)
    nmax = int(arr[-1]//spacing) + 1
    return list(map(lambda i: np.argmin(abs(arr - i*spacing)), range(nmin+1, nmax+1)))

产生相同的结果（列表形式除外，尽管这仍然适用于数组索引）

Answer 2

这个问题看起来没那么简单。

我尝试使用图论。我们可以首先计算所有差异，减去预期步长的一半并计算模步长。这将为您提供从一个值到另一个值的所有潜在跳跃的列表，以您的步骤的倍数表示。

然后你可以从中构建一个图表并尝试找到一条路径：

example_array = np.array([10, 10.5, 13, 15, 20, 35, 40, 50])

# define expected step
N = 10.5

# compute pairwise differences
d = abs(example_array[:, None] - example_array)

# subtracting mid-step and modulo step
# this enables to have intervals close to a multiple of the step
# as a value close to N/2
d2 = (d-N/2)%N

# identify intervals close to a multiple of the step
# but greater than 0
# using a relative tolerance of 15%
# and absolute tolerance of 0.5 for values close to each other
# tweak this as desired
m = np.where(np.isclose(d, 0, atol=0.5),
             False, np.isclose(d2, N//2, rtol=0.15)
            )

# building the graph
import networkx as nx
G = nx.from_numpy_array(np.triu(m), create_using=nx.DiGraph)
G.remove_edges_from(nx.selfloop_edges(G))

# getting longest path
nx.dag_longest_path(G)

输出：

[0, 4, 6, 7]

图表：

通过值将数组索引到最接近的非整数步长？

问题描述投票：0回答：2

2个回答

最新问题

通过值将数组索引到最接近的非整数步长？

问题描述 投票：0回答：2

2个回答

最新问题

问题描述投票：0回答：2