我需要计算 np.array 的运行总和。假设
a
是给定的长度为 l
的 np.array,并且 n
是一个整数步长,那么我需要一个长度为 b
的数组 l - n + 1
,其中
b= [a[0]+a[1]+...+a[n-1], a[1]+a[2]+...+a[n], ..., a[ l-n]+a[l-n+1]+...+a[l-1]]
我尝试了以下方法:
def run_sum(a, n):
a_cumsum = np.r_[0, np.cumsum(a)]
b = a_cumsum[n:] - a_cumsum[:-n]
return b
它工作得很好,但是如果数组
a
有非常不同的元素(例如其中一些是1e8
,其中一些是1e-8
),就像我的情况一样,那么这个方法会给出太多的错误,这些错误会累积在中np.cumsum
。您对如何直接执行此操作而不使用 np.cumsum
有什么想法吗?
由于
a
有长度 int(1e8)
我不想使用 for
循环。想要类似的东西:
ind = np.arange(a.size - n + 1)
b[ind] = np.sum(a[ind:ind+n])
但它不起作用(
更新,精度损失的示例:
a = np.array([-9999.999966666666, -5.7847991744735696e-08, -2.983181169098924e-08, -2.093995193995538e-08, -1.6894298480290635e-08, -1.4869270858640853e-08, -1.3961300790320832e-08, -1.3834651535607362e-08, -1.4395718046705324e-08, -1.570696320519037e-08])
run_sum(a, 3) =
np.array([-9999.999966754345, -1.0861913324333727e-07, -6.766640581190586e-08, -5.2703398978337646e-08, -4.5723936636932194e-08, -4.2664396460168064e-08, -4.219145921524614e-08, -4.393768904265016e-08])
a[:-2] + a[1:1] + a[2:] =
np.array([-9999.999966754345, -1.0861975537568031e-07, -6.766606211123524e-08, -5.2703521278886865e-08, -4.572487012925232e-08, -4.266522318456905e-08, -4.2191670372633516e-08, -4.393733278750305e-08])
第 5 个数字的差异对我来说很重要!
更新2: 认为这会起作用。至少这个玩具示例处理得很好。这个想法是将初始数组拆分为具有大元素和小元素的数组,并独立计算 cumsum。
def run_sum_new(a, n, max_exp_sep=3, exp_sep_lim=4):
"""
Finds the running sum of n consecutive elements of array a. If exponents of array elements
vary greatly, divides them into different arrays to minimize the final error.
Receives:
a - the original array
n - number of elements to sum
max_exp_sep - the maximum number of arrays the original array can be divided into (regardless for positive and negative elements).
In total, the maximum number of arrays is 2*max_exp_sep+1
exp_sep_lim - the minimum of the difference between the exponents of two elements at which they must be separated.
Returns:
run_sum - array of running sum of the original array
"""
#finds maximal exponents of pos and neg elements and minimal exponent of elements of a
a_pos = a[a>0]
a_neg = a[a<0]
if a_pos.size == 0:
a_pos_max = np.nan
else:
a_pos_max = np.max(a_pos)
if a_neg.size == 0:
a_neg_max = np.nan
else:
a_neg_max = np.max(np.abs(a_neg))
a_min = np.min(np.abs(a))
exp_pos_max = np.ceil(np.log10(a_pos_max))
exp_neg_max = np.ceil(np.log10(a_neg_max))
exp_min = np.floor(np.log10(a_min))
del(a_pos)
del(a_neg)
#finds separation limits list
d_exp_pos = exp_pos_max - exp_min
if np.isnan(d_exp_pos):
exp_pos_sep_list = []
elif d_exp_pos <= exp_sep_lim * (max_exp_sep + 1):
exp_pos_sep_list = [10**(exp_min + exp_sep_lim * i) for i in range(1, max_exp_sep+1) if 10**(exp_min + exp_sep_lim * i) < a_pos_max]
else:
new_exp_sep_lim = np.ceil(d_exp_pos / (max_exp_sep + 1))
print(f"Warning: positive elements of array have too large exponent spread {d_exp_pos} for the given max_exp_sep={max_exp_sep} and exp_sep_lim={exp_sep_lim}.",
f"New exp_sep_lim={new_exp_sep_lim} has been taken.", sep='\n')
exp_pos_sep_list = [10**(exp_min + new_exp_sep_lim * i) for i in range(1, max_exp_sep+1)]
d_exp_neg = exp_neg_max - exp_min
if np.isnan(d_exp_neg):
exp_neg_sep_list = []
elif d_exp_neg <= exp_sep_lim * (max_exp_sep + 1):
exp_neg_sep_list = [-10**(exp_min + exp_sep_lim * i) for i in range(1, max_exp_sep+1) if 10**(exp_min + exp_sep_lim * i) < a_neg_max]
else:
new_exp_sep_lim = np.ceil(d_exp_neg / (max_exp_sep + 1))
print(f"Warning: negative elements of array have too large exponent spread {d_exp_neg} for the given max_exp_sep={max_exp_sep} and exp_sep_lim={exp_sep_lim}.",
f"New exp_sep_lim={new_exp_sep_lim} has been taken.", sep='\n')
exp_neg_sep_list = [10**(exp_min + new_exp_sep_lim * i) for i in range(1, max_exp_sep+1)]
exp_neg_sep_list.reverse()
exp_sep_list = [-np.inf, ] + exp_neg_sep_list + exp_pos_sep_list + [np.inf,]
#separates a
a_sep_list = [np.where((i <= a) & (a < j), a, 0) for (i, j) in zip(exp_sep_list[:-1], exp_sep_list[1:])]
a_sep_arr = np.array(a_sep_list)
#finds run sum
a_sep_cumsum = np.cumsum(a_sep_arr, axis=-1)
a_sep_cumsum = np.hstack(([[0]]*a_sep_cumsum.shape[0], a_sep_cumsum))
run_sum = np.sum(a_sep_cumsum[:,n:] - a_sep_cumsum[:,:-n], axis=0)
return run_sum
run_sum_new(a, 3) = np.array([-9999.999966754345, -1.0861975537568031e-07, -6.766606211123526e-08, -5.270352127888686e-08, -4.5724870129252317e-08, -4.266522318456904e-08, -4.2191670372633516e-08, -4.393733278750304e-08])
a[:-2] + a[1:1] + a[2:] =
np.array([-9999.999966754345, -1.0861975537568031e-07, -6.766606211123524e-08, -5.2703521278886865e-08, -4.572487012925232e-08, -4.266522318456905e-08, -4.2191670372633516e-08, -4.393733278750305e-08])
timeit.timeit('run_sum_new(a, 3)', number=1000, globals=globals()) = 0.12470354699999997
sliding_window_view
:
from numpy.lib.stride_tricks import sliding_window_view as swv
b = swv(a, n).sum(axis=1)
这给出了与你的函数非常相似的结果,它似乎并不缺乏精度:
# generate an input with a large variability
np.random.seed(0)
a = 1/np.random.random(size=100_000_000)*1e-8
a.min(), a.max()
# (1.0000000074212765e-08, 0.9650719974971247)
b1 = run_sum(a, n)
b2 = swv(a, n).sum(axis=1)
# are the two outputs almost equal?
np.allclose(b1, b2)
# True
# what is the greatest difference?
np.abs(b2-b1).max()
# 7.059628683189656e-15
但速度相当慢:
# run_sum(a, n)
895 ms ± 11.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# swv(a, n).sum(axis=1)
2.29 s ± 93.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
这是我建议的想法:有一条快速路径和一条准确路径,并根据快速路径是否会产生正确的答案在它们之间进行选择。
实现这样的求和的一种简单快速的方法是获取上一步中使用的值,减去将从窗口中删除的元素,然后添加正在进入窗口的元素。
但是,数组中的元素相差多个数量级,这将导致精度损失,因为从运行总和中减去大元素并不能完全删除它。
因此,我建议跟踪结果相对于窗口中的数字求和可能有多少误差,并在这些情况下计算总和。 (我称之为“重新启动”运行总和,因为它消除了之前步骤中积累的错误。)
我还使用了 Numba,这样我就可以编写一个高效的循环。
示例:
import numpy as np
import numba as nb
a = np.array([-9999.999966666666, -5.7847991744735696e-08, -2.983181169098924e-08, -2.093995193995538e-08, -1.6894298480290635e-08, -1.4869270858640853e-08, -1.3961300790320832e-08, -1.3834651535607362e-08, -1.4395718046705324e-08, -1.570696320519037e-08])
@nb.njit()
def run_sum_restart(a, n, rtol):
eps = np.finfo(a.dtype).eps
sum_ = a[:n].sum()
out = np.zeros(len(a) - n + 1)
out[0] = sum_
addptr = n
subptr = 0
err = 0
for i in range(1, len(out)):
sum_ += a[addptr] - a[subptr]
# Calculate max error for a[addptr] - a[subptr]
err += (max(abs(a[addptr]), abs(a[subptr]))) * eps
# Calculate max error for adding that to sum
err += abs(sum_) * eps
addptr += 1
subptr += 1
if err > rtol * abs(sum_):
# Restart sum
new_sum = a[subptr:addptr].sum()
# print("Max relative error at ", err / sum_)
# print("Restarted, found error of", (new_sum - sum_) / sum_)
sum_ = new_sum
err = 0
out[i] = sum_
return out
print(run_sum_restart(a, 3, rtol=1e-5))
输出:
[-9.99999997e+03 -1.08619755e-07 -6.76660621e-08 -5.27035213e-08
-4.57248701e-08 -4.26652232e-08 -4.21916704e-08 -4.39373328e-08]
@mozway 基准测试的基准时间:
Original
634 ms ± 5.66 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Restart sum version
582 ms ± 9.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
所以这既更准确又更快。