How can I check whether my spectrum has a symmetric, i.e. Gaussian, peak?


For example, take 2 spectra:

Spectrum 15 has a nearly symmetric, roughly Gaussian-shaped peak, so it is a signal.

Spectrum 371 has no peak at all and should be treated as noise.

I know I could fit a Gaussian to each spectrum and check whether the fit error is small, but I have ~1000 spectra to check, and that approach is very slow.
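To make concrete what I mean by that, here is a minimal sketch of the fit-based check, assuming each spectrum is a pair of NumPy arrays x and y (the gaussian model, the starting guesses, and the R² cutoff of 0.9 are all just illustrative choices):

import numpy as np
from scipy.optimize import curve_fit


def gaussian(x, amp, center, std, offset):
    # Gaussian peak on a constant baseline
    return amp * np.exp(-((x - center) ** 2) / (2 * std ** 2)) + offset


def looks_gaussian(x, y, r2_threshold=0.9):
    # Fit a Gaussian and accept the spectrum if the fit explains
    # most of the variance (R^2 above the threshold)
    p0 = [y.max() - y.min(), x[np.argmax(y)], (x[-1] - x[0]) / 10, y.min()]
    try:
        popt, _ = curve_fit(gaussian, x, y, p0=p0, maxfev=2000)
    except RuntimeError:
        return False  # fit did not converge
    residuals = y - gaussian(x, *popt)
    r2 = 1 - np.sum(residuals ** 2) / np.sum((y - np.mean(y)) ** 2)
    return r2 > r2_threshold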

Is there some summary statistic I could compute to tell whether a spectrum contains a peaked signal (which may appear at any position) or only noise?

For example, signal would be k>0 and noise k<0.
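(If k here could be something like the excess kurtosis of the intensity values, a purely illustrative sketch of that kind of one-number check, with the zero cutoff taken from the rule above, might be:)

from scipy.stats import kurtosis


def summary_stat(y):
    # Excess kurtosis of the intensity values: a single narrow peak tends
    # to push it positive, while flat noise stays near zero or below.
    # This is only a heuristic, not a calibrated test.
    return kurtosis(y, fisher=True)

# signal if summary_stat(y) > 0, noise if summary_stat(y) < 0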

[plot of spectrum 15 (signal)]

[plot of spectrum 371 (noise)]

Tags: python, statistics
1 Answer

You already have the right idea in checking how well the data fits a given curve, but it can be done very efficiently. Using a matched filter, a run time of <100 microseconds per spectrum of 512 points is attainable. The solution I present here identifies the signal with roughly 98% accuracy, and fails when the signal is very narrow (here, narrower than about 5 samples) or very wide (here, when it occupies most of the window). If that is common in your data, the matched filter can be adjusted to compensate.

Approach:

  • Generate the filter you want to match in the data. In this case I used a Gaussian density function.
  • Trim the input data to zero out the strange values at the beginning.
  • Apply the matched filter with numpy.convolve.
  • Use scipy.signal.find_peaks to find where the filter response peaks.
  • Select the most prominent peak.
  • Check whether that peak is above some threshold.

This approach has the added benefit that it also tells you, with high accuracy, where the peak occurred (typically <1% error).

Here is an example implementation:

import numpy as np
import matplotlib.pyplot as plt
import scipy.signal


def get_normal_density_fun(x_data: np.ndarray, center: float = 0.0, std: float = 1.0, amp: float = 1.0):
    # https://stats.stackexchange.com/a/143633
    return amp / (0.4 / std) * (1 / (std * np.sqrt(2 * np.pi))) * np.exp(-((x_data - center) ** 2) / (2 * std ** 2))


def make_data(n: int, peak_location: float | None, rng: np.random.Generator | None = None) -> tuple[np.ndarray, np.ndarray]:
    # This function is an attempt to recreate the data shown in your graph upload, not required during typical use
    NOISE_ABS = 0.1
    MAX_CENTER = 1.0
    X_MIN = 100
    X_MAX = 190
    INIT_SLOPE_END_X = 115
    INIT_SLOPE_END_Y = -0.05

    if rng is None:
        rng = np.random.default_rng(42)

    slope_end_index = int(n / (X_MAX - X_MIN) * (INIT_SLOPE_END_X - X_MIN))
    slope_start_val = rng.random() * 0.8 + 0.2

    out_data_x = np.linspace(X_MIN, X_MAX, n)
    out_data = rng.random(size=(n,)) * NOISE_ABS

    out_data[:slope_end_index] += np.linspace(start=slope_start_val, stop=INIT_SLOPE_END_Y, num=slope_end_index)
    if peak_location is not None:
        std = (peak_location - INIT_SLOPE_END_X) * rng.random()
        amp = MAX_CENTER * (0.25 + 0.75 * rng.random())
        out_data += get_normal_density_fun(out_data_x, peak_location, std, amp)
    return out_data_x, out_data


def find_response(x: np.ndarray, y: np.ndarray, plot_all: bool = False):
    # This is the filter we are trying to match in the signal
    #   Adjust std and amp to fit
    filter = get_normal_density_fun(np.linspace(-15, 15, 50), std=8, amp=5.0)
    y_trimmed = y.copy()
    # Due to some strange values at the beginning, 0 out the beginning
    y_trimmed[:len(y) // 8] = 0.0
    # https://dsp.stackexchange.com/a/94212
    matched_filter_output = np.convolve(y_trimmed, np.flip(filter), mode='same')
    # Use scipy to find all the peaks in the filter response
    peaks, properties = scipy.signal.find_peaks(matched_filter_output)
    # This block is for testing purposes, to understand how the system is working
    if plot_all:
        def scale(vec: np.ndarray, min_val: float = 0.0, max_val: float = 1.0, ref_vec: np.ndarray | None = None):
            if ref_vec is None:
                ref_vec = vec
            return (vec - np.min(ref_vec)) / (np.max(ref_vec) - np.min(ref_vec)) * (max_val - min_val) + min_val

        plt.plot(x, scale(matched_filter_output), label='filter response')
        plt.plot(x, scale(y), label='orig data')
        plt.plot(x, scale(y_trimmed, ref_vec=y), label='trimmed data')
        plt.legend()
        plt.show()
    # Capture the prominence of each peak (how much does the peak stick up)
    prominences, _, _ = scipy.signal.peak_prominences(matched_filter_output, peaks)
    # Find the maximum prominence (which hopefully matches our signal)
    max_prom_index = np.argmax(prominences)
    # Find the x-value that the peak occurred at
    main_peak = peaks[max_prom_index]
    return x[main_peak], prominences[max_prom_index]


def _main():
    import sklearn.metrics
    import time
    rng = np.random.default_rng(seed=42)

    n = 512
    tests = 2 ** 10
    peak_truth = []
    peak_pred = []
    peak_error = []
    peak_error_percent = []
    tot_time = 0.0
    show_mismatches = False
    for has in [True, False]:
        first_run = True
        for i in range(tests // 2):
            if has:
                peak = rng.random() * 50 + 120
            else:
                peak = None
            x, y = make_data(n, peak, rng)
            if first_run:
                first_run = False
                find_response(x, y, True)
            tot_time -= time.perf_counter()
            xprom, prom = find_response(x, y)
            has_guess = prom > 30
            tot_time += time.perf_counter()

            # print(f'{peak if has else 0.0:8.1f} - {xprom:8.1f} : {prom:8.5f}')
            peak_truth.append(has)
            peak_pred.append(has_guess)
            if has:
                peak_error.append(xprom - peak)
                peak_error_percent.append((xprom - peak) / peak * 100)
            if show_mismatches and not has_guess == has:
                plt.plot(x, y)
                plt.show()
    print(f'Run time / prediction: {tot_time / tests * 1000:.2f} ms')
    total_correct = np.sum(np.array(peak_truth) == np.array(peak_pred))
    total_wrong = np.sum(np.array(peak_truth) != np.array(peak_pred))
    print(f'Correct / Wrong (%)  : {total_correct} / {total_wrong} ({total_correct / len(peak_truth) * 100:.1f}%)')
    print(f'Mean error (%)       : {np.mean(np.abs(peak_error)):.2f} ({np.mean(np.abs(peak_error_percent)):.2f}%)')
    print(f'Max / min error      : {np.max(np.abs(peak_error)):.2f} / {np.min(np.abs(peak_error)):.2f}')
    # https://stackoverflow.com/a/51477080
    plt.hist(peak_error, bins=100, weights=np.ones(len(peak_error)) / len(peak_error) * 100)
    plt.xlabel('Error'); plt.ylabel('% of occurrences'); plt.title('Absolute Error')
    plt.show()
    cm = sklearn.metrics.confusion_matrix(peak_truth, peak_pred).astype(float)
    for row in range(cm.shape[0]):
        cm[row] /= np.round(np.sum(cm[row]),3)
    disp = sklearn.metrics.ConfusionMatrixDisplay(confusion_matrix=cm * 100)
    disp.plot()
    plt.show()


if __name__ == '__main__':
    _main()

This produces the following console output:

Run time / prediction: 0.05 ms
Correct / Wrong (%)  : 1004 / 20 (98.0%)
Mean error (%)       : 0.77 (0.50%)
Max / min error      : 49.64 / 0.00

In the plots below, the first and second should look similar to the first and second plots in the question, respectively:

[signal present graph]

[signal not present graph]

[histogram of errors]

[confusion matrix]

In this case, with the values I used, it predicts the no-signal case correctly every time and the signal-present case correctly 96% of the time.

When using this, you can adjust the values on the

filter = get_normal_density_fun( ...

line to match your data (the values shown match the toy data I generated). You may also need to change the threshold in the line

has_guess = prom > 30

to a number that suits your data. The commented-out print line in the test block prints the prominences, so you can run it on your real data to see typical values and tune from there.
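For example, a minimal sketch of that tuning step, reusing make_data and find_response from above (with real data you would replace the generated spectra with ones you have already classified by eye; taking the midpoint between the two populations is just one simple choice):

import numpy as np

rng = np.random.default_rng(0)
signal_proms, noise_proms = [], []
for has_signal in (True, False):
    for _ in range(50):
        peak = rng.random() * 50 + 120 if has_signal else None
        x, y = make_data(512, peak, rng)
        _, prom = find_response(x, y)
        (signal_proms if has_signal else noise_proms).append(prom)

# Put the threshold between the two populations, e.g. halfway between
# the largest noise prominence and the smallest signal prominence
threshold = (max(noise_proms) + min(signal_proms)) / 2
print(f'noise up to {max(noise_proms):.1f}, '
      f'signal from {min(signal_proms):.1f} -> threshold {threshold:.1f}')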

A typical use of this function might look like:

for fname in filelist:
    filedatax, filedatay = loadfile(fname)
    xprom, prom = find_response(filedatax, filedatay)
    if prom > 30:
        process_signal(filedatax, filedatay, xprom)

Let me know if you have any questions!
