Generalizing a Gaussian mixture to take any number of parameters with numpy.vectorize causes performance problems


I am fitting a Gaussian mixture by maximum likelihood estimation. Initially I used the following model:

import numpy as np

def normal(x, mu, sigma):
    """
    Gaussian (normal) probability density function.
    
    Args:
        x (np.ndarray): Data points.
        mu (float): Mean of the distribution.
        sigma (float): Standard deviation of the distribution.
    
    Returns:
        np.ndarray: Probability density values.
    """
    return (1 / (np.sqrt(2 * np.pi) * sigma)) * np.exp(-0.5 * ((x - mu) / sigma) ** 2)

def model(x, a, mu1, s1, mu2, s2):
    return a*normal(x, mu1, s1) + (1-a)*normal(x, mu2, s2)

This works well and finds a good fit in under a second. I now want to generate such a function dynamically for an arbitrary number of peaks.

def generate_gaussian_mix(n):
    def gaussian_mix(x, *params):
    
        if len(params) != 3 * n - 1:
            print(params)
            raise ValueError(f"Expected {3 * n - 1} parameters, but got {len(params)}.")

        params = np.asarray(params)
        mu = params[0::3]  # Means
        sigma = params[1::3]  # Standard deviations
        a = params[2::3]  # Weights
        a = np.hstack((a, 1 - np.sum(a)))

        return np.sum((a / (np.sqrt(2 * np.pi) * sigma))*np.exp(-0.5 * ((x - mu) / sigma) ** 2))

    return np.vectorize(gaussian_mix)

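On my laptop this model takes more than three minutes for the same number of peaks and events. What optimization steps can I take to bring the second function's run time down? Is there a good way to avoid np.vectorize? Do you have any ideas for avoiding the repeated slicing?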
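
For context on where the time goes: NumPy's documentation describes np.vectorize as a convenience wrapper that is essentially a Python-level loop, not a performance tool. The following is a minimal sketch of that behaviour, not the code from the question. The `density` function and the `calls` counter are hypothetical names introduced only to count how often Python enters the function.

```python
import numpy as np

calls = 0  # hypothetical counter, used only for this illustration


def density(x, mu, sigma):
    """Normal density that records how often Python calls it."""
    global calls
    calls += 1
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)


x = np.linspace(-3.0, 3.0, 1000)

# Wrapped: np.vectorize invokes density from Python for each element of x.
wrapped = np.vectorize(density)
y_wrapped = wrapped(x, 0.0, 1.0)
calls_wrapped = calls

# Direct: the same expression evaluated on the whole array in a single call.
calls = 0
y_direct = density(x, 0.0, 1.0)
calls_direct = calls

print("calls through np.vectorize:", calls_wrapped)
print("calls on the whole array:  ", calls_direct)
print("same values:", np.allclose(y_wrapped, y_direct))
```

Every one of those Python-level calls in the mixture function also repeats the parameter slicing, so the per-event overhead is multiplied by the number of objective evaluations the optimizer makes.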

For completeness, here is the optimization function:

from scipy.optimize import dual_annealing

def neg_log_event_likelyhood(model, event, theta):
    x = -np.log(model(event, *theta))
    return x

def fit_distribution_anneal(model, events, bounds, data_range = None, **kwargs):
    def total_log_likelyhood(theta, model, events):
        return np.sum(neg_log_event_likelyhood(model, events, theta))

    if data_range is not None:
        events = np.copy(events)
        events = events[np.logical_and(events > data_range[0], events < data_range[1])]
    
    result = dual_annealing(total_log_likelyhood, bounds, args=(model, events), **kwargs)
    params = result.x

    return params

Because the problem is non-convex, annealing is needed rather than a plain minimization.

python numpy performance gaussian-mixture-model
1 Answer

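As suspected, the main problem was np.vectorize, which calls the Python function once per data point. By using np.transpose to turn the data into a column, I can rely on broadcasting to evaluate every normal component for every data point at once and then sum along the component axis. Here is the optimized code: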

def generate_gaussian_mix(n):

    """
    Dynamically generates a function for the superposition of `n` Gaussian functions.
    
    Args:
        n (int): Number of Gaussian functions to include in the superposition.
    
    Returns:
        function: A callable `f(x, *params)` where `params` holds the 3*n - 1 values
                  mu_1, sigma_1, a_1, ..., mu_n, sigma_n. The weight of the last
                  component is implied as 1 - sum(a_1, ..., a_{n-1}).
    """

    def gaussian_mix(x, *params):
        
        if len(params) != 3 * n - 1:
            print(params)
            raise ValueError(f"Expected {3 * n - 1} parameters, but got {len(params)}.")

        params = np.asarray(params)
        mu = params[0::3]  # Means
        sigma = params[1::3]  # Standard deviations
        a = params[2::3]  # Weights

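        # Append the implied last weight so that all n weights sum to 1.
        a = np.append(a, 1 - np.sum(a))

        # For 1-D x, np.transpose([x]) is an (N, 1) column. Broadcasting it against
        # the length-n parameter vectors yields an (N, n) array of component
        # densities, which is weighted and summed over the component axis.
        return np.sum(a * normal(np.transpose([x]), mu, sigma), axis=1)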

    return gaussian_mix
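
One way to gain confidence in a broadcast version is to compare it with the fixed two-component model from the question. The sketch below assumes the parameter layout mu_1, sigma_1, a_1, ..., mu_n, sigma_n and a 1-D array of events. The test values, the random seed, and the three-peak example are made up for illustration only.

```python
import numpy as np


def normal(x, mu, sigma):
    """Gaussian probability density, evaluated elementwise with broadcasting."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)


def model(x, a, mu1, s1, mu2, s2):
    """Fixed two-component mixture, as in the question."""
    return a * normal(x, mu1, s1) + (1 - a) * normal(x, mu2, s2)


def gaussian_mix(x, *params):
    """Mixture with any number of components (assumed layout: mu, sigma, a, ...)."""
    params = np.asarray(params, dtype=float)
    mu, sigma, a = params[0::3], params[1::3], params[2::3]
    a = np.append(a, 1 - a.sum())                        # implied last weight
    column = np.asarray(x, dtype=float).reshape(-1, 1)   # shape (N, 1)
    return np.sum(a * normal(column, mu, sigma), axis=1)  # (N, n) -> (N,)


rng = np.random.default_rng(0)
events = rng.normal(size=500)

# Same two peaks expressed in both parameter orders.
two_peak = model(events, 0.3, -1.0, 0.5, 2.0, 1.5)
general = gaussian_mix(events, -1.0, 0.5, 0.3, 2.0, 1.5)
matches = np.allclose(two_peak, general)

# A three-peak mixture should still integrate to roughly 1 (simple Riemann sum).
grid = np.linspace(-20.0, 20.0, 4001)
dx = grid[1] - grid[0]
area = np.sum(gaussian_mix(grid, -2.0, 0.5, 0.2, 0.0, 1.0, 0.5, 3.0, 0.8)) * dx

print("matches two-component model:", matches)
print("area under three-peak mixture:", area)
```

Because the whole event array is handled in one call, the optimizer pays the parameter-unpacking cost once per objective evaluation instead of once per event.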