For a multivariate Gaussian, computing the CDF or PDF at a given point is straightforward:
rv = scipy.stats.multivariate_normal(mean, cov)
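and then rv.pdf(point)
or rv.cdf(upper).
But I have values for some axes (on those axes I want the PDF) and only upper limits for the others (on those axes I need to integrate, i.e. the CDF).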
I can split the problem.
Is there a function that returns the conditional multivariate Gaussian for some of the axes?
Related:
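If I understand correctly, you have an $N$-dimensional multivariate normal distribution and you are given $N - q$ of the coordinates. You want the $q$-dimensional multivariate normal distribution conditioned on those coordinates, and you want to evaluate its CDF at the "other" coordinates, for which you know only upper limits.
The Wikipedia article on the multivariate normal distribution has a section titled "Conditional distributions" (https://en.wikipedia.org/wiki/Multivariate_normal_distribution#Conditional_distributions). The code below generates an example MVN and coordinates, implements the relevant equations to compute the conditional probability, and performs a sanity check using numerical integration (which assumes $q = 2$).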
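For reference, here are the equations from that section as I read them, written so that $\Sigma_{11}$, $\Sigma_{12}$, $\Sigma_{21}$, $\Sigma_{22}$ correspond to `s11`, `s12`, `s21`, `s22` in the code, and $a$ is the vector of given values. The symbols $\bar{\mu}$, $\bar{\Sigma}$, $u$ and $f$ are just labels chosen here: $u$ is the vector of upper limits and $f$ is the joint density.

$$
x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}, \qquad
\mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \qquad
\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}
$$

$$
x_1 \mid x_2 = a \;\sim\; \mathcal{N}(\bar{\mu}, \bar{\Sigma}), \qquad
\bar{\mu} = \mu_1 + \Sigma_{12} \Sigma_{22}^{-1} (a - \mu_2), \qquad
\bar{\Sigma} = \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21}
$$

The sanity check relies on the definition of a conditional density: the conditional CDF is the joint density integrated up to the upper limits, divided by the marginal density of the given coordinates.

$$
P(x_1 \le u \mid x_2 = a) = \frac{\int_{-\infty}^{u} f(t, a)\, dt}{\int_{\mathbb{R}^q} f(t, a)\, dt}
$$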
import numpy as np
from scipy import stats
from scipy.integrate import dblquad
rng = np.random.default_rng(238492432)
n = 6 # dimensionality
qc = 4 # number of given coordinates
q = n - qc # number of other coordinates (must be 2 if you want check to work)
x = rng.random(n) # generate values for all axes
# the first `q` are the "other" coordinates for which you want the CDF
# the rest are "given"
A = rng.random(size=(n, n)) # generate covariance matrix
A = A + A.T + np.eye(n)*n
mu = rng.random(n) # generate mean
dist0 = stats.multivariate_normal(mean=mu, cov=A)
# Generate MVN conditioned on x[q:]
s11 = A[:q, :q] # partition covariance matrix
s12 = A[:q, q:]
s21 = A[q:, :q]
s22 = A[q:, q:]
mu1 = mu[:q] # partition mean
mu2 = mu[q:]
x1 = x[:q] # "other" values (upper limits for the CDF)
x2 = x[q:] # given values (conditioned on)
a = x2
inv_s22 = np.linalg.inv(s22)
mu_c = mu1 + s12 @ inv_s22 @ (a - mu2)
A_c = s11 - s12 @ inv_s22 @ s21
dist = stats.multivariate_normal(mean=mu_c, cov=A_c)
# Check (assumes q = 2)
def pdf(y, x):
    return dist0.pdf(np.concatenate(([x, y], x2)))
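# joint density integrated over the two "other" axes up to their upper limits
p1 = dblquad(pdf, -np.inf, x[0], -np.inf, x[1])[0]
# joint density integrated over all of the "other" axes: marginal density at x2
p2 = dblquad(pdf, -np.inf, np.inf, -np.inf, np.inf)[0]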
# These should match (approximately)
dist.cdf(x1), p1/p2
# (0.25772255281364065, 0.25772256555864476)
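I am not aware of a built-in SciPy function for the conditioning step, so here is a minimal sketch that wraps the same equations in a helper that accepts arbitrary axis indices instead of assuming the "other" axes come first. The names `conditional_mvn` and `mixed_pdf_cdf` are hypothetical, not part of SciPy. The second function returns the quantity the question seems to be after: the density in the given axes multiplied by the conditional CDF in the remaining axes, which is the same integral as `p1` above.

```python
import numpy as np
from scipy import stats


def conditional_mvn(mean, cov, given_idx, given_values):
    """Return (free_idx, frozen MVN of the free axes given the fixed ones).

    Hypothetical helper, not a SciPy function.
    """
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    given_idx = np.asarray(given_idx, dtype=int)
    given_values = np.asarray(given_values, dtype=float)
    # Axes that are not conditioned on, in ascending order
    free_idx = np.setdiff1d(np.arange(mean.size), given_idx)

    s11 = cov[np.ix_(free_idx, free_idx)]
    s12 = cov[np.ix_(free_idx, given_idx)]
    s22 = cov[np.ix_(given_idx, given_idx)]

    # Solve linear systems instead of forming the explicit inverse of s22
    mu_c = mean[free_idx] + s12 @ np.linalg.solve(s22, given_values - mean[given_idx])
    cov_c = s11 - s12 @ np.linalg.solve(s22, s12.T)
    cov_c = (cov_c + cov_c.T) / 2  # remove round-off asymmetry
    return free_idx, stats.multivariate_normal(mean=mu_c, cov=cov_c)


def mixed_pdf_cdf(mean, cov, given_idx, given_values, upper):
    """Density in the given axes times the conditional CDF in the free axes."""
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    given_idx = np.asarray(given_idx, dtype=int)
    _, cond = conditional_mvn(mean, cov, given_idx, given_values)
    marginal = stats.multivariate_normal(
        mean=mean[given_idx], cov=cov[np.ix_(given_idx, given_idx)]
    )
    return marginal.pdf(given_values) * cond.cdf(upper)


# Usage: with independent axes the result factorises, which gives a closed form
value = mixed_pdf_cdf(np.zeros(3), np.eye(3), given_idx=[2],
                      given_values=[0.5], upper=[0.0, 0.0])
closed_form = stats.norm.pdf(0.5) * 0.25
print(value, closed_form)  # the two numbers should agree approximately
```

`upper` must list the limits for the free axes in ascending index order, because `np.setdiff1d` returns them sorted. Using `np.linalg.solve` rather than `np.linalg.inv` is a numerical-stability choice and should give the same result up to round-off.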