Python 中的加权 Wilcoxon Rank-Sum 测试

Question

我有一些序数数据，我想在两组之间进行 Wilcoxon Rank-Sum 检验（例如男性与女性）。

但是，对于其他类别（例如年龄），人口比例和我得到的样本之间存在一些不平衡，因此我想对每个数据点应用一些权重。

使用 R 中的 Weighted Wilcoxon 测试作为先头，我在 python 中创建了一个代码来执行此操作，假设我有 x 和 y 的权重。

import numpy as np
from scipy.special import erfc

def weighted_ranksum_test(x: np.ndarray, y: np.ndarray, wx: np.ndarray, wy: np.ndarray):
    U = 0
    for iy, weight_y in zip(y, wy):
        smaller = x < iy
        equal = x == iy

        sum_smaller = np.sum(wx[smaller] * weight_y)
        sum_equal = np.sum(wx[equal] * weight_y / 2)
        sum_tot = sum_smaller + sum_equal

        U += sum_tot

    nY = np.sum(wy)
    nX = np.sum(wx)

    mU = nY * nX / 2
    sigU = np.sqrt((nY * nX * (1 + nY + nX)) / 12)
    zU = (U - mU) / sigU

    pU = erfc(zU / np.sqrt(2)) / 2

    return pU

我的问题是，这个实现看起来正确吗？我在假设权重相等的情况下针对 https://docs.scipy.org/doc/scipy/reference/ generated/scipy.stats.ranksums.html 进行了测试，并且它达到了相同的 p 值，但我不是确定这对于我所处的情况来说是否是正确的实现。

Answer 1

我不能从理论角度说，但由于没有其他答案，我可以提供一些证据来证明该测试的有效性，并指出替代方案。

首先，正如您所指出的，在统一权重的特殊情况下与

scipy.stats.ranksums

达成一致。为了完整起见：

import numpy as np
from scipy import stats
from scipy.special import erfc

# Your function verbatim
def weighted_ranksum_test(x: np.ndarray, y: np.ndarray, wx: np.ndarray, wy: np.ndarray):
    U = 0
    for iy, weight_y in zip(y, wy):
        smaller = x < iy
        equal = x == iy

        sum_smaller = np.sum(wx[smaller] * weight_y)
        sum_equal = np.sum(wx[equal] * weight_y / 2)
        sum_tot = sum_smaller + sum_equal

        U += sum_tot

    nY = np.sum(wy)
    nX = np.sum(wx)

    mU = nY * nX / 2
    sigU = np.sqrt((nY * nX * (1 + nY + nX)) / 12)
    zU = (U - mU) / sigU

    pU = erfc(zU / np.sqrt(2)) / 2

    return pU

# Generate data and weights
m, n = 13, 17
rng = np.random.default_rng(6270910759530225579)  # arbitrary seed
x = rng.random(size=m)
y = rng.random(size=n)
wx = np.ones_like(x)
wy = np.ones_like(y)

# Compare p-value with SciPy
# Roughly, the alternative hypothesis is that observations from the distribution
# underlying `x` tend to be less than those from the distribution underlying `y`
_, res = stats.ranksums(x, y, alternative='less')
ref = weighted_ranksum_test(x, y, wx, wy)
np.testing.assert_allclose(res, ref)  # passes

我们期望的另一个属性是整数权重与重复观察具有相同的效果。

wx = rng.integers(1, 10, size=m)
wy = rng.integers(1, 10, size=n)
x2 = np.repeat(x, wx)
y2 = np.repeat(y, wy)
wx2 = np.ones_like(x2)
wy2 = np.ones_like(y2)
res = weighted_ranksum_test(x, y, wx, wy)
ref = weighted_ranksum_test(x2, y2, wx2, wy2)
np.testing.assert_allclose(res, ref)  # passes

也许最重要的是，我们期望检验的 p 值在所有观察值都是从同一总体中随机抽取的原假设下均匀分布。我们可以使用蒙特卡洛方法对此进行评估：在原假设下生成许多随机样本，计算每个集合的 p 值，并将 p 值的经验分布与理论均匀分布进行比较。

import matplotlib.pyplot as plt

wx = rng.random(size=m)
wy = rng.random(size=n)

def assess_size():
  ps = []
  for i in range(10000):
    x = rng.random(size=m)
    y = rng.random(size=n)
    p = weighted_ranksum_test(x, y, wx, wy)
    ps.append(p)

  res = stats.ecdf(ps)  # empirical cumulative distribution object
  ax = plt.subplot()
  res.cdf.plot(ax)
  ax.plot([0, 1], [0, 1])
  ax.set_xlabel('p-value')
  ax.set_ylabel('CDF')
  ax.set_title('Empirical distribution of p-values under H0')
  ax.legend(['ECDF', 'Uniform CDF'])
  
assess_size()

使用均匀分布的权重，检验大致具有预期的“大小”（如果原假设为真则错误地拒绝原假设的概率），但有点保守；例如，不到 5% 的 p 值小于 0.05。

当与一个样本相关的权重往往大于另一样本的权重时，这种差异就会消失。

import matplotlib.pyplot as plt

wx = rng.random(size=m)*2
wy = rng.random(size=n)
assess_size()

因此您可以考虑用您的实际体重进行相同的评估。

由于检验是保守的，因此该近似值的功效（正确拒绝错误的零假设的概率）有点低。如果您想要更准确的 p 值，可以使用蒙特卡罗方法计算它们。为此，我们稍微修改您的函数，使其返回

统计量而不是 p 值，并使用

monte_carlo_test

生成 p 值。

def weighted_ranksum_statistic(x, y, wx, wy):
    U = 0
    for iy, weight_y in zip(y, wy):
        smaller = x < iy
        equal = x == iy

        sum_smaller = np.sum(wx[smaller] * weight_y)
        sum_equal = np.sum(wx[equal] * weight_y / 2)
        sum_tot = sum_smaller + sum_equal

        U += sum_tot
    
    nY = np.sum(wy)
    nX = np.sum(wx)

    mU = nY * nX / 2
    sigU = np.sqrt((nY * nX * (1 + nY + nX)) / 12)
    zU = (U - mU) / sigU
    return zU

# use weights from before
x = rng.random(m) - 0.3  # null is false, alternative is true 
y = rng.random(n)

res = weighted_ranksum_test(x, y, wx, wy)
# `alternative='greater'` here because, roughly, the statistic tends to be
# *greater* when the alternative hypothesis is true
ref = stats.monte_carlo_test((x, y), rvs=(rng.random, rng.random),
                             statistic=lambda x, y: weighted_ranksum_statistic(x, y, wx, wy),
                             alternative='greater')

# p-values are close, and p-values are small, providing evidence against the
# null in favor of the alternative
print(res, ref.pvalue)
# 0.03557664132002421 0.0338

事实上，检验统计量的零分布近似于标准正态分布，近似为

weighted_ranksum_test

。

ax = plt.subplot()
z = np.linspace(-5, 5, 300)
ax.plot(z, stats.norm.pdf(z), label='normal approximation')
ax.hist(ref.null_distribution, bins=50, density=True, 
        label='Monte Carlo null distribution')
ax.legend(loc='lower right')
ax.set_xlabel('test statistic')
ax.set_ylabel('frequency/probability density')
ax.set_title('Monte Carlo null distribution vs normal approximation')

在这种情况下，协议很好，但正如我们在上面看到的，这取决于权重。

wx *= 2
ref = stats.monte_carlo_test((x, y), rvs=(rng.random, rng.random),
                             statistic=lambda x, y: weighted_ranksum_statistic(x, y, wx, wy),
                             alternative='greater')
ax.plot(z, stats.norm.pdf(z), label='normal approximation')
ax.hist(ref.null_distribution, bins=50, density=True, 
        label='Monte Carlo null distribution')
ax.legend(loc='lower right')
ax.set_xlabel('test statistic')
ax.set_ylabel('frequency/probability density')
ax.set_title('Monte Carlo null distribution vs normal approximation')

因此，特别是如果上面

assess_size

生成的图不一致，我会相信

monte_carlo_test

版本。

Python 中的加权 Wilcoxon Rank-Sum 测试

问题描述投票：0回答：1

1个回答

最新问题

Python 中的加权 Wilcoxon Rank-Sum 测试

问题描述 投票：0回答：1

1个回答

最新问题

问题描述投票：0回答：1