Python Mann-Whitney confidence interval

Question

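I have two datasets (Pandas Series), ds1 and ds2, and I want to compute the 95% confidence interval for the difference in means (if the data are normally distributed) or in medians (if they are not).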

For the difference in means, I compute the t-test statistic and the CI:

import statsmodels.api as sm
tstat, p_value, dof = sm.stats.ttest_ind(ds1, ds2)
CI = sm.stats.CompareMeans.from_data(ds1, ds2).tconfint_diff()

For the medians, I do this:

from scipy.stats import mannwhitneyu
U_stat, p_value = mannwhitneyu(ds1, ds2, use_continuity=True, alternative="two-sided")

How do I compute a CI for the difference in medians?

Answer 1

I found a paper ("Calculating confidence intervals for some non-parametric analyses" by Michael J. Campbell and Martin J. Gardner) that gives a formula for the CI.

Based on that:

from scipy.stats import norm

ct1 = ds1.count()  # number of non-null items in dataset 1
ct2 = ds2.count()  # number of non-null items in dataset 2
alpha = 0.05       # 95% confidence interval
N = norm.ppf(1 - alpha/2)  # percent point function (inverse of the CDF)

# The confidence interval for the difference between the two population
# medians is derived from these n x m pairwise differences.
# dropna() keeps the differences consistent with the counts above.
diffs = sorted([i - j for i in ds1.dropna() for j in ds2.dropna()])

# For an approximate 100(1-alpha)% confidence interval, first calculate K:
k = int(round(ct1*ct2/2 - (N * (ct1*ct2*(ct1+ct2+1)/12)**0.5)))

# The interval runs from the Kth smallest to the Kth largest of the
# n x m differences, i.e. 0-based indices k-1 and len(diffs)-k.
# The normal approximation assumes ct1 and ct2 are both > ~20.
CI = (diffs[k-1], diffs[len(diffs)-k])
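
For reuse, the same calculation can be wrapped in a function. The following is only a sketch under a few assumptions: the name median_diff_ci is hypothetical, the standard library's NormalDist().inv_cdf stands in for norm.ppf so the example has no third-party dependencies, and the median of all pairwise differences is returned as the point estimate. The Kth smallest difference sits at 0-based index k-1 and the Kth largest at index len(diffs)-k.

```python
from math import isnan, sqrt
from statistics import NormalDist, median


def median_diff_ci(x, y, alpha=0.05):
    """Approximate CI for the difference in medians (x minus y).

    Hypothetical helper: applies the Kth-smallest / Kth-largest rule
    with the normal approximation, so both samples should be > ~20.
    Returns (point_estimate, (lower, upper)).
    """
    # Drop missing values so the sample sizes match the differences.
    xs = [float(v) for v in x if not isnan(float(v))]
    ys = [float(v) for v in y if not isnan(float(v))]
    n, m = len(xs), len(ys)

    # All n x m pairwise differences, sorted ascending.
    diffs = sorted(a - b for a in xs for b in ys)

    # Standard normal quantile for a two-sided interval.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    k = int(round(n * m / 2 - z * sqrt(n * m * (n + m + 1) / 12)))
    if k < 1:
        raise ValueError("samples too small for the normal approximation")

    # Kth smallest is index k-1; Kth largest is index len(diffs)-k.
    return median(diffs), (diffs[k - 1], diffs[len(diffs) - k])


if __name__ == "__main__":
    import random

    rng = random.Random(0)  # synthetic data, for illustration only
    a = [rng.gauss(1.0, 1.0) for _ in range(30)]
    b = [rng.gauss(0.0, 1.0) for _ in range(30)]
    estimate, (low, high) = median_diff_ci(a, b)
    print(estimate, low, high)
```

Because the function only iterates over its inputs, it should also accept Pandas Series such as ds1 and ds2. Raising an error when k < 1 is a design choice: a non-positive k would otherwise silently index from the wrong end of the list.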

Answer 2

There is also this paper ("Confidence intervals for the Mann-Whitney test" by Maja Pohar Perme and Damjan Manevski).

It is more recent and explains the different methods for computing a CI for this test; the authors also provide R code.
