NumPy percentile function different from MATLAB's percentile function

Question

When I calculate the 75th percentile in MATLAB, I get a different value than I get in NumPy.

MATLAB:

>> x = [ 11.308 ;   7.2896;   7.548 ;  11.325 ;   5.7822;   9.6343;
     7.7117;   7.3341;  10.398 ;   6.9675;  10.607 ;  13.125 ;
     7.819 ;   8.649 ;   8.3106;  12.129 ;  12.406 ;  10.935 ;
    12.544 ;   8.177 ]

>> prctile(x, 75)

ans =

11.3165

Python + NumPy:

>>> import numpy as np

>>> x = np.array([ 11.308 ,   7.2896,   7.548 ,  11.325 ,   5.7822,   9.6343,
     7.7117,   7.3341,  10.398 ,   6.9675,  10.607 ,  13.125 ,
     7.819 ,   8.649 ,   8.3106,  12.129 ,  12.406 ,  10.935 ,
    12.544 ,   8.177 ])

>>> np.percentile(x, 75)
11.312249999999999

I also checked the answer in R, and I got NumPy's answer.

R:

> x <- c(11.308 ,   7.2896,   7.548 ,  11.325 ,   5.7822,   9.6343,
+          7.7117,   7.3341,  10.398 ,   6.9675,  10.607 ,  13.125 ,
+          7.819 ,   8.649 ,   8.3106,  12.129 ,  12.406 ,  10.935 ,
+         12.544 ,   8.177)
> quantile(x, 0.75)
     75% 
11.31225

What is going on here? Is there a way to make Python and R mirror MATLAB's behavior?

Answer 1

MATLAB apparently uses midpoint interpolation by default. NumPy and R use linear interpolation by default:

In [182]: np.percentile(x, 75, interpolation='linear')
Out[182]: 11.312249999999999

In [183]: np.percentile(x, 75, interpolation='midpoint')
Out[183]: 11.3165
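The interpolation= keyword used above was deprecated in NumPy 1.22 and renamed method= (the option names such as 'linear' and 'midpoint' are unchanged). Below is a minimal sketch of a version-aware wrapper, assuming NumPy version strings start with the usual "major.minor" form; percentile_compat is a hypothetical helper, not part of NumPy.

```python
import numpy as np


def percentile_compat(a, q, method="linear"):
    """Hypothetical helper: call np.percentile with whichever keyword this NumPy accepts."""
    # NumPy 1.22 deprecated `interpolation=` in favour of `method=`
    major, minor = (int(part) for part in np.__version__.split(".")[:2])
    if (major, minor) >= (1, 22):
        return np.percentile(a, q, method=method)
    return np.percentile(a, q, interpolation=method)


x = np.array([11.308, 7.2896, 7.548, 11.325, 5.7822, 9.6343, 7.7117, 7.3341,
              10.398, 6.9675, 10.607, 13.125, 7.819, 8.649, 8.3106, 12.129,
              12.406, 10.935, 12.544, 8.177])
print(percentile_compat(x, 75, "linear"))    # NumPy / R default rule
print(percentile_compat(x, 75, "midpoint"))  # the rule that matches MATLAB here
```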

To understand the difference between 'linear' and 'midpoint', consider this simple example:

In [187]: np.percentile([0, 100], 75, interpolation='linear')
Out[187]: 75.0

In [188]: np.percentile([0, 100], 75, interpolation='midpoint')
Out[188]: 50.0
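Both rules start from the same 0-based fractional position, (p/100)*(n - 1), in the sorted data: 'linear' moves the fractional part of the way between the two neighbouring values, while 'midpoint' ignores the fraction and averages them. For [0, 100] at p = 75 the position is 0.75, so 'linear' gives 0 + 0.75*(100 - 0) = 75 and 'midpoint' gives (0 + 100)/2 = 50. A minimal hand-rolled sketch for a scalar p (percentile_by_hand is a hypothetical name; it mirrors the documented behaviour of the two options, not NumPy's actual source):

```python
import numpy as np


def percentile_by_hand(a, p, method="linear"):
    """Sketch of NumPy's 'linear' and 'midpoint' rules for a scalar percentile p."""
    y = np.sort(np.asarray(a, dtype=float))
    pos = (p / 100.0) * (len(y) - 1)  # 0-based fractional position in the sorted data
    lo, hi = int(np.floor(pos)), int(np.ceil(pos))
    if method == "linear":
        # move the fractional part of the way from y[lo] towards y[hi]
        return y[lo] + (pos - lo) * (y[hi] - y[lo])
    if method == "midpoint":
        # ignore the fractional part and average the two neighbours
        return (y[lo] + y[hi]) / 2.0
    raise ValueError(f"unsupported method: {method!r}")


print(percentile_by_hand([0, 100], 75, "linear"))    # position 0.75 -> 75.0
print(percentile_by_hand([0, 100], 75, "midpoint"))  # average of 0 and 100 -> 50.0
```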

Compiling the latest version of NumPy (on Ubuntu):

mkdir $HOME/src
cd $HOME/src
git clone https://github.com/numpy/numpy.git
cd numpy    # the remaining commands must run inside the clone
git remote add upstream https://github.com/numpy/numpy.git
# Read ~/src/numpy/INSTALL.txt
sudo apt-get install libatlas-base-dev libatlas3gf-base
python setup.py build --fcompiler=gnu95
python setup.py install

The advantage of using git instead of pip is that upgrading (or downgrading) to another version of NumPy is very easy (and you get the source code too):

cd ~/src/numpy    # git commands must run inside the clone
git fetch upstream
git checkout main    # or check out any other version (branch or tag) of NumPy
/bin/rm -rf build
cdsitepackages    # assuming you are using virtualenvwrapper; otherwise cd to your local Python site-packages directory
/bin/rm -rf numpy numpy-*.egg-info
cd ~/src/numpy
python setup.py build --fcompiler=gnu95
python setup.py install

Answer 2

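Since the accepted answer is still incomplete even after @cpaulik's comment, I am posting what I hope is a more complete answer here (although, for brevity, not a perfect one; see below).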

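Using np.percentile(x, p, interpolation='midpoint') only gives the same answer for very specific values, namely when p/100 is a multiple of 1/n, where n is the number of elements in the array. In the original question this happens to be the case, because n=20 and p=75, but in general the two functions are different.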
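The condition can be checked by comparing positions. The MATLAB-style rule (emulated below) places the i-th smallest of n values at quantile (i - 0.5)/n, so q = k/n falls exactly halfway between the k-th and (k+1)-th smallest values. NumPy's midpoint rule picks the same pair, because (k/n)*(n - 1) = k - k/n lies between the 0-based indices k - 1 and k whenever 0 < k < n. For other p the two usually differ. A minimal sketch, assuming 1-D input; matlab_like and numpy_midpoint are hypothetical names:

```python
import numpy as np


def matlab_like(x, p):
    """Piecewise-linear rule: the i-th smallest value sits at quantile (i - 0.5) / n."""
    y = np.sort(np.asarray(x, dtype=float))
    n = len(y)
    return np.interp(p / 100.0, (np.arange(1, n + 1) - 0.5) / n, y)


def numpy_midpoint(x, p):
    """NumPy's 'midpoint' rule: average the neighbours of position (p/100)*(n - 1)."""
    y = np.sort(np.asarray(x, dtype=float))
    pos = (p / 100.0) * (len(y) - 1)
    return (y[int(np.floor(pos))] + y[int(np.ceil(pos))]) / 2.0


x = [11.308, 7.2896, 7.548, 11.325, 5.7822, 9.6343, 7.7117, 7.3341, 10.398, 6.9675,
     10.607, 13.125, 7.819, 8.649, 8.3106, 12.129, 12.406, 10.935, 12.544, 8.177]
for p in (75, 77):  # 75/100 = 15/20 is a multiple of 1/n; 77/100 is not
    print(p, matlab_like(x, p), numpy_midpoint(x, p))
```

With the question's data (n = 20), both functions give 11.3165 at p = 75. At p = 77 the MATLAB-style rule moves to about 11.3233, while the midpoint rule stays at 11.3165.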

A short emulation of MATLAB's prctile function is:

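import numpy as np

def quantile(x, q):
    n = len(x)
    y = np.sort(x)
    # the i-th smallest value (1-based) sits at quantile (i - 0.5)/n, as in MATLAB;
    # np.interp clamps q outside [0.5/n, 1 - 0.5/n] to min(x) / max(x)
    return np.interp(q, np.linspace(0.5 / n, (2 * n - 1) / (2.0 * n), n), y)

def prctile(x, p):
    return quantile(x, np.array(p) / 100.0)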

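Like MATLAB's function, this one gives a piecewise-linear output running from min(x) to max(x). NumPy's percentile function with interpolation='midpoint' instead returns a piecewise-constant function running between the average of the two smallest elements and the average of the two largest elements. Plotting both functions for the array in the original question gives the picture at this link (sorry, I can't embed it). The red dashed line marks the 75th percentile, where the two functions do in fact coincide.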

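P.S. The reason this function is not really equivalent to MATLAB's is that it only accepts one-dimensional x and raises an error for higher-dimensional input. MATLAB, on the other hand, accepts higher-dimensional x and operates along the first (non-singleton) dimension, but implementing that properly would probably take more time. Both this function and MATLAB's should, however, handle higher-dimensional p / q input correctly (thanks to np.interp taking care of it).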
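One way to get closer to MATLAB for matrix input is to apply the same 1-D rule down each column with np.apply_along_axis. A minimal sketch, assuming the data runs along axis 0 (MATLAB's choice of the first non-singleton dimension, e.g. for row vectors, is not reproduced); prctile_cols is a hypothetical name:

```python
import numpy as np


def prctile_cols(x, p):
    """MATLAB-style percentiles computed independently for each column (axis 0)."""
    x = np.asarray(x, dtype=float)
    q = np.atleast_1d(np.asarray(p, dtype=float)) / 100.0

    def column_quantiles(col):
        n = len(col)
        grid = (np.arange(1, n + 1) - 0.5) / n  # quantile assigned to each sorted value
        return np.interp(q, grid, np.sort(col))

    # result has shape (len(p),) + x.shape[1:], one row per requested percentile
    return np.apply_along_axis(column_quantiles, 0, x)


X = np.column_stack([np.arange(1.0, 11.0), np.arange(10.0, 110.0, 10.0)])
print(prctile_cols(X, [25, 50, 75]))
```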

Answer 3

As of NumPy 1.22, NumPy supports the same implementation as MATLAB through the method='hazen' keyword argument.

In [71]: import numpy as np
In [72]: x = np.array([11.308, 7.2896, 7.548, 11.325, 5.7822, 9.6343, 7.7117, 7.3341, 10.398, 6.9675, 10.607, 13.125, 7.819, 8.649, 8.3106, 12.129, 12.406, 10.935, 12.544, 8.177])
In [73]: np.percentile(x, 75, method='hazen')
Out[73]: 11.3165
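NumPy's 'hazen' method is Hyndman and Fan's definition 5, the same piecewise-linear rule as the prctile emulation in the previous answer. R exposes the same definition as quantile(x, 0.75, type = 5), which covers the R half of the question. A minimal sketch that checks the two agree over a range of percentiles, assuming NumPy 1.22 or later; matlab_prctile is a hypothetical name:

```python
import numpy as np


def matlab_prctile(x, p):
    """Emulation from the previous answer: sorted value i sits at quantile (i - 0.5) / n."""
    y = np.sort(np.asarray(x, dtype=float))
    n = len(y)
    return np.interp(np.asarray(p, dtype=float) / 100.0, (np.arange(1, n + 1) - 0.5) / n, y)


x = np.array([11.308, 7.2896, 7.548, 11.325, 5.7822, 9.6343, 7.7117, 7.3341,
              10.398, 6.9675, 10.607, 13.125, 7.819, 8.649, 8.3106, 12.129,
              12.406, 10.935, 12.544, 8.177])
p = np.linspace(5, 95, 91)  # percentiles 5, 6, ..., 95
hazen = np.percentile(x, p, method="hazen")  # requires NumPy >= 1.22
print(np.allclose(hazen, matlab_prctile(x, p)))
```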
