NumPy function to get the quantile corresponding to a given value

Question

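I've seen many questions like this one for R, but I can't find one specifically for Python, preferably using numpy.

Suppose I have a set of observations stored in x. I can get the value below which q * 100 percent of the population falls.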

# Import numpy
import numpy as np

# Get 75th percentile
np.quantile(a=x, q=0.75)

However, I'd like to know whether there is a function that does the inverse. That is, a numpy function that takes a value as input and returns q.

To expand on this, scipy distribution objects have a ppf method that lets me do this. I'm looking for something similar in numpy. Does it exist?

Answer 1

Not a ready-made function, but a compact and reasonably fast snippet:

(a<value).mean()

You can (at least on my machine) squeeze out a bit more performance by using np.count_nonzero:

np.count_nonzero(a<value) / a.size

but to be honest I wouldn't even bother.
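To show how this snippet relates to np.quantile, here is a minimal sketch of the round trip. The helper name fraction_below, its inclusive flag, and the sample data are illustrative assumptions, not part of NumPy or of the answer.

```python
import numpy as np

def fraction_below(a, value, inclusive=False):
    """Fraction of entries of `a` below `value` (empirical CDF).

    Hypothetical helper name; this is not a NumPy function.
    """
    a = np.asarray(a)
    # Strict (<) and weak (<=) comparison differ only when `value` ties
    # with entries of `a`.
    mask = (a <= value) if inclusive else (a < value)
    return np.count_nonzero(mask) / a.size

a = np.arange(1, 101)           # 1, 2, ..., 100
v = np.quantile(a, 0.75)        # value at the 75th percentile
q_back = fraction_below(a, v)   # recovers 0.75 for this data

# With ties, the two conventions disagree.
tied = np.array([1, 2, 2, 3])
q_strict = fraction_below(tied, 2)                 # 0.25
q_weak = fraction_below(tied, 2, inclusive=True)   # 0.75
```

Which comparison to use is a design choice: the strict form matches the snippet in the answer, while the weak form counts tied observations as well.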

Answer 2

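There is a convenience function that does this. Note that it is not an exact inverse, because the quantile/percentile functions are not exact. Given a finite array of observations, the percentiles take discrete values; in other words, you can specify a q that falls between those values, and the function finds the closest one.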

from scipy import stats
import numpy as np

stats.percentileofscore(np.arange(0,1,0.12), .65, 'weak') / 100
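The kind argument controls how ties are counted, which matters whenever the score equals one or more observations. A small sketch comparing three of the conventions follows; the tied sample array is an assumption chosen for illustration.

```python
import numpy as np
from scipy import stats

# Data from the answer: 9 values, and a score that matches none of them.
obs = np.arange(0, 1, 0.12)
score = 0.65
no_tie = {
    kind: stats.percentileofscore(obs, score, kind=kind) / 100
    for kind in ("strict", "weak", "mean")
}

# Illustrative data with a tie: the score 2 appears twice.
tied = np.array([1, 2, 2, 3])
with_tie = {
    kind: stats.percentileofscore(tied, 2, kind=kind) / 100
    for kind in ("strict", "weak", "mean")
}
# strict counts values < 2, weak counts values <= 2, mean averages the two.
```

When the score does not coincide with any observation, all three conventions agree (here 6 of the 9 values lie below 0.65); they only diverge on ties.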

Answer 3

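If x is sorted, the value at index i is the i / len(x) percentile (or thereabouts, depending on how you want to treat the boundary conditions). If x is not sorted, you can get the same value by replacing i with

x.argsort().argsort()[i]

(or by sorting x first). Since applying argsort to a permutation yields its inverse, the double argsort tells you where each element of the original array lands in the sorted array.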
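A tiny worked example of the double argsort, using made-up data with no repeated values:

```python
import numpy as np

x = np.array([30, 10, 40, 20])

order = x.argsort()         # indices that sort x
ranks = order.argsort()     # position of each x[i] within sorted x
fractions = ranks / len(x)  # the i / len(x) convention from the text

# x[0] = 30 is the third smallest value, so ranks[0] == 2.
```

Dividing by len(x) means the largest element maps to (n - 1) / n rather than 1; this is one of the boundary-condition choices mentioned above.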

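If you want the result for an arbitrary value that is not necessarily in x, you can apply np.searchsorted to a sorted version of x and interpolate on the result. You could also use something more sophisticated, such as fitting a spline to the sorted data.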
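One way to sketch the searchsorted-plus-interpolation idea is below. The function name quantile_of is hypothetical, and the interpolation is an assumption chosen to invert np.quantile's default linear method, where the quantile q sits at fractional position q * (n - 1) in the sorted array. Values outside the observed range are clamped to 0 or 1.

```python
import numpy as np

def quantile_of(x, value):
    """Approximate q such that np.quantile(x, q) equals `value`.

    Hypothetical helper; assumes np.quantile's default linear method.
    """
    xs = np.sort(np.asarray(x, dtype=float))
    n = xs.size
    if value <= xs[0]:
        return 0.0
    if value >= xs[-1]:
        return 1.0
    # First index whose element is strictly greater than `value`;
    # the two checks above guarantee 1 <= hi <= n - 1.
    hi = int(np.searchsorted(xs, value, side="right"))
    lo = hi - 1
    # Linear interpolation between the two neighbouring order statistics.
    frac = (value - xs[lo]) / (xs[hi] - xs[lo])
    return (lo + frac) / (n - 1)

x = np.array([50, 10, 40, 20, 30])
q = quantile_of(x, 35)        # halfway between the 3rd and 4th sorted values
back = np.quantile(x, q)      # round trip through np.quantile
```

Using side="right" keeps the denominator strictly positive even when x contains repeated values.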

Answer 4

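While

vals = x.argsort().argsort()/(x.size-1)

works on arrays whose values are all unique, it fails when values repeat. Equal values should get equal quantile values, but if, for example, the array x has 200 zeros and 800 values greater than zero, this approach assigns those zeros 200 different quantile values. It is safer to use

vals = np.array([np.count_nonzero(x<x_i)/(x.size-1) for x_i in x])

because equal values then get the same quantile position.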

import numpy as np

def get_quant(x):
  " for each value in x, return which quantile it corresponds to "
  return np.array([np.count_nonzero(x<x_i)/(len(x)-1) for x_i in x])

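Note: the (x.size-1) denominator makes the quantile values range from 0 to 1 inclusive (the upper end is reached when the maximum value is unique). Omitting the -1 means the 100% quantile is never reached.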
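The list comprehension compares every element against the whole array, which is quadratic in the array length. A possible vectorized alternative, sketched here under the assumption that the same "count of strictly smaller values" definition is wanted, sorts once and uses np.searchsorted. The name get_quant_sorted is made up for this example.

```python
import numpy as np

def get_quant_sorted(x):
    """Same result as the list-comprehension version, via one sort.

    Hypothetical helper name; requires at least two elements.
    """
    x = np.asarray(x)
    xs = np.sort(x)
    # side="left" returns, for each x_i, how many elements are strictly
    # smaller than x_i, so tied values share the same count.
    counts = np.searchsorted(xs, x, side="left")
    return counts / (x.size - 1)

x = np.array([0, 0, 5, 3])
vals = get_quant_sorted(x)

# Reference computed with the comprehension from the answer.
ref = np.array([np.count_nonzero(x < x_i) / (x.size - 1) for x_i in x])
```

Both versions give the two zeros the same quantile value, which is the property the answer is after.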

Answer 5

I wrote a simple helper method, defined as follows:

def nearest_percentile(input_list, input_value, increment=1):
    """
    return percentile of input_value wrt input_list

    params: input_list  [list]
            input_value [number]
            increment   [number]

    return: integer [1, 100]

    >>> nearest_percentile([1,2,3,4,5], 3)
    50

    >>> nearest_percentile([1,2,3,4,5], -10)
    1

    >>> nearest_percentile([1,2,3,4,5], 10)
    100
    """
    arr1 = np.asarray(input_list)
    arr2 = np.arange(1, 100+increment, increment)
    arr2 = list(map(lambda x: max(1, min(100, x)), arr2))
    sts = np.percentile(arr1, arr2)
    idx = nearest_index(sts, input_value)

    return arr2[idx]

Here are some example calls (you can even adjust the decimal resolution):

>>> nearest_percentile(np.random.normal(loc=0, scale=1.0, size=1000), 0.5)
71 # the value 0.5 is the 71% percentile of given input list

>>> nearest_percentile(np.random.normal(loc=0, scale=1.0, size=1000), 0.5, increment=0.1)
71.3 # oh, we can actually see that it is 71.3% and not exactly 71%
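The nearest_percentile function above calls nearest_index, which the answer does not show. A minimal sketch follows, assuming it is meant to return the position of the array entry closest to the target value; the body is a guess at the intended behavior, not the author's code.

```python
import numpy as np

def nearest_index(values, target):
    """Index of the entry in `values` closest to `target`.

    Assumed behavior: the original answer does not define this helper.
    Ties resolve to the first closest entry, as np.argmin does.
    """
    values = np.asarray(values, dtype=float)
    return int(np.argmin(np.abs(values - target)))

# Percentiles 1..100 of a small sample, as nearest_percentile builds them.
grid = np.arange(1, 101)
sts = np.percentile([1, 2, 3, 4, 5], grid)
idx = nearest_index(sts, 3)   # position of the percentile closest to 3
pct = int(grid[idx])
```

With this helper and numpy imported as np, the docstring examples in nearest_percentile should be reproducible.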
