Computing the probability density function of a set of values with numpy

Question

Below is the data for which I want to plot the PDF: https://gist.github.com/ecenm/cbbdcea724e199dc60fe4a38b7791eb8#file-64_general-out

Here is the script:

import numpy as np
import matplotlib.pyplot as plt
import pylab

data = np.loadtxt('64_general.out')
# 'normed' is the old, deprecated name for 'density'; pass only 'density'
H, X1 = np.histogram(data, bins=10, density=True)  # Is this the right way to get the PDF?
plt.xlabel('Latency')
plt.ylabel('PDF')
plt.title('PDF of latency values')

plt.plot(X1[1:], H)
plt.show()

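When I plot the above, I get the following result.

Is the above the correct way to compute the PDF of a series of values?
Is there another way to confirm that what I get is an actual PDF? For example, in my case, how can I show that the area under the PDF equals 1?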

Answer 1

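This is a legitimate way to approximate the PDF. Because np.histogram groups the values into bins, you do not get the exact frequency of each number in the input. For a more precise approximation, count the occurrences of each distinct number and divide by the total number of values. Also, since these are discrete values, you can plot them as points or bars, which gives a more accurate impression than a connected line.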
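The counting approach described above could be sketched as follows. This is a minimal illustration, not the answerer's own code: the sample array is hypothetical and stands in for the values loaded from 64_general.out, and np.unique with return_counts=True is one possible way to do the counting.

```python
import numpy as np

# Hypothetical sample standing in for the latency values in 64_general.out
data = np.array([3, 5, 3, 7, 5, 3, 9, 5, 3, 7])

# Count how often each distinct value occurs
values, counts = np.unique(data, return_counts=True)

# Relative frequency of each value (an empirical probability mass function)
pmf = counts / counts.sum()

for v, p in zip(values, pmf):
    print(v, p)

# The relative frequencies of a discrete sample sum to 1
print(np.isclose(pmf.sum(), 1.0))
```

The resulting values and pmf arrays could then be drawn with plt.bar(values, pmf) or as points, rather than as a connected line.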
In the discrete case, the frequencies should sum to 1. In the continuous case, you can use np.trapz() to approximate the integral.
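As a rough sketch of how the area check might look for a density histogram: the data here is randomly generated and purely hypothetical (in the question it would come from np.loadtxt), and evaluating the trapezoidal rule at the bin centers is an assumption made for illustration. The fallback lookup is there because newer NumPy releases provide the same routine under the name np.trapezoid.

```python
import numpy as np

# Hypothetical continuous sample; in the question this would be
# np.loadtxt('64_general.out')
rng = np.random.default_rng(0)
data = rng.normal(loc=10.0, scale=2.0, size=5000)

# density=True scales the bar heights so that the total bar area is 1
H, edges = np.histogram(data, bins=10, density=True)

# Area of the histogram itself: sum of bar height times bar width
area_bars = np.sum(H * np.diff(edges))

# Trapezoidal approximation through the bar heights at the bin centers
centers = 0.5 * (edges[:-1] + edges[1:])
# Prefer np.trapezoid when it exists, otherwise fall back to np.trapz
trapezoid = getattr(np, "trapezoid", None) or np.trapz
area_trapz = trapezoid(H, centers)

print(area_bars)   # 1 up to floating-point rounding
print(area_trapz)  # slightly below 1: half of each end bin is not covered
```

The bar sum is the more direct check, since it is the quantity that density=True normalizes. The trapezoidal value only approaches 1 when the outermost bins carry little probability.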

Answer 2

For the discrete case:

import numpy as np
import matplotlib.pyplot as plt

x = np.random.normal(size=1000)
x = x * 0.7

# With density=True, plt.hist draws and returns a probability density
n, bins, patches = plt.hist(x, bins=10, density=True, edgecolor='black', lw=3, fc=(0, 0, 1, 0.5), alpha=0.2)
plt.hist(x, bins=10, cumulative=True, lw=3, fc=(0, 0, 0.5, 0.3), log=True)  # fc is an RGBA tuple

# n is already a density, so with equal-width bins this renormalization leaves it unchanged
density = n / (sum(n) * np.diff(bins))

# The area under the histogram integrates to 1
print(np.sum(density * np.diff(bins)))
print(np.allclose(np.sum(density * np.diff(bins)), 1))
