Using bootstrapping to get error bars for the bins of a dataset

Question

I have some data called variable, which I can simulate as follows:

import pandas as pd
import numpy as np
from scipy.stats import bootstrap

random_values = np.random.uniform(low=0, high=10, size=10000)
df = pd.DataFrame({"variable": random_values})

I want to split my data into 5 bins

bins = [0, 2, 4, 6, 8, 10]

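and use a bootstrap method to compute error bars for each bin, for example a confidence interval at the 95% level. The part I find troublesome is computing the error bars. I can do it with scipy.stats.bootstrap by calling

bootstrap(one_of_the_bins, my_statistic, confidence_level=0.95, method='percentile')

but that requires me to split the data into chunks by bin and loop over the chunks. So I am wondering whether there is a more convenient way to do this. Is there some functionality built into pandas? Or can I pass my full data and the bins to scipy.stats, so that scipy computes all the bins together? Thanks for any suggestions!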

Answer 1

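All the bins can be computed together in a single bootstrap call, by having the statistic return one value per bin via scipy.stats.binned_statistic: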

import numpy as np
from scipy.stats import bootstrap, binned_statistic

x = np.random.uniform(low=0, high=10, size=10000)

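def statistic(x):
    # Mean of x within each bin (binned_statistic defaults to statistic='mean').
    res = binned_statistic(x, x, bins=[0, 2, 4, 6, 8, 10])
    return res.statistic

# Defaults are used here: confidence_level=0.95 and method='BCa'.
res = bootstrap((x,), statistic)
res.confidence_interval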
# ConfidenceInterval(low=array([0.99653325, 2.969743  , 4.99033544, 6.98312963, 8.9515727 ]), high=array([1.04843922, 3.02077679, 5.04092083, 7.03426957, 9.0015762 ]))
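
If you would rather stay in pandas, as the question asks, one possible sketch is to label the rows with pd.cut and bootstrap each group separately. As far as I know pandas has no built-in bootstrap, so this still loops over the bins; groupby just hides the chunking. The names summary and yerr, the reduced sample size, and n_resamples=999 are my own choices for illustration, not something from the question. Note also that this is not identical to the approach above: resampling within each bin keeps each bin's sample size fixed, whereas resampling the full array lets the bin counts vary between resamples.

```python
import numpy as np
import pandas as pd
from scipy.stats import bootstrap

# Simulated data as in the question (smaller size so the sketch runs quickly).
rng = np.random.default_rng(0)
df = pd.DataFrame({"variable": rng.uniform(low=0, high=10, size=2000)})
bins = [0, 2, 4, 6, 8, 10]

# Label every row with the bin it falls into.
df["bin"] = pd.cut(df["variable"], bins=bins, include_lowest=True)

records, labels = [], []
for interval, values in df.groupby("bin", observed=True)["variable"]:
    # Percentile bootstrap of the mean within this bin only.
    res = bootstrap(
        (values.to_numpy(),),
        np.mean,
        confidence_level=0.95,
        method="percentile",
        n_resamples=999,
    )
    records.append(
        {
            "mean": values.mean(),
            "low": res.confidence_interval.low,
            "high": res.confidence_interval.high,
        }
    )
    labels.append(str(interval))

# One row per bin, with the point estimate and the interval bounds.
summary = pd.DataFrame(records, index=labels)

# Asymmetric error-bar lengths (lower, upper), shape (2, number of bins).
yerr = np.vstack(
    [summary["mean"] - summary["low"], summary["high"] - summary["mean"]]
)
print(summary)
```

The yerr array has the (2, N) layout that matplotlib's errorbar accepts for asymmetric error bars, so it can be passed as yerr=yerr together with the bin means.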
