StratifiedShuffleSplit gives wrong output?

Question

I'm trying to run StratifiedShuffleSplit on the categorical variable "Accident_Severity" (Fatal, Serious or Slight).

Initial distribution:

Slight     182994
Serious     40442
Fatal        2973
Name: Accident_Severity, dtype: int64

当我运行此代码时：

from sklearn.model_selection import StratifiedShuffleSplit

# One stratified 80/20 split; random_state makes the shuffle reproducible
stratified_splitter = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=7)
# split() yields (train_index, test_index) pairs; take the only one
train_index, test_index = list(stratified_splitter.split(df_full, df_full["Accident_Severity"]))[0]
df_train = df_full.loc[train_index]
df_test = df_full.loc[test_index]
print(f"{df_train.shape[0]} train and {df_test.shape[0]} test instances")

214865 train and 115 test instances

Shouldn't the result be proportional? Also, the test set is not 20% of the 226409 entries.
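For reference, the sizes the split should produce can be worked out from the initial distribution. This sketch assumes the rounding rule recent scikit-learn versions apply to a float test_size (test count = ceil(test_size * n_samples), train count = the rest); the per-class figures are only approximate, because scikit-learn allocates the last few rows between classes internally.

```python
from math import ceil

# Class counts copied from the question's value_counts() output.
class_counts = {"Slight": 182994, "Serious": 40442, "Fatal": 2973}
n_samples = sum(class_counts.values())  # 226409

# Assumed rule for a float test_size: round the test size up, train gets the rest.
test_size = 0.2
n_test = ceil(test_size * n_samples)
n_train = n_samples - n_test
print(f"{n_train} train and {n_test} test instances")  # 181127 train and 45282 test instances

# Stratification keeps each class at about the same share in both sets,
# so each class should contribute roughly test_size of its rows to the test set.
for label, count in class_counts.items():
    print(f"{label}: ~{round(count * test_size)} test rows")
```

The reported 214865 train / 115 test is far from this, and 214865 + 115 = 214980 does not even add up to the original 226409 rows.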

Final proportions, test set:

Fatal    115
Name: Accident_Severity, dtype: int64

Final proportions, train set:

Slight     177630
Serious     34856
Fatal        2379
Name: Accident_Severity, dtype: int64

My colleagues ran the same code and got the correct output.

Answer 1

I uninstalled and reinstalled Anaconda and ran all the code again in a Jupyter notebook. This time I got the correct result, so the problem is solved...

It seems something was wrong with my sklearn installation, but I don't understand what.
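One thing worth ruling out (not confirmed by the original poster): split() returns positional indices (0 to n-1), while .loc selects rows by index label. The two only coincide when df_full has a default 0..n-1 index; after filtering, dropping or concatenating rows, .loc can select the wrong rows, repeat rows that share a label, or raise a KeyError. A minimal sketch that selects positionally with .iloc, using a hypothetical toy DataFrame whose index deliberately has gaps:

```python
import pandas as pd
from sklearn.model_selection import StratifiedShuffleSplit

# Hypothetical toy data: 1000 rows with roughly the question's class shares.
labels = ["Slight"] * 808 + ["Serious"] * 179 + ["Fatal"] * 13
df_full = pd.DataFrame({"Accident_Severity": labels})
# Hypothetical gappy index (0, 2, 4, ...), as left behind by filtering rows.
df_full.index = df_full.index * 2

splitter = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=7)
train_index, test_index = next(splitter.split(df_full, df_full["Accident_Severity"]))

# split() returns positions, so select rows positionally with .iloc, not .loc.
df_train = df_full.iloc[train_index]
df_test = df_full.iloc[test_index]
print(f"{len(df_train)} train and {len(df_test)} test instances")

# Class shares should be nearly identical in both parts.
print(df_train["Accident_Severity"].value_counts(normalize=True))
print(df_test["Accident_Severity"].value_counts(normalize=True))
```

Alternatively, calling df_full.reset_index(drop=True) before splitting makes labels and positions coincide again, so the original .loc code would select the intended rows.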
