Imblearn and sklearn algorithms for class imbalance in multi-label text classification

Question

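I am working on a multi-label text classification problem with 90 target labels in total. The label distribution has a long tail, and the full data set has about 1,900k records. For now I am working with a smaller sample of roughly 100k records whose target distribution is similar.

Some algorithms have built-in options for handling class imbalance, for example PAC (PassiveAggressiveClassifier) and LinearSVC. In addition, I currently run SMOTE to generate synthetic samples for every class except the majority class, and then RandomUnderSampler to reduce the imbalance caused by the majority class.

Is it correct to use both the algorithm parameters and the imblearn pipeline to handle class imbalance?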
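For reference, the built-in option mentioned above is `class_weight='balanced'`, which scikit-learn documents as weighting each class by `n_samples / (n_classes * class_count)`. The following is a minimal sketch of that heuristic in plain Python, not the library's own code; the helper name `balanced_class_weights` and the toy label list are assumptions made for illustration.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Hypothetical helper: weight = n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Toy binary target, as seen by one estimator inside a one-vs-rest wrapper:
# 90 negatives and 10 positives (made-up numbers).
y = [0] * 90 + [1] * 10
weights = balanced_class_weights(y)
print(weights)
```

Under this heuristic the rare class gets a proportionally larger weight, so errors on it cost more during fitting without changing the number of training rows.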

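from sklearn.pipeline import FeatureUnion
from sklearn.linear_model import PassiveAggressiveClassifier, LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.ensemble import StackingClassifier
from sklearn.multiclass import OneVsRestClassifier
from imblearn.pipeline import Pipeline as imblearnPipeline
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler

# text_pipeline is defined elsewhere (not shown in the question)
feat_pipeline = FeatureUnion([('text', text_pipeline)])

# Base estimators, each using its own class_weight handling
estimators_list = [
    ('PAC', PassiveAggressiveClassifier(max_iter=5000, random_state=0, class_weight='balanced')),
    ('linearSVC', LinearSVC(class_weight='balanced')),
]
estimators_ensemble = StackingClassifier(
    estimators=estimators_list,
    final_estimator=LogisticRegression(solver='lbfgs', max_iter=5000),
)
ovr_ensemble = OneVsRestClassifier(estimators_ensemble)

classifier_pipeline = imblearnPipeline([
    ('features', feat_pipeline),
    ('over_sampling', SMOTE(sampling_strategy='auto')),  # resample all classes but the majority class
    ('under_sampling', RandomUnderSampler(sampling_strategy='auto')),  # resample all classes but the minority class
    ('ovr_ensemble', ovr_ensemble),
])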
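To make the two sampling steps in the pipeline concrete, here is a rough sketch of the per-class target counts they aim for, based on the comments in the code above: with `sampling_strategy='auto'`, the over-sampler raises every class except the majority to the majority count, and the under-sampler lowers every class except the minority to the minority count. The functions `smote_auto_targets` and `undersample_auto_targets` and the class counts are hypothetical; they only mimic the counting, not the resampling itself.

```python
def smote_auto_targets(counts):
    """Over-sampling with 'auto': all classes but the majority grow to the majority count."""
    majority = max(counts.values())
    return {cls: majority for cls in counts}


def undersample_auto_targets(counts):
    """Under-sampling with 'auto': all classes but the minority shrink to the minority count."""
    minority = min(counts.values())
    return {cls: minority for cls in counts}


# Made-up long-tailed class counts
counts = {"a": 1000, "b": 100, "c": 10}
after_smote = smote_auto_targets(counts)
after_under = undersample_auto_targets(after_smote)
print(after_smote)
print(after_under)
```

If this reading of `'auto'` is right, the classes are already equal in size after the over-sampling step, so the under-sampling step that follows has nothing left to remove. Passing explicit target counts to `sampling_strategy` would be one way to make both steps do some work.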

Answer 1

Is it correct to use both the algorithm parameters and the imblearn pipeline to handle class imbalance?
