Using SMOTE to handle imbalanced 3D array data


I have the following data, shown with its class distribution.

X shape == (477324, 5, 11)
Y shape == (477324,)
{0: 11986, 1: 465338}

Because my dataset is imbalanced, I tried RandomOverSampler with the code below.

from imblearn.over_sampling import RandomOverSampler
oversample = RandomOverSampler(sampling_strategy='minority')
oversample.fit_resample(trainX[:,:,0], trainY)
Xo = trainX[oversample.sample_indices_]
yo = trainY[oversample.sample_indices_]

Xo shape == (930676, 5, 11).
yo shape == (930676,).
{0: 465338, 1: 465338}

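However, how can I use SMOTE instead of RandomOverSampler? I tried applying SMOTE with the code below and then reshaping the result into a 3D array, because I also need a 3D array after resampling.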

Xo_smote,yo_smote = oversample_Smote.fit_resample(trainX[:,:,0], trainY)

Xo shape == (930676, 5).
yo shape == (930676,).

org_shape= trainX.shape
Xo = np.reshape(Xo, org_shape)

I get this error:

"ValueError: cannot reshape array of size 51187180 into shape (477324,5,11)"
Any suggestions would be appreciated.

python multidimensional-array imbalanced-data smote
1 Answer

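I think you are trying to reshape the oversampled array back to the original shape. Because of oversampling, the array now has more samples than the original (930676 rather than 477324), so the original shape no longer fits; only the trailing dimensions (5, 11) can be reused. Note also that trainX[:,:,0] passes only the first of the 11 features to SMOTE. To keep all of them, flatten the last two dimensions into 55 columns before resampling and reshape afterwards.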
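To make the size mismatch concrete, here is a small sketch with scaled-down, made-up shapes (20 samples growing to 36 after oversampling; these numbers are only illustrative). It assumes the resampled array is 2D with 5 * 11 = 55 columns. Reshaping to the original shape fails, while letting NumPy infer the first dimension with -1 works.

```python
import numpy as np

# Illustrative, scaled-down shapes (not the real dataset sizes).
org_shape = (20, 5, 11)          # shape before oversampling
resampled = np.zeros((36, 55))   # stand-in for the 2D output of fit_resample

# Reshaping to the original shape fails: 36 * 55 != 20 * 5 * 11.
try:
    np.reshape(resampled, org_shape)
    reshape_failed = False
except ValueError:
    reshape_failed = True

# Keep the trailing dimensions and let NumPy infer the new sample count.
restored = resampled.reshape(-1, org_shape[1], org_shape[2])
print(reshape_failed, restored.shape)
```

Using -1 for the first dimension avoids hard-coding the number of samples, which depends on how many synthetic rows the sampler adds.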

Here is a minimal working example for the shapes you gave:

from imblearn.over_sampling import SMOTE
import numpy as np

train_features = np.random.rand(477324, 5, 11)
train_labels = np.array([0] * 11986 + [1] * 465338)
np.random.shuffle(train_labels)
print(train_features.shape, train_labels.shape)  # (477324, 5, 11) (477324,)

train_features_shape = train_features.shape
train_features = train_features.reshape(train_features.shape[0], train_features.shape[1]*train_features.shape[2])
print(train_features.shape, train_labels.shape)  # (477324, 55) (477324,)

sm = SMOTE(random_state=69)
train_features, train_labels = sm.fit_resample(train_features, train_labels)
print(train_features.shape, train_labels.shape)  # (930676, 55) (930676,)

train_features = train_features.reshape(train_features.shape[0], train_features_shape[1], train_features_shape[2])
print(train_features.shape, train_labels.shape)  # (930676, 5, 11) (930676,)
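One point worth checking about the flattening approach: SMOTE builds each synthetic sample by interpolating between a minority sample and one of its nearest neighbours, so on the flattened data every synthetic row is a blend of two 55-value rows. The sketch below imitates that single interpolation step by hand with NumPy. It does not call imblearn, and `a`, `b`, and `lam` are made-up illustration values. It suggests that reshaping the blended row gives the same (5, 11) window as blending the two original windows elementwise.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical minority-class windows, each of shape (5, 11).
a = rng.random((5, 11))
b = rng.random((5, 11))
lam = 0.3  # interpolation weight; SMOTE draws this at random from [0, 1]

# Interpolate in the flattened 55-feature space, as happens after reshape(n, 55).
flat_synthetic = a.reshape(-1) + lam * (b.reshape(-1) - a.reshape(-1))

# Interpolate directly on the (5, 11) windows.
window_synthetic = a + lam * (b - a)

# Reshaping the flat result recovers the same window.
same = np.allclose(flat_synthetic.reshape(5, 11), window_synthetic)
print(same)
```

This is also why the index trick used with RandomOverSampler does not carry over: SMOTE creates new rows rather than selecting existing ones, so there are no indices to apply to the 3D array.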