I have the following data; this is the class distribution.
X shape == (477324, 5, 11)
Y shape == (477324,)
{0: 11986, 1: 465338}
Since my dataset is imbalanced, I tried RandomOverSampler with the code below.
from imblearn.over_sampling import RandomOverSampler
oversample = RandomOverSampler(sampling_strategy='minority')
oversample.fit_resample(trainX[:,:,0], trainY)
Xo = trainX[oversample.sample_indices_]
yo = trainY[oversample.sample_indices_]
Xo shape == (930676, 5, 11).
yo shape == (930676,).
{0: 465338, 1: 465338}
But how can I use SMOTE instead of RandomOverSampler? I tried applying SMOTE with the code below and then reshaping the result into a 3D array, because I also need a 3D array after resampling.
Xo_smote,yo_smote = oversample_Smote.fit_resample(trainX[:,:,0], trainY)
Xo shape == (930676, 5).
yo shape == (930676,).
org_shape= trainX.shape
Xo = np.reshape(Xo, org_shape)
I get this error:
"ValueError: cannot reshape array of size 51187180 into shape (477324,5,11)"
Any suggestions would be appreciated.
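I think you are trying to reshape the oversampled array back to the original shape. Because of oversampling, the array now has more samples than the original (930676 instead of 477324), so its element count no longer fits (477324, 5, 11). Flatten the last two axes before resampling, then reshape using the new number of samples.
Here is a minimal working example for the given shapes: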
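The size mismatch can be checked with plain arithmetic. The sketch below only uses the numbers quoted in the question; the variable names are illustrative and not part of the original code.

```python
from math import prod

original_shape = (477324, 5, 11)  # shape of trainX before resampling
resampled_rows = 930676           # sample count after oversampling (from the question)
reported_size = 51187180          # element count quoted in the ValueError

# The original shape cannot hold the resampled array: the sizes differ.
print(prod(original_shape) == reported_size)  # False

# A target that keeps the per-sample dimensions but uses the new sample count fits.
new_shape = (resampled_rows,) + original_shape[1:]
print(new_shape, prod(new_shape) == reported_size)  # (930676, 5, 11) True
```

In other words, the reshape target has to be built from the resampled sample count, not from the stored original shape.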
from imblearn.over_sampling import SMOTE
import numpy as np
# Dummy data with the same shapes and class counts as in the question
train_features = np.random.rand(477324, 5, 11)
train_labels = np.array([0] * 11986 + [1] * 465338)
np.random.shuffle(train_labels)
print(train_features.shape, train_labels.shape) # (477324, 5, 11) (477324,)
# Remember the original shape, then flatten each (5, 11) sample into 55 features,
# because fit_resample expects a 2D array
train_features_shape = train_features.shape
train_features = train_features.reshape(train_features.shape[0], train_features.shape[1]*train_features.shape[2])
print(train_features.shape, train_labels.shape) # (477324, 55) (477324,)
sm = SMOTE(random_state=69)
train_features, train_labels = sm.fit_resample(train_features, train_labels)
print(train_features.shape, train_labels.shape) # (930676, 55) (930676,)
# Reshape back to 3D using the new sample count, not the original one
train_features = train_features.reshape(train_features.shape[0], train_features_shape[1], train_features_shape[2])
print(train_features.shape, train_labels.shape) # (930676, 5, 11) (930676,)
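The flatten and restore steps can also be written with `-1` so NumPy infers the sample count. This is a small sketch on toy data: `flatten_samples` and `restore_samples` are hypothetical helper names, and the `np.vstack` line is only a stand-in for a resampler that adds rows, not real SMOTE.

```python
import numpy as np

def flatten_samples(x):
    """Collapse all per-sample axes: (n, a, b) -> (n, a*b)."""
    return x.reshape(x.shape[0], -1)

def restore_samples(x_flat, sample_shape):
    """Expand back: (m, a*b) -> (m, a, b); -1 lets NumPy infer m."""
    return x_flat.reshape(-1, *sample_shape)

# Toy 3D array with the same per-sample shape (5, 11) as in the question
x = np.arange(4 * 5 * 11, dtype=float).reshape(4, 5, 11)
flat = flatten_samples(x)

# Stand-in for a resampler: append copies of the first two rows (NOT real SMOTE)
grown = np.vstack([flat, flat[:2]])

restored = restore_samples(grown, x.shape[1:])
print(restored.shape)                   # (6, 5, 11)
print(np.array_equal(restored[:4], x))  # True
```

Because the restore step never refers to the original sample count, it works regardless of how many rows the resampler adds.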