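I am trying to extract features from .wav files using the MFCCs of each sound file. When I try to convert my list of MFCCs to a NumPy array, I get an error. I am fairly sure the error occurs because the list contains MFCC arrays with different shapes (but I am not sure how to fix it).
I have looked at 2 other Stack Overflow posts, but they don't solve my problem because they are too specific to a particular task.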
ValueError: could not broadcast input array from shape (128,128,3) into shape (128,128)
Value Error: could not broadcast input array from shape (857,3) into shape (857)
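Full error message:
Traceback (most recent call last):
  File "/.... /.../...../Batch_MFCC_Data.py", line 68, in <module>
    X = np.array(MFCCs)
ValueError: could not broadcast input array from shape (20,590) into shape (20)
Code sample: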
import glob

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder

all_wav_paths = glob.glob('directory_of_wav_files/**/*.wav', recursive=True)
np.random.shuffle(all_wav_paths)

MFCCs = []   # list to hold all MFCCs
labels = []  # list to hold all labels

for i, wav_path in enumerate(all_wav_paths):
    # MFCC_from_wav() -> returns the MFCC coefficients
    individual_MFCC = MFCC_from_wav(wav_path)
    # get_class() -> returns the label of the wav file, either 0 or 1
    label = get_class(wav_path)

    # add features and label to the lists
    MFCCs.append(individual_MFCC)
    labels.append(label)

# Must convert the training data to a NumPy array for
# train_test_split and saving to local drive
X = np.array(MFCCs)  # THIS LINE CRASHES WITH ABOVE ERROR

# binary encode labels (OneHotEncoder expects a 2-D column of labels;
# sparse_output was named sparse before scikit-learn 1.2)
onehot_encoder = OneHotEncoder(sparse_output=False)
Y = onehot_encoder.fit_transform(np.array(labels).reshape(-1, 1))

# create train/test data
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.25, random_state=0)

# saving data to local drive
np.save("LABEL_SAVE_PATH", Y)
np.save("TRAINING_DATA_SAVE_PATH", X)
Here is a snapshot of the shapes of the MFCCs (one per .wav file) in the MFCCs list.
The MFCCs list contains the following shapes:
...More above...
(20, 423) #shape of returned MFCC from one of the .wav files
(20, 457)
(20, 1757)
(20, 345)
(20, 835)
(20, 345)
(20, 687)
(20, 774)
(20, 597)
(20, 719)
(20, 1195)
(20, 433)
(20, 728)
(20, 939)
(20, 345)
(20, 1112)
(20, 345)
(20, 591)
(20, 936)
(20, 1161)
....More below....
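As you can see, the MFCCs in the list do not all have the same shape, because the recordings are not all the same length. Is this why I cannot convert the list to a NumPy array? If so, how can I fix it so that every MFCC in the list has the same shape?
Any code snippets to accomplish this, and any advice, would be greatly appreciated!
Thanks!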
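Use the following logic to downsample (truncate) the arrays to min_shape, i.e. reduce the larger arrays to min_shape by dropping their trailing columns: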
min_shape = (20, 345)  # smallest MFCC shape in the list
MFCCs = [arr1, arr2, arr3, ...]
for idx, arr in enumerate(MFCCs):
    # keep only the first min_shape[1] columns (frames)
    MFCCs[idx] = arr[:, :min_shape[1]]
batch_arr = np.array(MFCCs)
You can then stack these arrays into a batch array, as in the following minimal example:
In [33]: a1 = np.random.randn(2, 3)
In [34]: a2 = np.random.randn(2, 5)
In [35]: a3 = np.random.randn(2, 10)
In [36]: MFCCs = [a1, a2, a3]
In [37]: min_shape = (2, 2)
In [38]: for idx, arr in enumerate(MFCCs):
    ...:     MFCCs[idx] = arr[:, :min_shape[1]]
    ...:
In [42]: batch_arr = np.array(MFCCs)
In [43]: batch_arr.shape
Out[43]: (3, 2, 2)
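Rather than hard-coding min_shape, the smallest frame count can be read off the list itself. A minimal sketch of that idea follows; the random matrices are only stand-ins for real MFCCs, and the name `min_frames` is made up for illustration:

```python
import numpy as np

# Stand-ins for real MFCC matrices: 20 coefficients each, varying frame counts.
rng = np.random.default_rng(0)
MFCCs = [rng.standard_normal((20, n)) for n in (423, 457, 345, 835)]

# Smallest number of frames (columns) across all matrices.
min_frames = min(m.shape[1] for m in MFCCs)

# Keep only the first min_frames columns of each matrix, then stack
# along a new leading axis.
batch_arr = np.stack([m[:, :min_frames] for m in MFCCs])
print(batch_arr.shape)  # (4, 20, 345)
```

Keep in mind that truncating throws away everything after the shortest recording's length, so it may not suit data where the interesting part comes late in the file.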
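Now for the second strategy, upsampling the smaller arrays to max_shape: follow similar logic, but fill the missing values with zeros or nan, whichever you prefer.
You can then stack the arrays into a batch array of shape (num_arrays, dim1, dim2); so in your case the shape should be (num_wav_files, 20, max_column).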
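The padding strategy could be sketched with np.pad as below. This is only one way to do it: the helper `pad_to_max` and its `fill_value` parameter are hypothetical names, padding is added on the right (at the end of each recording), and the random matrices are stand-ins for real MFCCs.

```python
import numpy as np

def pad_to_max(arrays, fill_value=0.0):
    """Right-pad 2-D arrays along axis 1 to the widest one, then stack them."""
    max_frames = max(a.shape[1] for a in arrays)
    padded = [
        # No padding on axis 0; pad only the end of axis 1.
        np.pad(a, ((0, 0), (0, max_frames - a.shape[1])),
               mode="constant", constant_values=fill_value)
        for a in arrays
    ]
    return np.stack(padded)

# Stand-ins for real MFCC matrices: 20 coefficients, varying frame counts.
rng = np.random.default_rng(0)
MFCCs = [rng.standard_normal((20, n)) for n in (423, 457, 1757)]

batch_arr = pad_to_max(MFCCs)  # zero padding
print(batch_arr.shape)  # (3, 20, 1757)
```

Calling `pad_to_max(MFCCs, fill_value=np.nan)` fills with nan instead, which makes the padded region easy to mask out later but needs care with anything that does not handle nan.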