Why does my fake speech detection model achieve perfect training, validation, and test accuracy but poor precision, recall, and F1 scores?

Question

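I am working on a fake speech classification problem and have trained several architectures on a dataset of 3,000 images. Despite many changes to my models, one problem persists: for every architecture I have tried, the training, validation, and test accuracy are consistently high, always above 97%, yet precision, recall, and F1 score remain poor. I have also tried different hyperparameters, such as the learning rate, batch size, and number of epochs, but precision, recall, and F1 score are still poor.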

Can anyone help me understand why my accuracy is high while precision, recall, and F1 score are poor? What can I do to improve them?

Model code:

import os
import random
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.callbacks import TensorBoard
from tensorflow.keras.optimizers import Adam
from sklearn.metrics import f1_score, precision_score

# Set seed values for reproducibility
seed_value = 42
tf.random.set_seed(seed_value)
np.random.seed(seed_value)
random.seed(seed_value)

# Define paths to dataset
data_dir = "/content/drive/MyDrive/Finalspectogramdataaftersplit"
train_dir = os.path.join(data_dir, "Train")
test_dir = os.path.join(data_dir, "Test")
val_dir = os.path.join(data_dir, "Val")

# Define image size and batch size
img_size = (128, 128)
batch_size = 32 #32

# Create data generators: augmentation for the training set, rescaling only for validation and test
train_datagen = ImageDataGenerator(rescale=1./255, shear_range=0.2, zoom_range=0.2, horizontal_flip=True)
train_generator = train_datagen.flow_from_directory(train_dir, target_size=img_size, batch_size=batch_size, class_mode='binary')

val_datagen = ImageDataGenerator(rescale=1./255)
val_generator = val_datagen.flow_from_directory(val_dir, target_size=img_size, batch_size=batch_size, class_mode='binary')

# shuffle=False keeps the test images in file order, so that model.predict() output
# lines up with test_generator.classes / test_generator.labels
test_datagen = ImageDataGenerator(rescale=1./255)
test_generator = test_datagen.flow_from_directory(test_dir, target_size=img_size, batch_size=batch_size, class_mode='binary', shuffle=False)

# Define model architecture
input_shape = (128, 128, 3)

# Convolutional Image Encoding Block
input_layer = keras.Input(shape=input_shape)
x = layers.Conv2D(64, (3, 3), activation='relu', padding='same')(input_layer)
x = layers.MaxPooling2D(pool_size=(2, 2))(x)
x = layers.Conv2D(128, (3, 3), activation='relu', padding='same')(x)
x = layers.MaxPooling2D(pool_size=(2, 2))(x)

# Transformer Encoder
num_heads = 2
ff_dim = 64

x = layers.MultiHeadAttention(num_heads=num_heads, key_dim=64)(x, x)
x = layers.Dropout(0.1)(x)  
x = layers.LayerNormalization(epsilon=1e-6)(x)

x = layers.Dense(ff_dim, activation='relu')(x)
x = layers.Dropout(0.1)(x)  
x = layers.Dense(ff_dim, activation='relu')(x)
x = layers.Reshape((16, 16, 256))(x) # modified output shape

# Sequence Pooling Layer
x = layers.GlobalAveragePooling2D()(x)

# MLP Head
x = layers.Dense(64, activation='relu')(x)
x = layers.Dropout(0.5)(x)#0.5
output_layer = layers.Dense(1, activation='sigmoid')(x)

model = keras.Model(inputs=input_layer, outputs=output_layer)
optimizer = Adam(learning_rate=0.0001)
# Compile model
model.compile(optimizer=optimizer, loss='binary_crossentropy', metrics=['accuracy'])

# Set up TensorBoard callback
logdir = os.path.join("/content/drive/MyDrive/logs")
tensorboard_callback = TensorBoard(
    log_dir=logdir,
    update_freq='epoch',
    histogram_freq=1,
    profile_batch=0,
    write_graph=True,
    write_images=True
)

# Train model with TensorBoard callback
epochs = 40
history = model.fit(train_generator, epochs=epochs, validation_data=val_generator, callbacks=[tensorboard_callback])

# Evaluate model on test set
test_loss, test_acc = model.evaluate(test_generator)
print(f'Test loss: {test_loss}, Test accuracy: {test_acc}')

# Make predictions on the test data
# Note: predictions line up with test_generator.labels only if the generator does not shuffle
y_pred = model.predict(test_generator)
y_pred = (y_pred > 0.5).astype(int).ravel()  # Convert probabilities to 1-D binary predictions

# Calculate F1 score and precision
f1 = f1_score(test_generator.labels, y_pred)
precision = precision_score(test_generator.labels, y_pred)

print(f'F1 score: {f1:.2f}')
print(f'Precision: {precision:.2f}')

# Save model
model.save('/content/drive/MyDrive/originalmodel1.h5')

First attempt:

I trained the model exactly as shown above.

Results:

10/10 [==============================] - 2s 220ms/step - loss: 0.2481 - accuracy: 0.9164
          Test loss: 0.24810609221458435, Test accuracy: 0.9163879752159119
10/10 [==============================] - 2s 211ms/step
Accuracy: 0.5017
Precision: 0.5017
Recall: 1.0000
F-1 Score: 0.6682
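
For what it's worth, the second set of numbers is what you would see if every test sample were predicted positive. The sketch below is a minimal illustration in plain Python. It assumes a test set of 299 images with 150 in the positive class; those counts are not stated in the question and are only inferred from the printed values (150/299 ≈ 0.5017). `binary_metrics` is a hypothetical helper, not part of the original code.

```python
def binary_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall, f1) for lists of 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return accuracy, precision, recall, f1


# Assumed class counts (inferred from the printed metrics, not given in the question)
y_true = [0] * 149 + [1] * 150

# A classifier that labels every sample positive, e.g. because the threshold is near zero
y_pred = [1] * len(y_true)

accuracy, precision, recall, f1 = binary_metrics(y_true, y_pred)
print(f"Accuracy: {accuracy:.4f}")
print(f"Precision: {precision:.4f}")
print(f"Recall: {recall:.4f}")
print(f"F-1 Score: {f1:.4f}")
```

Under these assumptions the sketch prints 0.5017, 0.5017, 1.0000, and 0.6682, the same values as in the output above. That suggests the thresholded predictions were all (or almost all) 1, so precision simply equals the share of positive samples.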

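Second attempt:

I removed some layers, such as the encoder block, because I thought it was causing overfitting.

The results were the same as above.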
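
Since changing the architecture does not change the numbers, the evaluation step itself is worth checking. One thing to rule out: `flow_from_directory` shuffles by default (`shuffle=True`), whereas `test_generator.classes` and `test_generator.labels` list the labels in file order, so predictions from `model.predict` on a shuffled generator would not line up with them. `model.evaluate` is unaffected because each batch carries its own labels. The sketch below simulates the effect in plain Python without Keras; the "perfect classifier", the sample counts, and the seed are all made up for illustration.

```python
import random

# Labels in directory (file) order, as a generator's `.classes` would list them:
# 150 samples of class 0 followed by 150 samples of class 1 (made-up counts).
labels_in_file_order = [0] * 150 + [1] * 150

# A shuffled generator serves the samples in a random order.
rng = random.Random(0)
served_order = list(range(len(labels_in_file_order)))
rng.shuffle(served_order)

# A hypothetical perfect classifier: it outputs the true label of whatever
# sample it is shown, so its predictions follow the served (shuffled) order.
predictions = [labels_in_file_order[i] for i in served_order]


def accuracy(y_true, y_pred):
    """Fraction of positions where the two label lists agree."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Wrong: compare shuffled-order predictions with file-order labels.
misaligned = accuracy(labels_in_file_order, predictions)

# Right: compare each prediction with the label of the sample it was made for.
aligned = accuracy([labels_in_file_order[i] for i in served_order], predictions)

print(f"misaligned accuracy: {misaligned:.3f}")
print(f"aligned accuracy:    {aligned:.3f}")
```

Even a perfect model scores near chance when the order is wrong. With Keras, passing `shuffle=False` to `flow_from_directory` for the test generator keeps the prediction order and `.classes` in step.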

Evaluation code:

import numpy as np
import matplotlib.pyplot as plt
from tensorflow import keras
from sklearn.metrics import confusion_matrix, classification_report, roc_auc_score, roc_curve, precision_recall_curve, precision_score, recall_score, average_precision_score

# Load the saved model
model = keras.models.load_model('/content/drive/MyDrive/originalmodel1.h5')

# Set threshold value for converting probabilities to binary predictions
threshold = 4.4391604205884505e-06

# Make predictions on the test set (test_generator comes from the model code above)
# ravel() flattens the (N, 1) output to (N,) so it compares element-wise with the labels
y_pred_proba = model.predict(test_generator).ravel()
y_pred = (y_pred_proba > threshold).astype(int)

# Get the true labels from the generator (file order; assumes the generator does not shuffle)
y_true = test_generator.classes

# Compute metrics
accuracy = (y_pred == y_true).mean()
precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
f1 = 2 * precision * recall / (precision + recall)
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
balanced_accuracy = (tp / (tp + fn) + tn / (tn + fp)) / 2
roc_auc = roc_auc_score(y_true, y_pred_proba)
# Average precision summarizes the precision-recall curve (roc_auc_score would only repeat ROC AUC)
precision_recall_auc = average_precision_score(y_true, y_pred_proba)
fpr, tpr, roc_thresholds = roc_curve(y_true, y_pred_proba)
precision_values, recall_values, pr_thresholds = precision_recall_curve(y_true, y_pred_proba)

# Print the metrics
print(f'tp: {tp}')
print(f'tn: {tn}')
print(f'fp: {fp}')
print(f'fn: {fn}')
print(f'Accuracy: {accuracy:.4f}')
print(f'Precision: {precision:.4f}')
print(f'Recall: {recall:.4f}')
print(f'F-1 Score: {f1:.4f}')
print(f'Balanced Accuracy: {balanced_accuracy:.4f}')
print(f'ROC AUC: {roc_auc:.4f}')
print(f'PR AUC: {precision_recall_auc:.4f}')

I have changed the threshold many times, but it made no difference. I think there is something wrong with my code.
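
One concrete thing that can go wrong in code like this, regardless of the threshold: `model.predict` on a single-output sigmoid model returns an array of shape `(N, 1)`, while the generator's labels have shape `(N,)`. Comparing the two with `==` broadcasts to an `(N, N)` matrix, so the mean of the result is no longer the accuracy. A minimal NumPy sketch with made-up values:

```python
import numpy as np

y_true = np.array([0, 0, 1, 1])                        # shape (4,), like generator labels
y_pred_proba = np.array([[0.1], [0.2], [0.9], [0.8]])  # shape (4, 1), like model.predict output
y_pred = (y_pred_proba > 0.5).astype(int)              # still shape (4, 1)

# Pitfall: (4, 1) == (4,) broadcasts to a (4, 4) matrix of all pairwise comparisons
pairwise = (y_pred == y_true)
print(pairwise.shape)   # (4, 4)
print(pairwise.mean())  # 0.5, although every prediction is correct

# Fix: flatten the predictions first so the comparison is element-wise
correct = (y_pred.ravel() == y_true)
print(correct.shape)    # (4,)
print(correct.mean())   # 1.0
```

With balanced classes the broadcast version lands near 0.5 whatever the model does, which can make a manual accuracy figure disagree with the one reported by `model.evaluate`.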
