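I want to improve the accuracy of my trained model. I am trying to build an ML model that predicts, from gene expression profiles, whether a test sample comes from a person with the disease or without it.
I looked up some resources online and wrote some code, but my accuracy is stuck at around 69% (output below):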
Epoch 1/100
45049/45049 [==============================] - 106s 2ms/step - loss: 0.6041 - accuracy: 0.6888 - val_loss: 0.6004 - val_accuracy: 0.6928
Epoch 2/100
45049/45049 [==============================] - 106s 2ms/step - loss: 0.6016 - accuracy: 0.6905 - val_loss: 0.5996 - val_accuracy: 0.6881
Epoch 3/100
45049/45049 [==============================] - 108s 2ms/step - loss: 0.6013 - accuracy: 0.6912 - val_loss: 0.5994 - val_accuracy: 0.6934
Epoch 4/100
45049/45049 [==============================] - 105s 2ms/step - loss: 0.6013 - accuracy: 0.6913 - val_loss: 0.5996 - val_accuracy: 0.6881
Epoch 5/100
45049/45049 [==============================] - 109s 2ms/step - loss: 0.6010 - accuracy: 0.6919 - val_loss: 0.5999 - val_accuracy: 0.6949
Epoch 6/100
45049/45049 [==============================] - 111s 2ms/step - loss: 0.6009 - accuracy: 0.6917 - val_loss: 0.5998 - val_accuracy: 0.6937
Epoch 7/100
45049/45049 [==============================] - 133s 3ms/step - loss: 0.6019 - accuracy: 0.6913 - val_loss: 0.6000 - val_accuracy: 0.6894
Epoch 8/100
45049/45049 [==============================] - 132s 3ms/step - loss: 0.6014 - accuracy: 0.6918 - val_loss: 0.5987 - val_accuracy: 0.6959
Epoch 9/100
45049/45049 [==============================] - 121s 3ms/step - loss: 0.6007 - accuracy: 0.6925 - val_loss: 0.5994 - val_accuracy: 0.6946
Epoch 10/100
45049/45049 [==============================] - 126s 3ms/step - loss: 0.6007 - accuracy: 0.6929 - val_loss: 0.6000 - val_accuracy: 0.6941
Epoch 11/100
45049/45049 [==============================] - 137s 3ms/step - loss: 0.6019 - accuracy: 0.6918 - val_loss: 0.5999 - val_accuracy: 0.6883
Epoch 12/100
45049/45049 [==============================] - 136s 3ms/step - loss: 0.6009 - accuracy: 0.6925 - val_loss: 0.5985 - val_accuracy: 0.6957
Epoch 13/100
45049/45049 [==============================] - 137s 3ms/step - loss: 0.6013 - accuracy: 0.6922 - val_loss: 0.5987 - val_accuracy: 0.6958
Epoch 14/100
45049/45049 [==============================] - 138s 3ms/step - loss: 0.6006 - accuracy: 0.6931 - val_loss: 0.5996 - val_accuracy: 0.6939
Epoch 15/100
45049/45049 [==============================] - 137s 3ms/step - loss: 0.6006 - accuracy: 0.6928 - val_loss: 0.6001 - val_accuracy: 0.6868
Epoch 16/100
45049/45049 [==============================] - 136s 3ms/step - loss: 0.6007 - accuracy: 0.6927 - val_loss: 0.5990 - val_accuracy: 0.6956
Epoch 17/100
45049/45049 [==============================] - 138s 3ms/step - loss: 0.6008 - accuracy: 0.6926 - val_loss: 0.6003 - val_accuracy: 0.6921
Epoch 18/100
45049/45049 [==============================] - 138s 3ms/step - loss: 0.6011 - accuracy: 0.6918 - val_loss: 0.5992 - val_accuracy: 0.6892
Epoch 19/100
45049/45049 [==============================] - 138s 3ms/step - loss: 0.6010 - accuracy: 0.6924 - val_loss: 0.6000 - val_accuracy: 0.6886
Epoch 20/100
45049/45049 [==============================] - 137s 3ms/step - loss: 0.6007 - accuracy: 0.6925 - val_loss: 0.6001 - val_accuracy: 0.6885
Epoch 21/100
45049/45049 [==============================] - 141s 3ms/step - loss: 0.6012 - accuracy: 0.6912 - val_loss: 0.5990 - val_accuracy: 0.6896
Epoch 22/100
45049/45049 [==============================] - 138s 3ms/step - loss: 0.6010 - accuracy: 0.6917 - val_loss: 0.5994 - val_accuracy: 0.6889
12514/12514 [==============================] - 21s 2ms/step - loss: 0.5988 - accuracy: 0.6957
ANN Test accuracy: 0.6957491040229797
As for what I wrote, I am attaching the code below to make things clearer.
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.impute import SimpleImputer
from tensorflow import keras
from tensorflow.keras import layers
# Load data from the dataset file
dataset_file = 'concatenated_dataset.csv'
df = pd.read_csv(dataset_file)
# Check if there are any missing values in the 'VALUE' column
if df['VALUE'].isnull().any():
    # Handling missing values with SimpleImputer
    imputer = SimpleImputer(strategy='mean')
    df['VALUE'] = imputer.fit_transform(df['VALUE'].values.reshape(-1, 1))
# Split the data into features (X) and target variable (y)
X = df['VALUE'].values.reshape(-1, 1)
y = df['Target'].values
# Step 2: Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Step 3: Feature Scaling (optional, but recommended for neural networks)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
# Step 4: Build the ANN model
input_dim = X_train_scaled.shape[1]
model = keras.Sequential([
    layers.Dense(units=256, activation='relu', input_shape=(input_dim,)),
    layers.Dropout(0.3),
    layers.Dense(units=128, activation='relu'),
    layers.Dropout(0.2),
    layers.Dense(units=64, activation='relu'),
    layers.Dropout(0.1),
    layers.Dense(units=1, activation='sigmoid')  # For binary classification
])
# Step 5: Compile the model
optimizer = keras.optimizers.Adam(learning_rate=0.001)
model.compile(optimizer=optimizer, loss='binary_crossentropy', metrics=['accuracy'])
# Step 6: Train the ANN model with early stopping
early_stopping = keras.callbacks.EarlyStopping(patience=10, restore_best_weights=True)
history = model.fit(X_train_scaled, y_train, epochs=100, batch_size=32,
                    validation_split=0.1, callbacks=[early_stopping])
# Step 7: Evaluate the ANN model on the test set
ann_loss, ann_accuracy = model.evaluate(X_test_scaled, y_test)
print("ANN Test accuracy:", ann_accuracy)
How can I raise the ANN test accuracy from 69% to around 90%?
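Tune your hyperparameters using GridSearchCV, RandomizedSearchCV, or similar.
You are using validation_split in the fit method. You have already scaled the training data, and the validation split takes its data from that scaled training data, so the validation set will contain information from the training data. There is a reason we use fit_transform on the training set and transform on the other sets: transform applies the statistics learned from x_train. Use validation_data instead.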
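As a rough illustration of the hyperparameter tuning suggestion, here is a minimal sketch of a randomized search. It is not the Keras model from the question: scikit-learn's MLPClassifier is swapped in as a stand-in because it plugs into RandomizedSearchCV directly. The helper name build_search, the candidate values, and the synthetic demo data are all assumptions made for illustration.

```python
import numpy as np
from sklearn.model_selection import RandomizedSearchCV
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def build_search(n_iter=5, cv=3, random_state=42):
    """Return an unfitted randomized search over a small MLP pipeline.

    Hypothetical helper: the candidate values below are illustrative only.
    """
    # The scaler lives inside the pipeline, so it is refit on the
    # training part of every cross-validation fold.
    pipeline = Pipeline([
        ("scaler", StandardScaler()),
        ("mlp", MLPClassifier(max_iter=200, random_state=random_state)),
    ])
    param_distributions = {
        "mlp__hidden_layer_sizes": [(64,), (128, 64), (256, 128, 64)],
        "mlp__alpha": [1e-5, 1e-4, 1e-3],               # L2 penalty
        "mlp__learning_rate_init": [1e-4, 1e-3, 1e-2],  # Adam step size
    }
    return RandomizedSearchCV(
        pipeline,
        param_distributions,
        n_iter=n_iter,
        cv=cv,
        scoring="accuracy",
        random_state=random_state,
    )


if __name__ == "__main__":
    # Synthetic single-feature data standing in for the real dataset.
    rng = np.random.default_rng(0)
    X_demo = rng.normal(size=(300, 1))
    y_demo = (X_demo[:, 0] + rng.normal(size=300) > 0).astype(int)

    search = build_search()
    search.fit(X_demo, y_demo)
    print("Best parameters:", search.best_params_)
    print("Best cross-validated accuracy:", search.best_score_)
```

Keeping the scaler in the pipeline means the held-out part of each fold never contributes to the scaling statistics, which is the same concern the answer raises about validation_split.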
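The validation_data suggestion could look roughly like this, assuming the X and y arrays and a compiled Keras model like the ones in the question. The helper names split_then_scale and fit_with_validation_data and the split sizes are hypothetical. The idea is to carve out the validation set before scaling, fit the scaler on the training part only, and hand the validation set to fit explicitly.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def split_then_scale(X, y, test_size=0.2, val_size=0.1, random_state=42):
    """Split into train/validation/test first, then scale.

    The scaler is fitted on the training part only; validation and
    test data are transformed with the training statistics.
    """
    X_trainval, X_test, y_trainval, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state)
    X_train, X_val, y_train, y_val = train_test_split(
        X_trainval, y_trainval, test_size=val_size, random_state=random_state)

    scaler = StandardScaler()
    X_train_s = scaler.fit_transform(X_train)  # fit on train only
    X_val_s = scaler.transform(X_val)          # reuse train statistics
    X_test_s = scaler.transform(X_test)
    return (X_train_s, y_train), (X_val_s, y_val), (X_test_s, y_test), scaler


def fit_with_validation_data(model, train, val, callbacks=None,
                             epochs=100, batch_size=32):
    """Train a compiled Keras model with an explicit validation set."""
    X_train, y_train = train
    return model.fit(X_train, y_train, epochs=epochs, batch_size=batch_size,
                     validation_data=val, callbacks=callbacks)


# Example use with the question's variables (not executed here):
# train, val, test, scaler = split_then_scale(X, y)
# history = fit_with_validation_data(model, train, val,
#                                    callbacks=[early_stopping])
# ann_loss, ann_accuracy = model.evaluate(*test)
```

With this layout the validation_split argument is no longer needed, since the validation rows are never seen by fit_transform.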