When experimenting with machine learning on a dataset, I get identical metric results, such as accuracy and F1 score, from different machine learning algorithms.
I have a dataset that I use to train the algorithms I chose. I found it on Kaggle: source.
Here are the code snippets from the Jupyter notebook and the results of running them:
In:
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from nltk.corpus import stopwords
from sklearn.metrics import accuracy_score, f1_score
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import classification_report
import joblib
import tensorflow as tf
import numpy as np
from tensorflow.keras import models, layers
import warnings
warnings.filterwarnings('ignore')
In:
df = pd.read_csv("payload_mini.csv",encoding='utf-16')
df.head(10)
In:
df = pd.read_csv("payload_mini.csv",encoding='utf-16')
df = df[(df['attack_type'] == 'sqli') | (df['attack_type'] == 'norm')]
X = df['payload']
y = df['label']
vectorizer = CountVectorizer(min_df = 2, max_df = 0.8, stop_words = stopwords.words('english'))
X = vectorizer.fit_transform(X.values.astype('U')).toarray()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2)
print(X_train.shape)
print(y_train.shape)
print(X_test.shape)
print(y_test.shape)
Out:
(8040, 1585)
(8040,)
(2011, 1585)
(2011,)
In:
nb_clf = GaussianNB()
nb_clf.fit(X_train, y_train)
y_pred = nb_clf.predict(X_test)
print(f"Accuracy of Naive Bayes on test set : {accuracy_score(y_pred, y_test)}")
print(f"F1 Score of Naive Bayes on test set : {f1_score(y_pred, y_test, pos_label='anom')}")
print("\nClassification Report:")
print(classification_report(y_test, y_pred))
Out:
Accuracy of Naive Bayes on test set : 0.9806066633515664
F1 Score of Naive Bayes on test set : 0.9735234215885948
Classification Report:
              precision    recall  f1-score   support

        anom       0.97      0.98      0.97       732
        norm       0.99      0.98      0.98      1279

    accuracy                           0.98      2011
   macro avg       0.98      0.98      0.98      2011
weighted avg       0.98      0.98      0.98      2011
In:
rf_clf = RandomForestClassifier()
rf_clf.fit(X_train, y_train)
y_pred_rf = rf_clf.predict(X_test)
print(f"Accuracy of Random Forest on test set : {accuracy_score(y_pred, y_test)}")
print(f"F1 Score of Random Forest on test set : {f1_score(y_pred, y_test, pos_label='anom')}")
print("\nClassification Report:")
print(classification_report(y_test, y_pred_rf))
Out:
Accuracy of Random Forest on test set : 0.9806066633515664
F1 Score of Random Forest on test set : 0.9735234215885948
Classification Report:
              precision    recall  f1-score   support

        anom       1.00      0.96      0.98       732
        norm       0.98      1.00      0.99      1279

    accuracy                           0.99      2011
   macro avg       0.99      0.98      0.99      2011
weighted avg       0.99      0.99      0.99      2011
In:
svm_clf = SVC(gamma = 'auto')
svm_clf.fit(X_train, y_train)
y_pred = svm_clf.predict(X_test)
print(f"Accuracy of SVM on test set : {accuracy_score(y_pred, y_test)}")
print(f"F1 Score of SVM on test set: {f1_score(y_pred, y_test, pos_label='anom')}")
print("\nClassification Report:")
print(classification_report(y_test, y_pred))
Out:
Accuracy of SVM on test set : 0.9189457981103928
F1 Score of SVM on test set: 0.8658436213991769
Classification Report:
              precision    recall  f1-score   support

        anom       1.00      0.76      0.87       689
        norm       0.89      1.00      0.94      1322

    accuracy                           0.92      2011
   macro avg       0.95      0.88      0.90      2011
weighted avg       0.93      0.92      0.92      2011
As you can see, training with different machine learning algorithms gives the same results for the Random Forest and the Naive Bayes classifier. I hope you can help me fix whatever mistake there may be in the code, or improve it in some way.
In the Random Forest code you store the predictions as y_pred_rf, but you call the metrics on y_pred, which still holds the Naive Bayes predictions, so the Random Forest cell just prints the Naive Bayes numbers again.
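A minimal fix, assuming the X_train/X_test/y_train/y_test variables from your train_test_split cell are still in scope, is to score y_pred_rf instead and to pass the arguments in scikit-learn's documented (y_true, y_pred) order:

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, classification_report

rf_clf = RandomForestClassifier()
rf_clf.fit(X_train, y_train)
y_pred_rf = rf_clf.predict(X_test)

# Score the Random Forest predictions (y_pred_rf), not the earlier Naive Bayes
# ones (y_pred), and use the (y_true, y_pred) order scikit-learn's metrics expect.
print(f"Accuracy of Random Forest on test set : {accuracy_score(y_test, y_pred_rf)}")
print(f"F1 Score of Random Forest on test set : {f1_score(y_test, y_pred_rf, pos_label='anom')}")
print("\nClassification Report:")
print(classification_report(y_test, y_pred_rf))

Note that the classification_report call in your Random Forest cell already uses y_pred_rf, which is why its per-class numbers differ from the accuracy and F1 lines printed just above it.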