Linear model converges differently when the column order is changed

Question (votes: 0, answers: 1)

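When I change the order of the columns (the feature order) passed to a scikit-learn linear model with regularization, I get different scores. I have tried this with ElasticNet and Lasso. I am using scikit-learn==0.23.1.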

import pandas as pd
import numpy as np
from sklearn import linear_model
from sklearn import metrics

df = pd.DataFrame({
    'col1': [1, 2, 3, 4, 5, 6],
    'col2': [16, 32, 64, 12, 5, 256],
    'col3': [7, 8, 9, 10, 12, 11],
    'out': [40, 5, 60, 7, 9, 100]})

print(df)
X_df = df[['col1', 'col2', 'col3']]
y_df = df['out']
regr = linear_model.ElasticNet(alpha=0.1, random_state=0)

regr.fit(X_df, y_df)
y_pred = regr.predict(X_df)
print("R2:", regr.score(X_df, y_df))
print("MSE:", metrics.mean_squared_error(y_df, y_pred))

# change the order to: [col2, col1, col3]
first_cols = ['col2']
cols = first_cols.copy()
for c in X_df.columns:
    if c not in cols:
        cols.append(c)
X_df = X_df[cols]

regr.fit(X_df, y_df)
y_pred = regr.predict(X_df)
print("\nReorder:")
print("R2:", regr.score(X_df, y_df))
print("MSE:", metrics.mean_squared_error(y_df, y_pred))

The output of the above is:

   col1  col2  col3  out
0     1    16     7   40
1     2    32     8    5
2     3    64     9   60
3     4    12    10    7
4     5     5    12    9
5     6   256    11  100
R2: 0.8277462579081043
MSE: 207.13034003933535

Reorder:
R2: 0.8277586094134455
MSE: 207.11548769725997

Why does this happen?

python machine-learning scikit-learn linear-regression
1 answer

1 vote

The difference is due to the tol parameter.

From the documentation:

tol : float, default=1e-4

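The tolerance for the optimization: if the updates are smaller than tol, the optimization code checks the dual gap for optimality and continues until it is smaller than tol.

With the default tol=1e-4, the solver stops as soon as the updates are small enough, and where it stops depends on the order in which the coordinates are visited. To get the precision you want, pass a tighter tolerance, for example tol=1e-12, in both cases: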


import pandas as pd
import numpy as np
from sklearn import linear_model
from sklearn import metrics

df = pd.DataFrame({
    'col1': [1, 2, 3, 4, 5, 6],
    'col2': [16, 32, 64, 12, 5, 256],
    'col3': [7, 8, 9, 10, 12, 11],
    'out': [40, 5, 60, 7, 9, 100]})

# print(df)
X_df = df[['col1', 'col2', 'col3']]
y_df = df['out']
regr = linear_model.ElasticNet(alpha=0.1, random_state=0, tol=1e-12)

regr.fit(X_df, y_df)
y_pred = regr.predict(X_df)
print(regr.coef_)
print("R2:", regr.score(X_df, y_df))
print("MSE:", metrics.mean_squared_error(y_df, y_pred))

# change the order to: [col2, col1, col3]
first_cols = ['col2']
cols = first_cols.copy()
for c in X_df.columns:
    if c not in cols:
        cols.append(c)
X_df = X_df[cols]


regr = linear_model.ElasticNet(alpha=0.1, random_state=0, tol=1e-12)
regr.fit(X_df, y_df)
y_pred = regr.predict(X_df)
print("\nReorder:")
print(regr.coef_)
print("R2:", regr.score(X_df, y_df))
print("MSE:", metrics.mean_squared_error(y_df, y_pred))

Output:

[-8.92519779  0.42980208  3.59812779]
R2: 0.8277593357239204
MSE: 207.11461432908925

Reorder:
[ 0.42980208 -8.92519779  3.59812779]
R2: 0.8277593357240851
MSE: 207.11461432889112
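
To see why a tighter tol removes the order dependence, here is a minimal pure-Python sketch of cyclic coordinate descent for the elastic net objective, run on the same data. It is an illustration under assumptions, not scikit-learn's implementation: the helper names (soft_threshold, center, elastic_net_cd, max_order_gap) are hypothetical, and the stopping rule is simplified to "largest coefficient change in a sweep is below tol", whereas scikit-learn additionally checks the dual gap.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the L1 proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def center(values):
    """Subtract the mean, which plays the role of fitting an intercept."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def elastic_net_cd(columns, y, alpha=0.1, l1_ratio=0.5, tol=1e-4, max_iter=10000):
    """Cyclic coordinate descent; returns (coefficients, sweeps used).

    Simplified stopping rule (an assumption of this sketch): stop once the
    largest coefficient change in a full sweep is below tol.
    """
    n = len(y)
    cols = [center(c) for c in columns]
    residual = center(y)              # residual for all-zero coefficients
    coef = [0.0] * len(cols)
    l1 = alpha * l1_ratio
    l2 = alpha * (1.0 - l1_ratio)
    for sweep in range(1, max_iter + 1):
        max_change = 0.0
        for j, x in enumerate(cols):  # features are visited in column order
            old = coef[j]
            z = sum(v * v for v in x) / n
            rho = sum(v * r for v, r in zip(x, residual)) / n + z * old
            new = soft_threshold(rho, l1) / (z + l2)
            if new != old:
                delta = new - old
                residual = [r - delta * v for r, v in zip(residual, x)]
                coef[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return coef, sweep


col1 = [1, 2, 3, 4, 5, 6]
col2 = [16, 32, 64, 12, 5, 256]
col3 = [7, 8, 9, 10, 12, 11]
out = [40, 5, 60, 7, 9, 100]


def max_order_gap(tol):
    """Largest coefficient difference between the two column orders."""
    coef_a, sweeps_a = elastic_net_cd([col1, col2, col3], out, tol=tol)
    coef_b, sweeps_b = elastic_net_cd([col2, col1, col3], out, tol=tol)
    # Realign the reordered coefficients to [col1, col2, col3] before comparing.
    aligned_b = [coef_b[1], coef_b[0], coef_b[2]]
    gap = max(abs(a - b) for a, b in zip(coef_a, aligned_b))
    return gap, sweeps_a, sweeps_b


for tol in (1e-4, 1e-12):
    gap, sweeps_a, sweeps_b = max_order_gap(tol)
    print(f"tol={tol:g}: largest coefficient gap={gap:.3e}, sweeps={sweeps_a}/{sweeps_b}")
```

Because the objective has a unique minimizer, both orderings head toward the same coefficients; a loose tol just lets each ordering stop at a slightly different point along the way. In scikit-learn itself, a very small tol may also require raising max_iter so the solver has enough iterations to reach it.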

-1 vote

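Changing the column order affects the order of operations during training. Ideally this would make no difference, but because floating-point arithmetic loses precision, you may get slightly different values after reordering the columns.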
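
As a small illustration of that point, the sketch below sums the same three products in two different column orders and gets results that differ in the last bit. The function name dot_in_order and the toy values are hypothetical; they only demonstrate that floating-point addition is not associative. This effect is on the order of 1e-16 here, far smaller than the differences in the question.

```python
def dot_in_order(row, weights, order):
    """Accumulate row[i] * weights[i] left to right, visiting columns in `order`."""
    total = 0.0
    for i in order:
        total += row[i] * weights[i]
    return total


row = [0.1, 0.2, 0.3]
weights = [1.0, 1.0, 1.0]

original_order = dot_in_order(row, weights, (0, 1, 2))   # (0.1 + 0.2) + 0.3
reordered = dot_in_order(row, weights, (1, 2, 0))        # (0.2 + 0.3) + 0.1

print(original_order == reordered)        # False
print(abs(original_order - reordered))    # a difference of roughly one rounding step
```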
