Error from get_feature_names_out when retrieving feature names

Question (votes: 0, answers: 1)

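I want to know how important each feature is for my data, so I am using permutation_importance. The results are indexed by the encoded feature columns, so I tried to use get_feature_names_out to recover the names of my features. That raises an error:

'StandardScaler' object has no attribute 'get_feature_names_out'

If I map the names by hand, I am afraid I will get the order wrong. The order should be (3, 0, 1, 2): smoker, age, bmi, sex. Here is the code: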

import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import train_test_split
from sklearn.inspection import permutation_importance

# Prepare data
X = df[['age', 'bmi', 'sex', 'smoker']]
y = df['charges']

# Define the preprocessor
categorical_transformer = OneHotEncoder(drop='first', sparse=False)
numerical_transformer = StandardScaler()

preprocessor = ColumnTransformer(
    transformers=[
        ('num', numerical_transformer, ['age', 'bmi']),
        ('cat', categorical_transformer, ['sex', 'smoker'])
    ]
)

# Preprocess the data
X_preprocessed = preprocessor.fit_transform(X)

# Extract feature names
num_features = numerical_transformer.get_feature_names_out(['age', 'bmi'])
cat_features = categorical_transformer.get_feature_names_out(['sex', 'smoker'])
feature_names = np.concatenate([num_features, cat_features])

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X_preprocessed, y, test_size=0.2, random_state=42)

# Train KNeighborsRegressor
knn_regressor = KNeighborsRegressor()
reg_model = knn_regressor.fit(X_train, y_train)

# Evaluate feature importance using permutation importance
results = permutation_importance(knn_regressor, X_test, y_test, n_repeats=10, random_state=42, scoring='neg_mean_squared_error')

# Display feature importances with names
for i, importance in enumerate(results.importances_mean):
    print(f"Feature '{feature_names[i]}': Importance: {importance}")


sorted_indices = np.argsort(results.importances_mean)
for i in sorted_indices[::-1]:
    print(f"Feature '{feature_names[i]}', Importance: {results.importances_mean[i]}")

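I would like to get the feature names. Could someone also explain why the feature importance order comes out wrong? I plotted charges against each feature by hand, and the correct order should be smoker, age, bmi, sex.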

python pandas scikit-learn data-science
1 answer
0 votes

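It does not work on the standalone transformer objects because the fitting and transforming was done through the preprocessor (the ColumnTransformer), not through them directly. You can get the names by selecting the step inside the ColumnTransformer: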

preprocessor["cat"].get_feature_names_out()