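I am using XGBoost to get feature importances, and I want to select the features that together account for 90% of the importance. I first build a DataFrame, because I need it for Excel, and then write a while loop that collects features until their importances add up to 90%. After this there is a neural network (not in the code below). I know there may be simpler ways to do this, but this gives me an error: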
ValueError: could not convert string to float: '0,25691372'
The code is:
import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor
from matplotlib import pyplot as plt
dataset = pd.read_csv('CompleteDataSet_original_Clean_CONC.csv', decimal=',', delimiter = ";")
from sklearn.metrics import r2_score
label = dataset.iloc[:,-1]
features = dataset.drop(columns = ['Label'])
y_max_pre_normalize = max(label)
y_min_pre_normalize = min(label)
def denormalize(y):
    final_value = y*(y_max_pre_normalize-y_min_pre_normalize)+y_min_pre_normalize
    return final_value
X_train1, X_test1, y_train1, y_test1 = train_test_split(features, label, test_size = 0.20, random_state = 1, shuffle = True)
y_test2 = y_test1.to_frame()
y_train2 = y_train1.to_frame()
scaler1 = preprocessing.MinMaxScaler()
scaler2 = preprocessing.MinMaxScaler()
X_train = scaler1.fit_transform(X_train1)
X_test = scaler2.fit_transform(X_test1)
scaler3 = preprocessing.MinMaxScaler()
scaler4 = preprocessing.MinMaxScaler()
y_train = scaler3.fit_transform(y_train2)
y_test = scaler4.fit_transform(y_test2)
sel = XGBRegressor(colsample_bytree= 0.7, learning_rate = 0.005, max_depth = 5, min_child_weight = 3, n_estimators = 1000)
sel.fit(X_train, y_train)
importances = sel.feature_importances_
importances = [str(i) for i in importances]
importances = [i.replace(".", ",") for i in importances]
df1 = pd.DataFrame(features.columns)
df1.columns = ['Features']
df2 = pd.DataFrame(importances)
df2.columns = ['Importances [%]']
result = pd.concat([df1,df2],axis = 1)
result = result.sort_values(by='Importances [%]', ascending=False)
result.to_excel("Feature_Results.xlsx")
i = 0
somma = 0
feature = []
while somma <=0.9:
    a = result.iloc[i,-1]
    somma = float(a) + somma
    feature.append(result.iloc[i,-2])
    i = i + 1
Try converting the string "0,0001" to "0.0001" by replacing the comma with a dot, and then convert the result to float:
float('0,25691372'.replace(",", "."))
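As a minimal sketch of how that conversion could be applied to the selection loop from the question: the rows below are made-up stand-ins for the `result` DataFrame, and `to_float` is a hypothetical helper name, not part of any library. Note that the values are also sorted numerically here, since sorting the comma-formatted strings compares them character by character rather than by value.

```python
# Hypothetical (feature, importance-as-string) rows standing in for `result`.
rows = [("f1", "0,05"), ("f2", "0,25691372"), ("f3", "0,6"), ("f4", "0,09308628")]


def to_float(text):
    """Convert a decimal-comma string such as '0,25691372' to a float."""
    return float(text.replace(",", "."))


# Sort by numeric value, largest importance first.
rows.sort(key=lambda row: to_float(row[1]), reverse=True)

selected = []
total = 0.0
for name, importance in rows:
    # Stop once the accumulated importance has passed 90%,
    # mirroring the `while somma <= 0.9` condition in the question.
    if total > 0.9:
        break
    total += to_float(importance)
    selected.append(name)

print(selected, total)
```

Iterating with `for` instead of a manual index also avoids running past the last row if the threshold is never exceeded. An alternative, assuming the comma formatting is only wanted for the Excel file, would be to keep the importances numeric throughout and convert them to strings only on the copy that is exported.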
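You can also use locale.atof(), which parses a string according to the current locale's numeric settings, so it handles a comma used as the decimal separator once a matching locale has been set.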
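A hedged sketch of the `locale.atof()` approach follows. The locale name `de_DE.UTF-8` is an assumption (any installed locale that uses a decimal comma would do, and names differ between platforms), and `parse_decimal_comma` is a hypothetical helper. Because the requested locale may not be installed, the sketch falls back to the plain `str.replace` conversion in that case.

```python
import locale


def parse_decimal_comma(text, locale_name="de_DE.UTF-8"):
    """Parse a string like '0,25691372' using locale.atof.

    `locale_name` is an assumed example; if that locale is not available
    on the system, fall back to replacing the comma with a dot.
    """
    # Remember the current numeric locale so it can be restored afterwards.
    previous = locale.setlocale(locale.LC_NUMERIC)
    try:
        locale.setlocale(locale.LC_NUMERIC, locale_name)
        return locale.atof(text)
    except locale.Error:
        # Requested locale is not installed: use the simple replacement.
        return float(text.replace(",", "."))
    finally:
        locale.setlocale(locale.LC_NUMERIC, previous)


value = parse_decimal_comma("0,25691372")
print(value)
```

Changing the locale affects the whole process, which is why the sketch restores the previous setting before returning.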