尝试拟合机器学习模型时出现错误 AttributeError: 'bool' 对象没有属性 'transpose'

问题描述 投票:0回答:1

我正在尝试创建一个机器学习模型来预测谁会在泰坦尼克号上幸存。每次我尝试拟合我的模型时,都会收到此错误:

    coordinates = np.where(mask.transpose())[::-1]
AttributeError: 'bool' object has no attribute 'transpose'

我正在运行的代码如下:


from xgboost import XGBClassifier
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.feature_selection import SelectFromModel
from itertools import combinations
import pandas as pd 
import numpy as np

#read in data
training_data = pd.read_csv('train.csv')
testing_data = pd.read_csv('test.csv')




#seperate X and Y
X_train_full = training_data.copy()
y = X_train_full.Survived
X_train_full.drop(['Survived'], axis=1, inplace=True)

y_test = testing_data

#get all str columns
cat_columns1 = [cname for cname in X_train_full.columns if
                    X_train_full[cname].dtype == "object"]

interactions = pd.DataFrame(index= X_train_full)

#create new features
for combination in combinations(cat_columns1,2):
    imputer = SimpleImputer(strategy='constant')

    new_col_name = '_'.join(combination)
    col1 = X_train_full[combination[0]]
    col2 = X_train_full[combination[1]]
    col1 = np.array(col1).reshape(-1,1)
    col2 = np.array(col2).reshape(-1,1)
    col1 = imputer.fit_transform(col1)
    col2 = imputer.fit_transform(col2)


    new_vals = col1 + '_' + col2
    OneHot = OneHotEncoder()




    interactions[new_col_name] = OneHot.fit_transform(new_vals)
 

interactions = interactions.reset_index(drop = True)


#create new dataframe with new features included
new_df = X_train_full.join(interactions)
 

#do the same for the test file
interactions2 = pd.DataFrame(index= y_test)
for combination in combinations(cat_columns1,2):
    imputer = SimpleImputer(strategy='constant')

    new_col_name = '_'.join(combination)
    col1 = y_test[combination[0]]
    col2 = y_test[combination[1]]
    col1 = np.array(col1).reshape(-1,1)
    col2 = np.array(col2).reshape(-1,1)
    col1 = imputer.fit_transform(col1)
    col2 = imputer.fit_transform(col2)


    new_vals = col1 + '_' + col2

    OneHot = OneHotEncoder()




    interactions2[new_col_name] = OneHot.fit_transform(new_vals)


    interactions2[new_col_name] = new_vals
 

interactions2 = interactions2.reset_index(drop = True)
y_test = y_test.join(interactions2)


#get names of cat columns (with new features added)
cat_columns = [cname for cname in new_df.columns if
                    new_df[cname].dtype == "object"]

# Select numerical columns
num_columns = [cname for cname in new_df.columns if 
                new_df[cname].dtype in ['int64', 'float64']]



#set up pipeline
numerical_transformer = SimpleImputer(strategy = 'constant')


categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='constant')),
    ('onehot', OneHotEncoder(handle_unknown='ignore'))
])


preprocessor = ColumnTransformer(
    transformers=[
        ('num', numerical_transformer, num_columns),
        ('cat', categorical_transformer, cat_columns)
    ])
model = XGBClassifier()

my_pipeline = Pipeline(steps=[('preprocessor', preprocessor),
                              ('model', model)
                             ])
#fit model
my_pipeline.fit(new_df,y)

我正在阅读的 csv 文件可通过以下链接从 Kaggle 获取:

https://www.kaggle.com/c/titanic/data

我无法弄清楚是什么导致了这个问题。任何帮助将不胜感激。

python pandas machine-learning scikit-learn model
1个回答
9
投票

发生这种情况可能是因为您的数据包含

pd.NA
值。
pd.NA
在 pandas 1.0.0 中引入,但仍标记为实验性。

SimpleImputer
最终将运行
data == np.nan
,它通常会返回一个numpy数组。相反,当
data
包含
pd.NA
值时,它返回单个布尔标量。

一个例子:

import pandas as pd
import numpy as np

test_pd_na = pd.DataFrame({"A": [1, 2, 3, pd.NA]})
test_np_nan = pd.DataFrame({"A": [1, 2, 3, np.nan]})

test_np_nan.to_numpy() == np.nan:
> array([[False],
       [False],
       [False],
       [False]])

test_pd_na.to_numpy() == np.nan

> False

解决方案是在运行

pd.NA
之前将所有
np.nan
值转换为
SimpleImputer
。为此,您可以在数据框上使用
.replace({pd.NA: np.nan})
。缺点显然是您失去了
pd.NA
带来的好处,例如缺少数据的整数列,而不是将这些列转换为浮点列。

© www.soinside.com 2019 - 2024. All rights reserved.