How do I use ColumnTransformer for categorical data?
I am trying to preprocess data. data = {'Country': ['Germany', 'Turkey', 'England', 'Turkey', 'Germany', 'Turkey'], 'Age': ['44', '32', '27', '29', '31', '25'], 'Salary': ['5400', '85 ...

Question details   Votes: 0   Answers: 3

|   |   |   |    |       |   |
|---|---|---|----|-------|---|
| 1 | 0 | 0 | 44 | 5400  | 1 |
| 0 | 1 | 0 | 32 | 8500  | 1 |
| 0 | 0 | 1 | 27 | 7200  | 0 |
| 0 | 1 | 0 | 29 | 4800  | 1 |
| 1 | 0 | 0 | 31 | 6200  | 0 |
| 0 | 1 | 0 | 25 | 10850 | 1 |

This is the code that fails:
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
ct = ColumnTransformer([("city_category", OneHotEncoder(dtype='int'), [0])], remainder="passthrough")
X = ct.fit_transform(X)

Output:

IndexError: tuple index out of range
I would like to learn how to use ColumnTransformer in this situation.

You can do this with pandas, without sklearn:

import pandas as pd

data = {
    "Country": ["Germany", "Turkey", "England", "Turkey", "Germany", "Turkey"],
    "Age": ["44", "32", "27", "29", "31", "25"],
    "Salary": ["5400", "8500", "7200", "4800", "6200", "10850"],
    "Purchased": ["yes", "yes", "no", "yes", "no", "yes"],
}
df = pd.DataFrame(data)
df = pd.concat([pd.get_dummies(df["Country"]), df.drop("Country", axis=1)], axis=1)
df[["Age", "Salary"]] = df[["Age", "Salary"]].astype(int)
df["Purchased"] = df["Purchased"].map(lambda x: x == "yes").astype(int)
print(df.head())

The output is:

   England  Germany  Turkey  Age  Salary  Purchased
0        0        1       0   44    5400          1
1        0        0       1   32    8500          1
2        1        0       0   27    7200          0
3        0        0       1   29    4800          1
4        0        1       0   31    6200          0
python scikit-learn
3 Answers
1
vote

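from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

X_transformer = ColumnTransformer(
    transformers=[
        (
            "Country",        # Just a name
            OneHotEncoder(),  # The transformer (an estimator instance)
            [0],              # The column(s) to be applied on.
        )
    ],
    remainder='passthrough',
)
X = X_transformer.fit_transform(X)
print(X)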
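For a version that runs from start to finish, here is a minimal sketch. The question does not show how X was built, so this assumes X is a two-dimensional pandas DataFrame with numeric Age and Salary columns. The `sparse_threshold=0` setting and the names `ct` and `X_encoded` are my own choices for illustration, not part of the original code.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Assumed input: a 2D DataFrame (the question does not show how X was built).
X = pd.DataFrame(
    {
        "Country": ["Germany", "Turkey", "England", "Turkey", "Germany", "Turkey"],
        "Age": [44, 32, 27, 29, 31, 25],
        "Salary": [5400, 8500, 7200, 4800, 6200, 10850],
    }
)

ct = ColumnTransformer(
    transformers=[
        # One-hot encode the Country column, selected by name.
        ("country", OneHotEncoder(dtype=int), ["Country"]),
    ],
    remainder="passthrough",  # keep Age and Salary unchanged
    sparse_threshold=0,       # always return a dense array
)
X_encoded = ct.fit_transform(X)
print(X_encoded)
```

The one-hot columns come first, in sorted category order (England, Germany, Turkey), followed by the passthrough columns. Selecting the column by name instead of `[0]` only works when X is a DataFrame.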
    

ColumnTransformer (sklearn developer here) is easy to get wrong. A simpler option that I often use these days is skrub's TableVectorizer.

0
votes
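It applies default choices based on the type of each column. By default it one-hot encodes categorical columns (unless they have high cardinality).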

0
votes
