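I'm really stuck on this problem. I'm trying to use OneHotEncoder, after using LabelEncoder, to encode my data into a matrix, but I get this error: Expected 2D array, got 1D array instead.
At the end of the error message (included below) it says to "Reshape your data", which I thought I did, but it still doesn't work. If I understand reshaping correctly, it is for when you want to rearrange some data into a different matrix size? For example, if I wanted to change a 3 x 2 matrix into a 4 x 6?
My code fails on these two lines: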
X = X.reshape(-1, 1) # I added this after I saw the error
X[:, 0] = onehotencoder1.fit_transform(X[:, 0]).toarray()
Here is my code so far:
# Data Preprocessing
# Import Libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Import Dataset
dataset = pd.read_csv('Data2.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 5].values
df_X = pd.DataFrame(X)
df_y = pd.DataFrame(y)
# Replace Missing Values
from sklearn.preprocessing import Imputer
imputer = Imputer(missing_values = 'NaN', strategy = 'mean', axis = 0)
imputer = imputer.fit(X[:, 3:5 ])
X[:, 3:5] = imputer.transform(X[:, 3:5])
# Encoding Categorical Data "Name"
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_x = LabelEncoder()
X[:, 0] = labelencoder_x.fit_transform(X[:, 0])
# Transform into a Matrix
onehotencoder1 = OneHotEncoder(categorical_features = [0])
X = X.reshape(-1, 1)
X[:, 0] = onehotencoder1.fit_transform(X[:, 0]).toarray()
# Encoding Categorical Data "University"
from sklearn.preprocessing import LabelEncoder
labelencoder_x1 = LabelEncoder()
X[:, 1] = labelencoder_x1.fit_transform(X[:, 1])
Here is the full error message:
File "/Users/jim/anaconda3/lib/python3.6/site-packages/sklearn/preprocessing/data.py", line 1809, in _transform_selected
X = check_array(X, accept_sparse='csc', copy=copy, dtype=FLOAT_DTYPES)
File "/Users/jim/anaconda3/lib/python3.6/site-packages/sklearn/utils/validation.py", line 441, in check_array
"if it contains a single sample.".format(array))
ValueError: Expected 2D array, got 1D array instead:
array=[ 2.00000000e+00 7.00000000e+00 3.20000000e+00 2.70000000e+01
2.30000000e+03 1.00000000e+00 6.00000000e+00 3.90000000e+00
2.80000000e+01 2.90000000e+03 3.00000000e+00 4.00000000e+00
4.00000000e+00 3.00000000e+01 2.76700000e+03 2.00000000e+00
8.00000000e+00 3.20000000e+00 2.70000000e+01 2.30000000e+03
3.00000000e+00 0.00000000e+00 4.00000000e+00 3.00000000e+01
2.48522222e+03 5.00000000e+00 9.00000000e+00 3.50000000e+00
2.50000000e+01 2.50000000e+03 5.00000000e+00 1.00000000e+00
3.50000000e+00 2.50000000e+01 2.50000000e+03 0.00000000e+00
2.00000000e+00 3.00000000e+00 2.90000000e+01 2.40000000e+03
4.00000000e+00 3.00000000e+00 3.70000000e+00 2.77777778e+01
2.30000000e+03 0.00000000e+00 5.00000000e+00 3.00000000e+00
2.90000000e+01 2.40000000e+03].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
Any help would be great.
Try changing the code to this:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Import Dataset
dataset = pd.read_csv('Data2.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 5].values
df_X = pd.DataFrame(X)
df_y = pd.DataFrame(y)
# Replace Missing Values
from sklearn.preprocessing import Imputer
imputer = Imputer(missing_values = 'NaN', strategy = 'mean', axis = 0)
imputer = imputer.fit(X[:, 3:5 ])
X[:, 3:5] = imputer.transform(X[:, 3:5])
# Encoding Categorical Data "Name"
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_x = LabelEncoder()
X[:, 0] = labelencoder_x.fit_transform(X[:, 0])
# Transform into a Matrix
onehotencoder1 = OneHotEncoder(categorical_features = [0])
res_0 = onehotencoder1.fit_transform(X[:, 0].reshape(-1, 1)) # <=== Change
X[:, 0] = res_0.ravel()
# Encoding Categorical Data "University"
from sklearn.preprocessing import LabelEncoder
labelencoder_x1 = LabelEncoder()
X[:, 1] = labelencoder_x1.fit_transform(X[:, 1])
If you get an error at labelencoder_x1.fit_transform(X[:, 1]),
change it to labelencoder_x1.fit_transform(X[:, 1].reshape(-1, 1))
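The reshape that the error message asks for can be illustrated with a small NumPy sketch. The array `col` below is a hypothetical stand-in for one label-encoded column such as `X[:, 0]`; its values are made up for illustration. It also shows that reshaping only rearranges the existing values and cannot change how many there are.

```python
import numpy as np

# Hypothetical stand-in for a single label-encoded column such as X[:, 0].
col = np.array([2, 1, 3, 2, 3, 5])
print(col.shape)  # 1D: this is the kind of input that triggers the ValueError

# reshape(-1, 1) keeps all six values and arranges them as 6 rows x 1 column.
# The -1 tells NumPy to infer that dimension from the number of elements.
col_2d = col.reshape(-1, 1)
print(col_2d.shape)

# reshape never changes the number of elements: a 3 x 2 array can become
# 2 x 3 or 6 x 1, but not 4 x 6, because 6 values cannot fill 24 cells.
m = np.arange(6).reshape(3, 2)
m_2x3 = m.reshape(2, 3)
print(m_2x3.shape)

try:
    m.reshape(4, 6)
    reshape_4x6_failed = False
except ValueError as err:
    reshape_4x6_failed = True
    print("reshape failed:", err)
```

Note that slicing with `X[:, 0]` always returns a 1D array again, so the reshape has to be applied to the slice that is passed to the encoder, not to `X` beforehand.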
OK, I finally got the code working. See the solution below:
# Data Preprocessing
# Import Libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Import Dataset
dataset = pd.read_csv('Data2.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 5].values
df_X = pd.DataFrame(X)
df_y = pd.DataFrame(y)
# Replace Missing Values
from sklearn.preprocessing import Imputer
imputer = Imputer(missing_values = 'NaN', strategy = 'mean', axis = 0)
imputer = imputer.fit(X[:, 3:5 ])
X[:, 3:5] = imputer.transform(X[:, 3:5])
# Encoding Categorical Data "Name"
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_x = LabelEncoder()
X[:, 0] = labelencoder_x.fit_transform(X[:, 0])
# Encoding Categorical Data "University"
from sklearn.preprocessing import LabelEncoder
labelencoder_x1 = LabelEncoder()
X[:, 1] = labelencoder_x1.fit_transform(X[:, 1])
# Transform Name into a Matrix
onehotencoder1 = OneHotEncoder(categorical_features = [0])
X = onehotencoder1.fit_transform(X).toarray()
# Transform University into a Matrix
onehotencoder2 = OneHotEncoder(categorical_features = [6])
X = onehotencoder2.fit_transform(X).toarray()
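For readers on a newer scikit-learn: `Imputer` and the `categorical_features` argument of `OneHotEncoder` were removed in version 0.22, so the code above only runs on older releases. The following is a rough sketch of the same preprocessing using `SimpleImputer` and `ColumnTransformer`, assuming scikit-learn 0.20 or later. Since Data2.csv is not shown, the DataFrame, its column names other than "Name" and "University", and all of its values are invented for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-in for Data2.csv; the real column names and values
# are not shown in the question, so these are assumptions.
dataset = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cy", "Ann"],
    "University": ["MIT", "UCLA", "MIT", "NYU"],
    "GPA": [3.2, 3.9, 4.0, 3.5],
    "Age": [27.0, np.nan, 30.0, 25.0],
    "Salary": [2300.0, 2900.0, np.nan, 2500.0],
    "Hired": ["yes", "no", "yes", "no"],
})
X = dataset.iloc[:, :-1]
y = dataset.iloc[:, -1].values

# OneHotEncoder accepts string categories directly, so no LabelEncoder step
# is needed. Each transformer receives its columns as a 2D block, which
# avoids the "Expected 2D array" error.
preprocess = ColumnTransformer(
    transformers=[
        ("onehot", OneHotEncoder(handle_unknown="ignore"), ["Name", "University"]),
        ("impute", SimpleImputer(strategy="mean"), ["Age", "Salary"]),
    ],
    remainder="passthrough",  # keep the remaining column (GPA) unchanged
    sparse_threshold=0,       # always return a dense array
)

X_encoded = preprocess.fit_transform(X)
print(X_encoded.shape)
```

The output columns are ordered as the transformers are listed (one-hot columns first, then the imputed columns, then the passthrough column), which differs from the column order produced by the two chained encoders above.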