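I'm trying out a logistic regression model and running some tests, but I keep getting this error. I'm not sure what I'm doing differently from everyone else.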
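from sklearn import preprocessing
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Features: every column except the last; target: the last column
X = df.iloc[:,:len(df.columns)-1]
y = df.iloc[:,len(df.columns)-1]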
That is how I separate the columns.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
That is the train/test split.
logReg = LogisticRegression(n_jobs=-1)
logReg.fit(X_train, y_train)
y_pred = logReg.predict(X_train)
mae = mean_absolute_error(y_test, y_pred)
print("MAE:" , mae)
ValueError Traceback (most recent call last)
Cell In [112], line 1
----> 1 mae = mean_absolute_error(y_test, y_pred)
2 print("MAE:" , mae)
File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\sklearn\metrics\_regression.py:196, in mean_absolute_error(y_true, y_pred, sample_weight, multioutput)
141 def mean_absolute_error(
142 y_true, y_pred, *, sample_weight=None, multioutput="uniform_average"
143 ):
144 """Mean absolute error regression loss.
145
146 Read more in the :ref:`User Guide <mean_absolute_error>`.
(...)
194 0.85...
195 """
--> 196 y_type, y_true, y_pred, multioutput = _check_reg_targets(
197 y_true, y_pred, multioutput
198 )
199 check_consistent_length(y_true, y_pred, sample_weight)
200 output_errors = np.average(np.abs(y_pred - y_true), weights=sample_weight, axis=0)
File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\sklearn\metrics\_regression.py:100, in _check_reg_targets(y_true, y_pred, multioutput, dtype)
66 def _check_reg_targets(y_true, y_pred, multioutput, dtype="numeric"):
67 """Check that y_true and y_pred belong to the same regression task.
68
69 Parameters
(...)
98 correct keyword.
99 """
--> 100 check_consistent_length(y_true, y_pred)
101 y_true = check_array(y_true, ensure_2d=False, dtype=dtype)
102 y_pred = check_array(y_pred, ensure_2d=False, dtype=dtype)
File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\sklearn\utils\validation.py:387, in check_consistent_length(*arrays)
385 uniques = np.unique(lengths)
386 if len(uniques) > 1:
--> 387 raise ValueError(
388 "Found input variables with inconsistent numbers of samples: %r"
389 % [int(l) for l in lengths]
390 )
ValueError: Found input variables with inconsistent numbers of samples: [25404, 101612]
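I thought the problem was the way I split the columns, but that doesn't seem to be it. It works when the test size is 50/50, but not with any other test size.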
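You are comparing the predicted labels for the training set with the true labels of the test set. The two have different sizes (101612 training samples versus 25404 test samples), hence the error. With a 50/50 split both sets happen to have the same length, so no error is raised, but the score is still meaningless because the predictions and the labels belong to different samples.
Replace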
y_pred = logReg.predict(X_train)
with
y_pred = logReg.predict(X_test)
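For reference, here is a minimal, self-contained sketch of the corrected flow. The asker's `df` is not shown, so this assumes a synthetic dataset from `make_classification` as a stand-in; the `max_iter=1000` setting and the `log_reg` name are likewise illustrative choices, not something taken from the question.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the asker's DataFrame (assumption: df is not shown).
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

log_reg = LogisticRegression(max_iter=1000)
log_reg.fit(X_train, y_train)

# Predict on the SAME split that the true labels come from.
y_pred = log_reg.predict(X_test)

# Cheap guard: predictions and labels must describe the same samples.
assert len(y_pred) == len(y_test)

mae = mean_absolute_error(y_test, y_pred)
print("MAE:", mae)
```

If you do want the training-set score, pair `log_reg.predict(X_train)` with `y_train` instead. In both cases the predictions and the labels have to come from the same split.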
I ran into this exact error, and in my case the cause was unpacking the return values of train_test_split in the wrong order. I had written:
X_train, y_train, X_test, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
instead of
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
I know this may not apply in your case, but it is here for anyone else who hits this error.
Double-check your code.
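A quick way to catch a wrong unpacking order is to check the shapes right after the split. The sketch below assumes small toy arrays built with NumPy (not the data from the question); for two inputs, `train_test_split` returns the pieces in the order `X_train, X_test, y_train, y_test`.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (assumption, for illustration only): 10 samples, 2 features.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Correct order: both pieces of X first, then both pieces of y.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Each feature matrix must have as many rows as its label vector.
assert X_train.shape[0] == y_train.shape[0]
assert X_test.shape[0] == y_test.shape[0]

print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
# (8, 2) (2, 2) (8,) (2,)
```

With the swapped order, the name `y_train` would actually hold the test features, so the first assertion fails (8 rows versus 2) before the mistake can surface later as an inconsistent-samples error.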