"Found input variables with inconsistent numbers of samples": did I do something wrong during train_test_split?

Question (votes: 0, answers: 2)

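I'm trying out a logistic regression model and running some tests, but I keep getting this error. I'm not sure what I'm doing differently from everyone else.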

from sklearn import preprocessing
X = df.iloc[:,:len(df.columns)-1]
y = df.iloc[:,len(df.columns)-1]

This is how I separate the columns.

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

That is the train/test split.

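from sklearn.linear_model import LogisticRegression
from sklearn.metrics import mean_absolute_error

logReg = LogisticRegression(n_jobs=-1)
logReg.fit(X_train, y_train)
y_pred = logReg.predict(X_train)
mae = mean_absolute_error(y_test, y_pred)
print("MAE:" , mae)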
ValueError                                Traceback (most recent call last)
Cell In [112], line 1
----> 1 mae = mean_absolute_error(y_test, y_pred)
      2 print("MAE:" , mae)

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\sklearn\metrics\_regression.py:196, in mean_absolute_error(y_true, y_pred, sample_weight, multioutput)
    141 def mean_absolute_error(
    142     y_true, y_pred, *, sample_weight=None, multioutput="uniform_average"
    143 ):
    144     """Mean absolute error regression loss.
    145 
    146     Read more in the :ref:`User Guide <mean_absolute_error>`.
   (...)
    194     0.85...
    195     """
--> 196     y_type, y_true, y_pred, multioutput = _check_reg_targets(
    197         y_true, y_pred, multioutput
    198     )
    199     check_consistent_length(y_true, y_pred, sample_weight)
    200     output_errors = np.average(np.abs(y_pred - y_true), weights=sample_weight, axis=0)

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\sklearn\metrics\_regression.py:100, in _check_reg_targets(y_true, y_pred, multioutput, dtype)
     66 def _check_reg_targets(y_true, y_pred, multioutput, dtype="numeric"):
     67     """Check that y_true and y_pred belong to the same regression task.
     68 
     69     Parameters
   (...)
     98         correct keyword.
     99     """
--> 100     check_consistent_length(y_true, y_pred)
    101     y_true = check_array(y_true, ensure_2d=False, dtype=dtype)
    102     y_pred = check_array(y_pred, ensure_2d=False, dtype=dtype)

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\sklearn\utils\validation.py:387, in check_consistent_length(*arrays)
    385 uniques = np.unique(lengths)
    386 if len(uniques) > 1:
--> 387     raise ValueError(
    388         "Found input variables with inconsistent numbers of samples: %r"
    389         % [int(l) for l in lengths]
    390     )

ValueError: Found input variables with inconsistent numbers of samples: [25404, 101612]

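I thought the problem was the way I split the columns, but that doesn't seem to be it. It works when the test size is 50/50, but not with any other test size.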

python machine-learning jupyter-notebook data-analysis train-test-split
2 Answers
1 vote

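You are comparing the labels predicted for the training set with the true labels of the test set. The two have different sizes (101612 training rows versus 25404 test rows), hence the error.

Replace

y_pred = logReg.predict(X_train)

with

y_pred = logReg.predict(X_test)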
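
To make the fix concrete, here is a minimal end-to-end sketch. It does not use the asker's data: the `make_classification` dataset, its size, the binary target, and `max_iter=1000` are all assumptions chosen for illustration. The point it shows is that the predictions and the true labels must come from the same split.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the asker's DataFrame (assumption: binary 0/1 target).
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

log_reg = LogisticRegression(max_iter=1000)
log_reg.fit(X_train, y_train)

# Predict on the same split that the true labels come from.
y_pred = log_reg.predict(X_test)
assert len(y_pred) == len(y_test)

mae = mean_absolute_error(y_test, y_pred)
acc = accuracy_score(y_test, y_pred)
print("MAE:", mae)
print("Accuracy:", acc)
```

This also explains why a 50/50 split appeared to work: both halves then have the same number of rows, so the length check passes, but the score compares training predictions against unrelated test labels and is meaningless. As a side note, for a 0/1 target the MAE is simply the misclassification rate, so `accuracy_score` is probably the more conventional metric for a classifier.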

0 votes

I ran into this exact problem, and my mistake was unpacking the return value of train_test_split in the wrong order. I had used:

X_train, y_train, X_test, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

instead of

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

I know this may not apply in your case, but it may help anyone else who runs into this error.

Take another look at your code.
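
A small sketch of why the unpacking order matters. The toy arrays and the variable names `X_tr`, `y_tr`, `X_te`, `y_te` are made up for illustration; `check_consistent_length` is the same scikit-learn helper that appears in the traceback in the question.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.utils.validation import check_consistent_length

# Toy data (illustrative only): 100 samples with 2 features each.
X = np.arange(200).reshape(100, 2)
y = np.arange(100)

# Correct order: the train/test pair for X first, then the pair for y.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, y_train.shape)  # both have 80 rows

# Wrong order: the names no longer match what they hold.
# y_tr actually receives X's test part, and X_te receives y's training part.
X_tr, y_tr, X_te, y_te = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_tr.shape, y_tr.shape)  # 80 rows versus 20 rows

try:
    # The same validation that estimators and metrics run on their inputs.
    check_consistent_length(X_tr, y_tr)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
    print(error_message)
```

With the wrong order, fitting on `X_tr` and `y_tr` would fail the same length check as in the question, because the "labels" are really the 20-row test features.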
