ValueError: Found input variables with inconsistent numbers of samples


I wrote the following code to learn about scoring in machine learning methods, but I get the error below. What could be the cause?

ValueError: Found input variables with inconsistent numbers of samples: [6396, 1599]

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

df = pd.read_csv('Armenian Market Car Prices.csv')

df['Car Name'] = df['Car Name'].astype('category').cat.codes

df = df.join(pd.get_dummies(df.FuelType, dtype=int))
df = df.drop('FuelType', axis=1)

df['Region'] = df['Region'].astype('category').cat.codes

df['Price'] = df.pop('Price')

X = df.drop('Price', axis=1)
y = df['Price']

df

[1]: https://i.sstatic.net/vTKsYZro.jpg


from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression


X_train, y_train, X_test, y_test = train_test_split(X, y, test_size=0.2)
model = LinearRegression()


model.fit(X_train, y_train)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[358], line 1
----> 1 model.fit(X_train, y_train)

File ~\AppData\Local\Programs\Python\Python312\Lib\site-packages\sklearn\base.py:1473, in _fit_context.<locals>.decorator.<locals>.wrapper(estimator, *args, **kwargs)
   1466     estimator._validate_params()
   1468 with config_context(
   1469     skip_parameter_validation=(
   1470         prefer_skip_nested_validation or global_skip_validation
   1471     )
   1472 ):
-> 1473     return fit_method(estimator, *args, **kwargs)

File ~\AppData\Local\Programs\Python\Python312\Lib\site-packages\sklearn\linear_model\_base.py:609, in LinearRegression.fit(self, X, y, sample_weight)
    605 n_jobs_ = self.n_jobs
    607 accept_sparse = False if self.positive else ["csr", "csc", "coo"]
--> 609 X, y = self._validate_data(
    610     X, y, accept_sparse=accept_sparse, y_numeric=True, multi_output=True
    611 )
    613 has_sw = sample_weight is not None
    614 if has_sw:

File ~\AppData\Local\Programs\Python\Python312\Lib\site-packages\sklearn\base.py:650, in BaseEstimator._validate_data(self, X, y, reset, validate_separately, cast_to_ndarray, **check_params)
    648         y = check_array(y, input_name="y", **check_y_params)
    649     else:
--> 650         X, y = check_X_y(X, y, **check_params)
    651     out = X, y
    653 if not no_val_X and check_params.get("ensure_2d", True):

File ~\AppData\Local\Programs\Python\Python312\Lib\site-packages\sklearn\utils\validation.py:1291, in check_X_y(X, y, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, estimator)
   1273 X = check_array(
   1274     X,
   1275     accept_sparse=accept_sparse,
   (...)
   1286     input_name="X",
   1287 )
   1289 y = _check_y(y, multi_output=multi_output, y_numeric=y_numeric, estimator=estimator)
-> 1291 check_consistent_length(X, y)
   1293 return X, y

File ~\AppData\Local\Programs\Python\Python312\Lib\site-packages\sklearn\utils\validation.py:460, in check_consistent_length(*arrays)
    458 uniques = np.unique(lengths)
    459 if len(uniques) > 1:
--> 460     raise ValueError(
    461         "Found input variables with inconsistent numbers of samples: %r"
    462         % [int(l) for l in lengths]
    463     )

ValueError: Found input variables with inconsistent numbers of samples: [6396, 1599]






I have tried everything I could think of, but nothing worked, and I don't know how to solve the problem.

Jupyternaut

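The error message you provided indicates a problem with the input data. Specifically, there appear to be two different versions of the input data, one with 6396 samples and one with 1599 samples. This can cause problems when fitting a model or performing other operations on the data.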

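To resolve this, you may want to review your code and make sure each operation uses the correct version of the input data. You may also want to try cleaning the input data by removing any duplicates or inconsistencies.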

machine-learning linear-regression jupyter-lab
1 Answer

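In your code, the outputs of train_test_split are unpacked in the wrong order. The function returns X_train, X_test, y_train, y_test, so with your assignment the name y_train actually held X_test (1599 rows) while X_train had 6396 rows, which is exactly the [6396, 1599] mismatch in the error. Here is the correct code: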

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
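
To see the return order for yourself, here is a minimal sketch on synthetic data. The column names, the values, and the row count of 7995 (6396 + 1599) are assumptions chosen only to mirror the sizes in the error message; they are not taken from the actual CSV file.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the car-price data (hypothetical columns and values).
rng = np.random.default_rng(0)
n_rows = 7995  # assumed: 6396 + 1599, the two sizes from the error message
X = pd.DataFrame({
    "Year": rng.integers(1995, 2024, size=n_rows),
    "Mileage": rng.integers(0, 300_000, size=n_rows),
})
y = pd.Series(rng.normal(10_000, 2_000, size=n_rows), name="Price")

# train_test_split returns the train/test pair for each input in turn:
# (X_train, X_test, y_train, y_test).
splits = train_test_split(X, y, test_size=0.2, random_state=0)
split_lengths = [len(s) for s in splits]
print(split_lengths)  # [6396, 1599, 6396, 1599]

# Unpack in that same order.
X_train, X_test, y_train, y_test = splits

# Cheap guard before fitting: features and target must have matching row counts.
assert len(X_train) == len(y_train)
assert len(X_test) == len(y_test)

model = LinearRegression().fit(X_train, y_train)
r2 = model.score(X_test, y_test)  # R^2 on the held-out rows
```

Printing the lengths (or the shapes) right after the split is a quick way to catch this kind of mix-up before calling fit.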