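I have a random forest regressor for crop yield prediction, with 5 features feeding the regressor:
['Precipitation' ,'Min_Temp' ,'Cloud_Cover' ,'Vapour_pressure' ,'Area']
For the given dataset my dependent variable is Production.
This line of the code gives the following error: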
x_grid = np.arange(min(indp),max(indp),0.01)
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
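The error comes from calling Python's built-in `min()` (and `max()`) on a 2-D NumPy array. The built-in walks the array row by row and compares rows with `<`. Comparing two rows produces one boolean per column rather than a single `True`/`False`, so NumPy refuses to decide. A minimal reproduction, using a small made-up array as a stand-in for `indp`:

```python
import numpy as np

# Hypothetical stand-in for `indp`: 3 rows (districts) x 2 feature columns.
indp_demo = np.array([[622.4, 27.6],
                      [748.2, 25.3],
                      [574.3, 26.4]])

# Built-in min() compares whole rows; one comparison gives one bool per column:
print(indp_demo[1] < indp_demo[0])

try:
    min(indp_demo)  # same pattern as min(indp) in the question
except ValueError as exc:
    print("built-in min failed:", exc)

# NumPy's own reduction returns a single scalar over all values instead.
print(np.min(indp_demo))
```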
Code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score
from sklearn.ensemble import RandomForestRegressor
df = pd.read_csv('dataset.csv')
X = df[['Precipitation' ,'Min_Temp' ,'Cloud_Cover' ,'Vapour_pressure' ,'Area']]
Y = df['Production']
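def Preprocessing(df,drop_indx_list):
    # Drop the listed rows one by one
    for indx in drop_indx_list:
        df=df.drop(indx)
    # reset_index() returns a new DataFrame, so the result must be assigned
    df = df.reset_index(drop=True)
    # Replace implausible minimum temperatures (> 40) with the column mean
    df.loc[df.Min_Temp>40,'Min_Temp']=df.Min_Temp.mean()
    return df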
def DataVisualization(chart_list):
    colors = ['blue','red','green','black','yellow']
    c = 0
    for chrt in chart_list:
        t1,x,t2 = chrt.split()
        plt.scatter(df[t1],df[t2],color=colors[c])
        plt.title(chrt)
        plt.xlabel(t1)
        plt.ylabel(t2)
        plt.grid()
        plt.show()
        c+=1
drop_indx_list = [4]
df = Preprocessing(df,drop_indx_list)
viusal_list = ['Precipitation Vs Production','Min_Temp Vs Production','Cloud_Cover Vs Production',
'Vapour_pressure Vs Production','Area Vs Production']
x_train,x_test,y_train,y_test=train_test_split(X,Y, test_size=0.2, random_state=1)
reg=linear_model.LinearRegression()
reg.fit(x_train,y_train)
# prediction
y_pred=reg.predict(x_test)
for i in range(len(y_pred)):
    print('data point number : ',i," prediction : ",y_pred[i],'\n')
# Coefficients
print('\nCoefficients: ', reg.coef_,'\n')
# R-squared score
print('\nR-squared score: ', r2_score(y_test,y_pred),'\n')
# DataVisualization(viusal_list)
# Random forest
indp = df.iloc[:,1:6].values
dep = df.iloc[:,6].values
reg = RandomForestRegressor(n_estimators=10,random_state=0)
reg.fit(indp,dep)
y_pred_reg = reg.predict(indp) #fix1
x_grid = np.arange(min(indp),max(indp),0.01) # current error
x_grid = x_grid.reshape((len(x_grid),1))
plt.scatter(indp,dep,color = 'red')
plt.plot(x_grid,reg.predict(x_grid),color ='blue')
plt.title('Random forest regression')
plt.xlabel('X-axis')
plt.ylabel('Production')
plt.show()
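Beyond the `min`/`max` error, `reg` was fitted on five features, so it cannot be evaluated on a one-column `x_grid`, and `plt.scatter(indp, dep)` cannot pair a 5-column array with a 1-D target. One way to get a 2-D plot is to vary a single feature over its own range while holding the other four at their column means, a simple one-feature, partial-dependence-style slice. The sketch below assumes that approach; the random data, the helper name `feature_slice`, and the use of column means for the fixed features are illustrative assumptions, not part of the original code.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def feature_slice(model, X, col, num=200):
    """Hypothetical helper: predict along one feature, others fixed at their means.

    Returns (grid, predictions), where grid spans X[:, col] from min to max.
    """
    # Per-column min/max, not the global min/max of the whole matrix.
    grid = np.linspace(X[:, col].min(), X[:, col].max(), num)
    # Every grid row starts as the vector of column means (all 5 features)...
    X_grid = np.tile(X.mean(axis=0), (num, 1))
    # ...and only the chosen feature is varied.
    X_grid[:, col] = grid
    return grid, model.predict(X_grid)

# Made-up data with the same layout as indp (n rows x 5 features).
rng = np.random.default_rng(0)
indp_demo = rng.uniform(0, 100, size=(18, 5))
dep_demo = 3 * indp_demo[:, 4] + rng.normal(0, 5, size=18)

reg_demo = RandomForestRegressor(n_estimators=10, random_state=0)
reg_demo.fit(indp_demo, dep_demo)

area_col = 4  # position of 'Area' among the five feature columns
grid, preds = feature_slice(reg_demo, indp_demo, area_col)
print(grid.shape, preds.shape)
```

With the real data, `plt.scatter(indp[:, 4], dep, color='red')` followed by `plt.plot(grid, preds, color='blue')` would draw the Area slice. `np.linspace` is used instead of `np.arange(..., 0.01)` because it fixes the number of points; a 0.01 step across Area values in the thousands would produce hundreds of thousands of grid points.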
Dataset:
Dist,Precipitation,Min_Temp,Cloud_Cover,Vapour_pressure,Area,Production
Bidar,622.438,27.643,35.241,17.953,4709,9043
Bangalore,748.194,25.263,49.134,21.56,18790,20981
Belgaum,1334.194,21.254,39.728,22.5509,4398,6054
Bellary,574.325,26.407,38.466,20.008,3768,5903
Bengalore Rural,733.003,25.228000,47.620000,21.241000,140213,534214
Kolar,724.545,25.464,47.029,20.63,2278,2759
Dharwad,1623.548,26.148,38.267,23.652,8395,10986
Koppal,724.545,26.871005,41.039,19.992,3084,3952
Chikmagalur,1923.742,26.459,44.842,24.717,1650,2958
Chitradurga,674.17,25.214,41.364,20.82,3026,3325
Haveri,1473.343,25.817,41.292,23.168,10659,9865
Chamrajanagar,1334.754,25.089,50.77,23.079,3485,4120
Mandya,1477.249,24.567,49.54775,22.234,11349,18957
Mysore,2242.378,25.76766667,50.57941667,24.64266667,3462,4539
Raichur,450.113,27.42241667,35.76258333,18.93741667,4586,6145
Kodaku,1691.933,25.426,46.353,23.975,17856,15362
Hassan,2200.349,25.348,47.12,24.008,10487,7586
Devanagare,1060.343,25.509,40.929,22.042,2459,1865
Gulbarga,525.402,27.851,35.109,18.662,10487,7895
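To check the column positions used by `df.iloc[:,1:6]` and `df.iloc[:,6]`, the sample rows can be loaded straight from a string. This sketch copies the first six rows above verbatim and applies the same steps as `Preprocessing` with `drop_indx_list = [4]`. Note that `DataFrame.reset_index()` returns a new frame, so its result has to be assigned; `drop=True` keeps the old index from being added as a column.

```python
import io
import pandas as pd

# First six rows of the dataset above, copied verbatim.
csv_text = """Dist,Precipitation,Min_Temp,Cloud_Cover,Vapour_pressure,Area,Production
Bidar,622.438,27.643,35.241,17.953,4709,9043
Bangalore,748.194,25.263,49.134,21.56,18790,20981
Belgaum,1334.194,21.254,39.728,22.5509,4398,6054
Bellary,574.325,26.407,38.466,20.008,3768,5903
Bengalore Rural,733.003,25.228000,47.620000,21.241000,140213,534214
Kolar,724.545,25.464,47.029,20.63,2278,2759
"""
df_demo = pd.read_csv(io.StringIO(csv_text))

# Same steps as Preprocessing(df, [4]): drop row 4, then renumber 0..n-1.
df_demo = df_demo.drop(index=[4]).reset_index(drop=True)
# Replace implausible minimum temperatures with the column mean.
df_demo.loc[df_demo.Min_Temp > 40, 'Min_Temp'] = df_demo.Min_Temp.mean()

# Column 0 is 'Dist', so iloc[:, 1:6] selects the five features
# and iloc[:, 6] selects 'Production'.
print(list(df_demo.iloc[:, 1:6].columns))
print(df_demo.columns[6])
```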
I tried the answers to this similar question, but they did not help.
Edit: fixed the earlier prediction error by using reg.predict(indp) instead of [[6.5]].
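That change works because `predict` expects a 2-D array with one row per sample and the same number of columns the model was fitted on. `[[6.5]]` is one sample with a single feature, while `reg` was fitted on five. A sketch with made-up training data; the single-row input reuses the Bidar feature values from the dataset:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Made-up training data: 20 samples, 5 features.
rng = np.random.default_rng(1)
X_demo = rng.uniform(0, 100, size=(20, 5))
y_demo = X_demo.sum(axis=1)

model = RandomForestRegressor(n_estimators=10, random_state=0)
model.fit(X_demo, y_demo)

# One sample with one feature: rejected, the model expects 5 columns.
try:
    model.predict([[6.5]])
except ValueError as exc:
    print("rejected:", exc)

# One sample with all five features, in the order
# Precipitation, Min_Temp, Cloud_Cover, Vapour_pressure, Area (Bidar row).
one_district = np.array([[622.438, 27.643, 35.241, 17.953, 4709]])
print(model.predict(one_district).shape)
```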
But now the line
np.arange(min(indp),max(indp),0.01)
gives me another error: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
Use np.arange(np.min(indp),np.max(indp),0.1) instead.
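`np.min(indp)` and `np.max(indp)` remove the error, but they reduce over every value in the matrix, so the resulting range mixes Vapour_pressure (roughly 18 to 25) with Area (thousands). The grid is still a single 1-D range, so `reg.predict(x_grid)` on a five-feature model would still fail. A per-feature range uses `axis=0`. The sketch below uses the feature values of three rows from the dataset (Bidar, Bangalore, Kolar):

```python
import numpy as np

# Feature columns of three dataset rows (Bidar, Bangalore, Kolar):
# Precipitation, Min_Temp, Cloud_Cover, Vapour_pressure, Area
indp_demo = np.array([
    [622.438, 27.643, 35.241, 17.953, 4709],
    [748.194, 25.263, 49.134, 21.56, 18790],
    [724.545, 25.464, 47.029, 20.63, 2278],
])

# Global reduction: one number for the whole matrix, mixing units.
lo, hi = np.min(indp_demo), np.max(indp_demo)
x_grid = np.arange(lo, hi, 0.1)
print(lo, hi, x_grid.shape)  # one long 1-D range spanning all features

# Per-feature reduction: one min and one max for each column.
print(indp_demo.min(axis=0))
print(indp_demo.max(axis=0))
```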