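I'm new to Python and I'm trying to build a GUI that shows predicted precipitation values on a map over a half-hour time range. The data sits in a folder of txt files. Each file holds one set of measurements, taken every 15 minutes, with longitude, latitude, and precipitation value (270000 locations per file). To make a prediction for the next hour, I merged the last 8 files, which cover a two-hour time range, and added an extra column recording the measurement time. This is the dataset: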
time      LAT     LON     Prec
11:15:00  45.980  10.029  0.42
11:15:00  45.981  10.071  0.47
11:15:00  45.982  10.113  0.32
11:15:00  45.984  10.155  0.30
11:15:00  45.985  10.197  0.32
...       ...     ...     ...
13:00:00  29.299  33.978  0.00
13:00:00  29.302  34.021  0.00
13:00:00  29.304  34.064  0.00
13:00:00  29.306  34.107  0.00
13:00:00  29.309  34.150  0.00
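The merge described above (several 15-minute files stacked into one table with an extra time column) could be sketched roughly as follows. This is only an illustration under assumptions: the helper name `combine_snapshots`, the whitespace-separated layout with a `LAT LON Prec` header, and the two in-memory stand-in files are hypothetical and not taken from the actual data folder.

```python
import io

import pandas as pd


def combine_snapshots(sources, times):
    """Stack one table per 15-minute snapshot and tag each row with its time.

    sources: file paths or file-like objects, oldest first (hypothetical input)
    times:   one time label per source, e.g. "11:15"
    """
    frames = []
    for src, t in zip(sources, times):
        # Assumed layout: whitespace-separated columns with a LAT LON Prec header
        frame = pd.read_csv(src, sep=r"\s+")
        frame["time"] = t
        frames.append(frame)
    combined = pd.concat(frames, ignore_index=True)
    return combined[["time", "LAT", "LON", "Prec"]]


# Tiny in-memory stand-ins for two of the eight files
file_a = io.StringIO("LAT LON Prec\n45.980 10.029 0.42\n45.981 10.071 0.47\n")
file_b = io.StringIO("LAT LON Prec\n45.980 10.029 0.10\n45.981 10.071 0.00\n")
combined = combine_snapshots([file_a, file_b], ["11:15", "11:30"])
print(combined)
```

With real files, `sources` would be the eight most recent paths in the folder, ordered oldest first.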
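I have tried linear regression, SVR, and LSTM, but my results are all over the place. For example, with linear regression the model does not seem to capture the movement of the phenomenon: the output is essentially the same as the last hour of data. Here is my code: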
import pandas as pd
import cartopy.crs as ccrs
import matplotlib.ticker as ticker
import cartopy.feature as cfeature
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression
import warnings
warnings.filterwarnings("ignore", message="X does not have valid feature names")
# Load data (columns are separated by whitespace; read everything as text first)
df_comb = pd.read_csv('combined_files.txt', sep=r'\s+', dtype=str)
df_comb = df_comb.astype({
    'LAT': float,
    'LON': float,
    'Prec': float,
    'time': str
})
df_comb['Prec'] = df_comb['Prec'].clip(lower=0)
df_comb['time'] = pd.to_datetime(df_comb['time'], format='%H%M').dt.time
df_comb = df_comb.set_index('time')
# Create 7 lagged features (shift(i) takes the value from i rows earlier)
for i in range(1, 8):
    df_comb[f'Prec_lag{i}'] = df_comb['Prec'].shift(i)
# Drop the missing values introduced by lagging
df_comb = df_comb.dropna()
# Use the lagged features and location data for training
features = ['LAT', 'LON'] + [f'Prec_lag{i}' for i in range(1, 8)]
X_train = df_comb[features]
y_train = df_comb['Prec']
# Assign feature names to X_train
X_train.columns = features
# Create and fit the linear regression model
regression_model = LinearRegression()
regression_model.fit(X_train, y_train)
# List to store the predictions
predictions = []
# Iterate over the locations and make predictions
for (lat, lon), location_data in df_comb.groupby(['LAT', 'LON']):
    # Use the most recent row for this location as the model input
    X_pred = location_data[features].iloc[-1].values.reshape(1, -1)
    y_pred_next_hour = regression_model.predict(X_pred)
    predictions.append({'LAT': lat, 'LON': lon, 'Predicted_Precipitation': y_pred_next_hour[0]})
# Create a DataFrame from the predictions list
predictions_df = pd.DataFrame(predictions)
predictions_df['Predicted_Precipitation'] = predictions_df['Predicted_Precipitation'].clip(lower=0)
# Plotting
vmin = 0
vmax = 20
fig = plt.figure(figsize=(6, 6))
ax = plt.axes(projection=ccrs.PlateCarree())
cs = ax.tricontourf(predictions_df['LON'], predictions_df['LAT'], predictions_df['Predicted_Precipitation'],
                    vmin=vmin, vmax=vmax, locator=ticker.MaxNLocator(150),
                    origin='lower',
                    transform=ccrs.PlateCarree(), cmap='jet', extend='neither')
ax.coastlines(resolution='10m')
ax.add_feature(cfeature.BORDERS, linestyle=':')
cbar_vmax = np.max(predictions_df['Predicted_Precipitation'])
plt.colorbar(cs, shrink=0.5, extend='neither', ticks=np.linspace(vmin, cbar_vmax, num=7), format='%.1f')
plt.tight_layout()
plt.show()
Actual data at 13:00:
Actual data at 13:30, which is what I expected the prediction to look like:
The "prediction" result:
How can I fix this? I know the dataset is limited for this kind of prediction, but this is what I was asked to do.
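One detail that may be worth checking: in the combined table the rows are ordered by time and then by location, so `df_comb['Prec'].shift(i)` takes the value from i rows earlier, which is a neighbouring grid point in the same snapshot rather than the same point 15 minutes before. A minimal sketch of lags computed per location is below. The helper name `add_location_lags`, the toy data, and the assumption that `time` is an ordinary sortable column are all hypothetical, and this is not meant as a complete fix for the forecast.

```python
import pandas as pd


def add_location_lags(df, n_lags):
    """Add Prec_lag1..Prec_lagN, each computed within one (LAT, LON) point."""
    # Sort so that, inside each location, rows run from oldest to newest
    out = df.sort_values(["LAT", "LON", "time"]).copy()
    for i in range(1, n_lags + 1):
        # Shift inside each location group, so lag i is the same point i steps earlier
        out[f"Prec_lag{i}"] = out.groupby(["LAT", "LON"])["Prec"].shift(i)
    # Rows without a full lag history are dropped
    return out.dropna().reset_index(drop=True)


# Toy data: two locations, three 15-minute snapshots
toy = pd.DataFrame({
    "time": ["11:15", "11:15", "11:30", "11:30", "11:45", "11:45"],
    "LAT": [45.980, 45.981] * 3,
    "LON": [10.029, 10.071] * 3,
    "Prec": [0.42, 0.47, 0.10, 0.20, 0.05, 0.30],
})
lagged = add_location_lags(toy, n_lags=2)
print(lagged)
```

Note that with 8 snapshots and 7 lags, only one complete row per location would remain after `dropna()`, so a smaller number of lags leaves more rows to train on.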
Hello, could you share the source code you used for the LSTM model in the work described above? Thank you very much.