I have a dataframe that looks like this:
+-----+------------+-------------+-------------------------+----+----------+----------+
| | Actual_Lat | Actual_Long | Time | ID | Cal_long | Cal_lat |
+-----+------------+-------------+-------------------------+----+----------+----------+
| 0 | 63.433376 | 10.397068 | 2019-09-30 04:48:13.540 | 11 | 10.39729 | 63.43338 |
| 1 | 63.433301 | 10.395846 | 2019-09-30 04:48:18.470 | 11 | 10.39731 | 63.43326 |
| 2 | 63.433259 | 10.394543 | 2019-09-30 04:48:23.450 | 11 | 10.39576 | 63.43323 |
| 3 | 63.433258 | 10.394244 | 2019-09-30 04:48:29.500 | 11 | 10.39555 | 63.43436 |
| 4 | 63.433258 | 10.394215 | 2019-09-30 04:48:35.683 | 11 | 10.39505 | 63.43427 |
| ... | ... | ... | ... | ...| ... | ... |
| 70 | NaN | NaN | NaT | NaN| 10.35826 | 63.43149 |
| 71 | NaN | NaN | NaT | NaN| 10.35809 | 63.43155 |
| 72 | NaN | NaN | NaT | NaN| 10.35772 | 63.43163 |
| 73 | NaN | NaN | NaT | NaN| 10.35646 | 63.43182 |
| 74 | NaN | NaN | NaT | NaN| 10.35536 | 63.43196 |
+-----+------------+-------------+-------------------------+----+----------+----------+
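Actual_Lat and Actual_Long contain GPS coordinates recorded by a GPS device. Cal_lat and Cal_long are GPS coordinates obtained from OSRM's API. As you can see, a lot of the actual coordinates are missing. I want to fill them so that the difference between Actual_Lat and Cal_lat (and between Actual_Long and Cal_long) is zero, or at least close to zero. I tried filling the missing values with the destination latitude and longitude, but that produces huge differences. My question: how can I fill these values with Python/pandas so that, assuming the vehicle follows the path estimated by OSRM, the difference between the actual and the estimated latitude/longitude is zero or close to zero? I'm not familiar with GIS datasets and don't know how to work with them.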
EDIT: I'm looking for something like this:
+-----+------------+-------------+-------------------------+----------+----------+----------+----------------------+----------------------+
| | Actual_Lat | Actual_Long | Time | Tour ID | Cal_long | Cal_lat | coordinates_diff_Lat | coordinates_diff_Lon |
+-----+------------+-------------+-------------------------+----------+----------+----------+----------------------+----------------------+
| 0 | 63.433376 | 10.397068 | 2019-09-30 04:48:13.540 | 11 | 10.39729 | 63.43338 | -0.000 | -0.000 |
| 1 | 63.433301 | 10.395846 | 2019-09-30 04:48:18.470 | 11 | 10.39731 | 63.43326 | 0.000 | -0.001 |
| 2 | 63.433259 | 10.394543 | 2019-09-30 04:48:23.450 | 11 | 10.39576 | 63.43323 | 0.000 | -0.001 |
| 3 | 63.433258 | 10.394244 | 2019-09-30 04:48:29.500 | 11 | 10.39555 | 63.43436 | -0.001 | -0.001 |
| 4 | 63.433258 | 10.394215 | 2019-09-30 04:48:35.683 | 11 | 10.39505 | 63.43427 | -0.001 | -0.001 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 70 | 63.43000 | 10.35800 | NaT | 115268.0 | 10.35826 | 63.43149 | 0.000 | -0.003 |
| 71 | 63.43025 | 10.35888 | NaT | 115268.0 | 10.35809 | 63.43155 | 0.000 | -0.003 |
| 72 | 63.43052 | 10.35713 | NaT | 115268.0 | 10.35772 | 63.43163 | 0.000 | -0.002 |
| 73 | 63.43159 | 10.35633 | NaT | 115268.0 | 10.35646 | 63.43182 | 0.000 | -0.001 |
| 74 | 63.43197 | 10.35537 | NaT | 115268.0 | 10.35536 | 63.43196 | 0.000 | 0.000 |
+-----+------------+-------------+-------------------------+----------+----------+----------+----------------------+----------------------+
Note that 63.43197, 10.35537 is the destination and 63.433376, 10.397068 is the starting position. All of these points are road coordinates.
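Assuming your dataframe is df, you can fill the missing actual coordinates with the OSRM ones:
df['Actual_Lat'] = df['Actual_Lat'].fillna(df['Cal_lat'])
df['Actual_Long'] = df['Actual_Long'].fillna(df['Cal_long'])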
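A self-contained sketch of this approach, filling both latitude and longitude, is below. The four rows are retyped from the question (two with a GPS fix, two without) and are only for illustration. Note that the filled rows then have a difference of exactly zero by construction, so they say nothing about the real GPS error on that stretch.

```python
import numpy as np
import pandas as pd

# Illustrative rows retyped from the question: two with GPS fixes, two without.
df = pd.DataFrame({
    "Actual_Lat":  [63.433376, 63.433301, np.nan, np.nan],
    "Actual_Long": [10.397068, 10.395846, np.nan, np.nan],
    "Cal_lat":     [63.43338, 63.43326, 63.43149, 63.43155],
    "Cal_long":    [10.39729, 10.39731, 10.35826, 10.35809],
})

# Where the GPS value is missing, take the OSRM value for the same row.
df["Actual_Lat"] = df["Actual_Lat"].fillna(df["Cal_lat"])
df["Actual_Long"] = df["Actual_Long"].fillna(df["Cal_long"])

# Differences: rows that were filled are exactly 0.0.
df["coordinates_diff_Lat"] = df["Actual_Lat"] - df["Cal_lat"]
df["coordinates_diff_Long"] = df["Actual_Long"] - df["Cal_long"]
print(df)
```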
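I think you can take the mean of the differences between Actual_Lat and Cal_lat and between Actual_Long and Cal_long, and use these means to estimate the values that correspond to the NaNs. The mean tells you how large the average error is, and you can extrapolate it to the rest of the data.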
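Written out, this is a constant-offset estimate. With $K$ the set of rows that have a GPS fix:

$$
\bar{d}_{\mathrm{lat}} = \frac{1}{|K|}\sum_{i \in K}\left(\mathrm{Actual\_Lat}_i - \mathrm{Cal\_lat}_i\right),
\qquad
\widehat{\mathrm{Actual\_Lat}}_j = \mathrm{Cal\_lat}_j + \bar{d}_{\mathrm{lat}} \quad (j \notin K)
$$

and the same with Actual_Long and Cal_long. One consequence is that every filled row has the same difference, exactly $\bar{d}$: the filled values follow the OSRM path shifted by a constant.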
I'll use this data to illustrate:
df
Actual_Lat Actual_Long Time ID Cal_long Cal_lat
0 63.433376 10.397068 2019-09-30-04:48:13.540 11.0 10.39729 63.43338
1 63.433301 10.395846 2019-09-30-04:48:18.470 11.0 10.39731 63.43326
2 63.433259 10.394543 2019-09-30-04:48:23.450 11.0 10.39576 63.43323
3 63.433258 10.394244 2019-09-30-04:48:29.500 11.0 10.39555 63.43436
4 63.433258 10.394215 2019-09-30-04:48:35.683 11.0 10.39505 63.43427
5 NaN NaN NaT NaN 10.35826 63.43149
6 NaN NaN NaT NaN 10.35809 63.43155
7 NaN NaN NaT NaN 10.35772 63.43163
8 NaN NaN NaT NaN 10.35646 63.43182
9 NaN NaN NaT NaN 10.35536 63.43196
You can copy this dataframe and load it with pd.read_clipboard() to check the results.
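If clipboard access isn't available (pd.read_clipboard needs a working clipboard backend, which headless machines often lack), the same frame can be built directly. This sketch retypes the ten rows above; unlike the printout, Time is parsed as real timestamps, with the missing ones as NaT.

```python
import numpy as np
import pandas as pd

nan = np.nan

# Five recorded timestamps followed by five missing ones (None -> NaT).
times = pd.to_datetime([
    "2019-09-30 04:48:13.540",
    "2019-09-30 04:48:18.470",
    "2019-09-30 04:48:23.450",
    "2019-09-30 04:48:29.500",
    "2019-09-30 04:48:35.683",
] + [None] * 5)

df = pd.DataFrame({
    "Actual_Lat":  [63.433376, 63.433301, 63.433259, 63.433258, 63.433258] + [nan] * 5,
    "Actual_Long": [10.397068, 10.395846, 10.394543, 10.394244, 10.394215] + [nan] * 5,
    "Time": times,
    "ID": [11.0] * 5 + [nan] * 5,
    "Cal_long": [10.39729, 10.39731, 10.39576, 10.39555, 10.39505,
                 10.35826, 10.35809, 10.35772, 10.35646, 10.35536],
    "Cal_lat":  [63.43338, 63.43326, 63.43323, 63.43436, 63.43427,
                 63.43149, 63.43155, 63.43163, 63.43182, 63.43196],
})
print(df)
```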
1. Compute the differences:
df['coordinates_diff_Lat']=df['Actual_Lat']-df['Cal_lat']
df['coordinates_diff_Long']=df['Actual_Long']-df['Cal_long']
print(df)
Actual_Lat Actual_Long Time ID Cal_long Cal_lat \
0 63.433376 10.397068 2019-09-30-04:48:13.540 11.0 10.39729 63.43338
1 63.433301 10.395846 2019-09-30-04:48:18.470 11.0 10.39731 63.43326
2 63.433259 10.394543 2019-09-30-04:48:23.450 11.0 10.39576 63.43323
3 63.433258 10.394244 2019-09-30-04:48:29.500 11.0 10.39555 63.43436
4 63.433258 10.394215 2019-09-30-04:48:35.683 11.0 10.39505 63.43427
5 NaN NaN NaT NaN 10.35826 63.43149
6 NaN NaN NaT NaN 10.35809 63.43155
7 NaN NaN NaT NaN 10.35772 63.43163
8 NaN NaN NaT NaN 10.35646 63.43182
9 NaN NaN NaT NaN 10.35536 63.43196
coordinates_diff_Lat coordinates_diff_Long
0 -0.000004 -0.000222
1 0.000041 -0.001464
2 0.000029 -0.001217
3 -0.001102 -0.001306
4 -0.001012 -0.000835
5 NaN NaN
6 NaN NaN
7 NaN NaN
8 NaN NaN
9 NaN NaN
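2. Compute the mean of these differences. To ignore the NaN values, use a mask (strictly, Series.mean() already skips NaN by default, so the mask is optional):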
mask=df['coordinates_diff_Lat'].notna()
mean_lat=df.loc[mask,'coordinates_diff_Lat'].mean()
mean_long=df.loc[mask,'coordinates_diff_Long'].mean()
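A quick way to confirm that the mask does not change the result: Series.mean() uses skipna=True by default, so masking first and calling mean() directly agree. Only skipna=False lets a NaN propagate. The values below are a hypothetical three-element series for illustration.

```python
import numpy as np
import pandas as pd

# Two real differences and one missing one.
diff = pd.Series([-0.000004, 0.000041, np.nan])

masked = diff[diff.notna()].mean()   # explicit mask, as in step 2
direct = diff.mean()                 # skipna=True is the default
strict = diff.mean(skipna=False)     # NaN propagates

print(masked, direct, strict)
```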
3. Use these means to estimate the NaN values and fill them with fillna:
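# Assign the result back: fillna(inplace=True) on df['col'] is chained assignment, deprecated in recent pandas and may not update df under Copy-on-Write
df['Actual_Lat'] = df['Actual_Lat'].fillna(df['Cal_lat'] + mean_lat)
df['Actual_Long'] = df['Actual_Long'].fillna(df['Cal_long'] + mean_long)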
4. Update the difference columns and display the new dataframe:
df['coordinates_diff_Lat']=df['Actual_Lat']-df['Cal_lat']
df['coordinates_diff_Long']=df['Actual_Long']-df['Cal_long']
print(df)
Actual_Lat Actual_Long Time ID Cal_long Cal_lat \
0 63.433376 10.397068 2019-09-30-04:48:13.540 11.0 10.39729 63.43338
1 63.433301 10.395846 2019-09-30-04:48:18.470 11.0 10.39731 63.43326
2 63.433259 10.394543 2019-09-30-04:48:23.450 11.0 10.39576 63.43323
3 63.433258 10.394244 2019-09-30-04:48:29.500 11.0 10.39555 63.43436
4 63.433258 10.394215 2019-09-30-04:48:35.683 11.0 10.39505 63.43427
5 63.431080 10.357251 NaT NaN 10.35826 63.43149
6 63.431140 10.357081 NaT NaN 10.35809 63.43155
7 63.431220 10.356711 NaT NaN 10.35772 63.43163
8 63.431410 10.355451 NaT NaN 10.35646 63.43182
9 63.431550 10.354351 NaT NaN 10.35536 63.43196
coordinates_diff_Lat coordinates_diff_Long
0 -0.000004 -0.000222
1 0.000041 -0.001464
2 0.000029 -0.001217
3 -0.001102 -0.001306
4 -0.001012 -0.000835
5 -0.000410 -0.001009
6 -0.000410 -0.001009
7 -0.000410 -0.001009
8 -0.000410 -0.001009
9 -0.000410 -0.001009
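Steps 1 to 4 can be folded into a single helper. The sketch below is hypothetical (fill_with_mean_offset is not a pandas function). It works on a copy rather than modifying the input, and it uses the five rows with GPS fixes above plus one row without a fix.

```python
import numpy as np
import pandas as pd

# (GPS column, OSRM column, name of the difference column)
PAIRS = (
    ("Actual_Lat", "Cal_lat", "coordinates_diff_Lat"),
    ("Actual_Long", "Cal_long", "coordinates_diff_Long"),
)


def fill_with_mean_offset(df, pairs=PAIRS):
    """Fill missing GPS values with the OSRM value plus the mean GPS-OSRM offset.

    Hypothetical helper for this sketch; returns a new DataFrame.
    """
    out = df.copy()
    for actual, cal, diff_col in pairs:
        offset = (out[actual] - out[cal]).mean()   # NaN rows are skipped by default
        out[actual] = out[actual].fillna(out[cal] + offset)
        out[diff_col] = out[actual] - out[cal]     # recompute after filling
    return out


sample = pd.DataFrame({
    "Actual_Lat":  [63.433376, 63.433301, 63.433259, 63.433258, 63.433258, np.nan],
    "Actual_Long": [10.397068, 10.395846, 10.394543, 10.394244, 10.394215, np.nan],
    "Cal_long": [10.39729, 10.39731, 10.39576, 10.39555, 10.39505, 10.35826],
    "Cal_lat":  [63.43338, 63.43326, 63.43323, 63.43436, 63.43427, 63.43149],
})
result = fill_with_mean_offset(sample)
print(result.round(6))
```

As in the printed output above, every filled row gets the same offset, so this models a constant bias between the GPS and OSRM coordinates, not error that changes along the route.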