How do I fill in missing GPS data in pandas?

Question  votes: 1  answers: 2

I have a data frame that looks like this:

+-----+------------+-------------+-------------------------+----+----------+----------+
|     | Actual_Lat | Actual_Long |          Time           | ID | Cal_long | Cal_lat  |
+-----+------------+-------------+-------------------------+----+----------+----------+
|   0 | 63.433376  | 10.397068   | 2019-09-30 04:48:13.540 | 11 | 10.39729 | 63.43338 |
|   1 | 63.433301  | 10.395846   | 2019-09-30 04:48:18.470 | 11 | 10.39731 | 63.43326 |
|   2 | 63.433259  | 10.394543   | 2019-09-30 04:48:23.450 | 11 | 10.39576 | 63.43323 |
|   3 | 63.433258  | 10.394244   | 2019-09-30 04:48:29.500 | 11 | 10.39555 | 63.43436 |
|   4 | 63.433258  | 10.394215   | 2019-09-30 04:48:35.683 | 11 | 10.39505 | 63.43427 |
| ... | ...        | ...         | ...                     | ...|      ... |      ... |
|  70 | NaN        | NaN         | NaT                     | NaN| 10.35826 | 63.43149 |
|  71 | NaN        | NaN         | NaT                     | NaN| 10.35809 | 63.43155 |
|  72 | NaN        | NaN         | NaT                     | NaN| 10.35772 | 63.43163 |
|  73 | NaN        | NaN         | NaT                     | NaN| 10.35646 | 63.43182 |
|  74 | NaN        | NaN         | NaT                     | NaN| 10.35536 | 63.43196 |
+-----+------------+-------------+-------------------------+----+----------+----------+

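Actual_Lat and Actual_Long contain the GPS coordinates obtained from the GPS device. Cal_lat and Cal_long are the GPS coordinates obtained from OSRM's API. As you can see, a lot of data is missing from the actual coordinates. I am looking for a data set such that the difference between Actual_Lat and Cal_lat is zero, or at least close to zero. I tried filling the missing values with the destination latitude and longitude, but that leads to huge differences. My question is how to fill in these values with Python/pandas so that, when the vehicle follows the route estimated by OSRM, the difference between the actual and the estimated latitude and longitude is zero or close to zero. I am not familiar with GIS data sets and do not know how to handle them.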

EDIT: I am looking for something like this.


+-----+------------+-------------+-------------------------+----------+----------+----------+----------------------+----------------------+
|     | Actual_Lat | Actual_Long |          Time           | Tour ID  | Cal_long | Cal_lat  | coordinates_diff_Lat | coordinates_diff_Lon |
+-----+------------+-------------+-------------------------+----------+----------+----------+----------------------+----------------------+
|   0 |  63.433376 |   10.397068 | 2019-09-30 04:48:13.540 |       11 | 10.39729 | 63.43338 |               -0.000 |               -0.000 |
|   1 |  63.433301 |   10.395846 | 2019-09-30 04:48:18.470 |       11 | 10.39731 | 63.43326 |                0.000 |               -0.001 |
|   2 |  63.433259 |   10.394543 | 2019-09-30 04:48:23.450 |       11 | 10.39576 | 63.43323 |                0.000 |               -0.001 |
|   3 |  63.433258 |   10.394244 | 2019-09-30 04:48:29.500 |       11 | 10.39555 | 63.43436 |               -0.001 |               -0.001 |
|   4 |  63.433258 |   10.394215 | 2019-09-30 04:48:35.683 |       11 | 10.39505 | 63.43427 |               -0.001 |               -0.001 |
| ... |        ... |         ... | ...                     |      ... |      ... |      ... |                  ... |                  ... |
|  70 |   63.43000 |    10.35800 | NaT                     | 115268.0 | 10.35826 | 63.43149 |                0.000 |               -0.003 |
|  71 |   63.43025 |    10.35888 | NaT                     | 115268.0 | 10.35809 | 63.43155 |                0.000 |               -0.003 |
|  72 |   63.43052 |    10.35713 | NaT                     | 115268.0 | 10.35772 | 63.43163 |                0.000 |               -0.002 |
|  73 |   63.43159 |    10.35633 | NaT                     | 115268.0 | 10.35646 | 63.43182 |                0.000 |               -0.001 |
|  74 |   63.43197 |    10.35537 | NaT                     | 115268.0 | 10.35536 | 63.43196 |                0.000 |                0.000 |
+-----+------------+-------------+-------------------------+----------+----------+----------+----------------------+----------------------+

Note that 63.43197, 10.35537 is the destination and 63.433376, 10.397068 is the starting position. All of these points are coordinates on roads.

python pandas gps gis
2 Answers
0
votes

You need pandas.DataFrame.where.

Assuming your data frame is df, you can do this:

df.Actual_Lat = df.Actual_Lat.where(~df.Actual_Lat.isna(), df.Cal_lat)
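The line above fills only the latitude. A minimal sketch that applies the same idea to both coordinate columns follows. It assumes the column names from the question and uses a small illustrative frame built from a few of the question's rows; `Series.fillna` with another Series is shown as an equivalent of `where(notna(), other)`.

```python
import numpy as np
import pandas as pd

# Small illustrative frame (values copied from the question's table).
df = pd.DataFrame({
    "Actual_Lat": [63.433376, 63.433301, np.nan, np.nan],
    "Actual_Long": [10.397068, 10.395846, np.nan, np.nan],
    "Cal_lat": [63.43338, 63.43326, 63.43149, 63.43155],
    "Cal_long": [10.39729, 10.39731, 10.35826, 10.35809],
})

# Keep the measured value where it exists, otherwise take the OSRM value.
df["Actual_Lat"] = df["Actual_Lat"].where(df["Actual_Lat"].notna(), df["Cal_lat"])

# fillna with a Series aligns on the index and has the same effect here.
df["Actual_Long"] = df["Actual_Long"].fillna(df["Cal_long"])

print(df)
```

With this approach every filled row has a difference of exactly zero, because the actual value is simply a copy of the calculated one.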


0
votes

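I think you could take the mean of the differences between Actual_Lat and Cal_lat, and between Actual_Long and Cal_long, and use these means to estimate the values that are currently NaN. The mean tells you how large the average error is, and you can extrapolate it to the rest of the data.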


I will use this data frame to help you:

df

   Actual_Lat  Actual_Long                     Time    ID  Cal_long   Cal_lat
0   63.433376    10.397068  2019-09-30-04:48:13.540  11.0  10.39729  63.43338
1   63.433301    10.395846  2019-09-30-04:48:18.470  11.0  10.39731  63.43326
2   63.433259    10.394543  2019-09-30-04:48:23.450  11.0  10.39576  63.43323
3   63.433258    10.394244  2019-09-30-04:48:29.500  11.0  10.39555  63.43436
4   63.433258    10.394215  2019-09-30-04:48:35.683  11.0  10.39505  63.43427
5         NaN          NaN                      NaT   NaN  10.35826  63.43149
6         NaN          NaN                      NaT   NaN  10.35809  63.43155
7         NaN          NaN                      NaT   NaN  10.35772  63.43163
8         NaN          NaN                      NaT   NaN  10.35646  63.43182
9         NaN          NaN                      NaT   NaN  10.35536  63.43196

You can copy this data frame and load it with pd.read_clipboard to check the results.
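pd.read_clipboard needs a working system clipboard, which is not always available (for example on a remote server). As an alternative sketch, the same sample can be parsed from a string. The `raw` variable and the explicit timestamp format are assumptions made for this example, not part of the original answer.

```python
import io
import pandas as pd

# The sample frame from above, without the index column.
raw = """Actual_Lat Actual_Long Time ID Cal_long Cal_lat
63.433376 10.397068 2019-09-30-04:48:13.540 11.0 10.39729 63.43338
63.433301 10.395846 2019-09-30-04:48:18.470 11.0 10.39731 63.43326
63.433259 10.394543 2019-09-30-04:48:23.450 11.0 10.39576 63.43323
63.433258 10.394244 2019-09-30-04:48:29.500 11.0 10.39555 63.43436
63.433258 10.394215 2019-09-30-04:48:35.683 11.0 10.39505 63.43427
NaN NaN NaT NaN 10.35826 63.43149
NaN NaN NaT NaN 10.35809 63.43155
NaN NaN NaT NaN 10.35772 63.43163
NaN NaN NaT NaN 10.35646 63.43182
NaN NaN NaT NaN 10.35536 63.43196
"""

# Split on whitespace; treat the literal "NaT" as missing as well.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", na_values=["NaT"])

# Parse the timestamps; anything unparseable becomes NaT.
df["Time"] = pd.to_datetime(
    df["Time"], format="%Y-%m-%d-%H:%M:%S.%f", errors="coerce"
)

print(df)
```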

1. Compute the differences:

df['coordinates_diff_Lat']=df['Actual_Lat']-df['Cal_lat']
df['coordinates_diff_Long']=df['Actual_Long']-df['Cal_long']
print(df)

   Actual_Lat  Actual_Long                     Time    ID  Cal_long   Cal_lat  \
0   63.433376    10.397068  2019-09-30-04:48:13.540  11.0  10.39729  63.43338   
1   63.433301    10.395846  2019-09-30-04:48:18.470  11.0  10.39731  63.43326   
2   63.433259    10.394543  2019-09-30-04:48:23.450  11.0  10.39576  63.43323   
3   63.433258    10.394244  2019-09-30-04:48:29.500  11.0  10.39555  63.43436   
4   63.433258    10.394215  2019-09-30-04:48:35.683  11.0  10.39505  63.43427   
5         NaN          NaN                      NaT   NaN  10.35826  63.43149   
6         NaN          NaN                      NaT   NaN  10.35809  63.43155   
7         NaN          NaN                      NaT   NaN  10.35772  63.43163   
8         NaN          NaN                      NaT   NaN  10.35646  63.43182   
9         NaN          NaN                      NaT   NaN  10.35536  63.43196   

   coordinates_diff_Lat  coordinates_diff_Long  
0             -0.000004              -0.000222  
1              0.000041              -0.001464  
2              0.000029              -0.001217  
3             -0.001102              -0.001306  
4             -0.001012              -0.000835  
5                   NaN                    NaN  
6                   NaN                    NaN  
7                   NaN                    NaN  
8                   NaN                    NaN  
9                   NaN                    NaN 

2. Compute the mean of these differences. To ignore NaN values, use a mask:

mask=df['coordinates_diff_Lat'].notna()
mean_lat=df.loc[mask,'coordinates_diff_Lat'].mean()
mean_long=df.loc[mask,'coordinates_diff_Long'].mean()
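As a side note, `Series.mean` skips NaN values by default (`skipna=True`), so the mask is explicit rather than strictly required. A small sketch using the latitude differences from step 1 (the Series `diff` is built by hand for this illustration):

```python
import numpy as np
import pandas as pd

# Latitude differences from step 1, followed by the missing rows.
diff = pd.Series(
    [-0.000004, 0.000041, 0.000029, -0.001102, -0.001012,
     np.nan, np.nan, np.nan, np.nan, np.nan]
)

masked_mean = diff[diff.notna()].mean()  # explicit mask, as in step 2
default_mean = diff.mean()               # skipna=True is the default

print(masked_mean, default_mean)
```

Both calls average over the five non-missing rows only.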

3. Estimate the NaN values with this mean and fill them with fillna:

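# Assign the result back: fillna(..., inplace=True) on a selected column may not update df in recent pandas versions.
df['Actual_Lat']=df['Actual_Lat'].fillna(df['Cal_lat']+mean_lat)
df['Actual_Long']=df['Actual_Long'].fillna(df['Cal_long']+mean_long)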

4. Update the difference columns and display the new data frame:

df['coordinates_diff_Lat']=df['Actual_Lat']-df['Cal_lat']
df['coordinates_diff_Long']=df['Actual_Long']-df['Cal_long']
print(df)

   Actual_Lat  Actual_Long                     Time    ID  Cal_long   Cal_lat  \
0   63.433376    10.397068  2019-09-30-04:48:13.540  11.0  10.39729  63.43338   
1   63.433301    10.395846  2019-09-30-04:48:18.470  11.0  10.39731  63.43326   
2   63.433259    10.394543  2019-09-30-04:48:23.450  11.0  10.39576  63.43323   
3   63.433258    10.394244  2019-09-30-04:48:29.500  11.0  10.39555  63.43436   
4   63.433258    10.394215  2019-09-30-04:48:35.683  11.0  10.39505  63.43427   
5   63.431080    10.357251                      NaT   NaN  10.35826  63.43149   
6   63.431140    10.357081                      NaT   NaN  10.35809  63.43155   
7   63.431220    10.356711                      NaT   NaN  10.35772  63.43163   
8   63.431410    10.355451                      NaT   NaN  10.35646  63.43182   
9   63.431550    10.354351                      NaT   NaN  10.35536  63.43196   

   coordinates_diff_Lat  coordinates_diff_Long  
0             -0.000004              -0.000222  
1              0.000041              -0.001464  
2              0.000029              -0.001217  
3             -0.001102              -0.001306  
4             -0.001012              -0.000835  
5             -0.000410              -0.001009  
6             -0.000410              -0.001009  
7             -0.000410              -0.001009  
8             -0.000410              -0.001009  
9             -0.000410              -0.001009  
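The four steps can also be wrapped in one helper. This is only a sketch under the same assumptions as the answer (a constant offset between measured and calculated coordinates); the function name `fill_with_mean_offset` is hypothetical and chosen for this example.

```python
import numpy as np
import pandas as pd


def fill_with_mean_offset(df, actual, calculated):
    """Fill NaN in `actual` with `calculated` shifted by the mean observed offset."""
    # Mean of (actual - calculated); rows where either value is NaN are skipped.
    offset = (df[actual] - df[calculated]).mean()
    return df[actual].fillna(df[calculated] + offset)


nan5 = [np.nan] * 5
df = pd.DataFrame({
    "Actual_Lat": [63.433376, 63.433301, 63.433259, 63.433258, 63.433258] + nan5,
    "Actual_Long": [10.397068, 10.395846, 10.394543, 10.394244, 10.394215] + nan5,
    "Cal_long": [10.39729, 10.39731, 10.39576, 10.39555, 10.39505,
                 10.35826, 10.35809, 10.35772, 10.35646, 10.35536],
    "Cal_lat": [63.43338, 63.43326, 63.43323, 63.43436, 63.43427,
                63.43149, 63.43155, 63.43163, 63.43182, 63.43196],
})

df["Actual_Lat"] = fill_with_mean_offset(df, "Actual_Lat", "Cal_lat")
df["Actual_Long"] = fill_with_mean_offset(df, "Actual_Long", "Cal_long")

df["coordinates_diff_Lat"] = df["Actual_Lat"] - df["Cal_lat"]
df["coordinates_diff_Long"] = df["Actual_Long"] - df["Cal_long"]
print(df)
```

Because a single constant offset is added, every filled row ends up with the same difference, which matches the repeated -0.000410 and -0.001009 values in the output above.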