How do I split a column containing multiple key:value pairs?

Question

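I have a dataset containing data for a number of vehicles. The last column of this dataset (headed "route") holds the route information for each vehicle.

The data in that column is formatted as follows: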

[{
    "longitude": 101, 
    "latitude": 3, 
    "timestamp": 2, 
    "accuracy": 0.5, 
    "hdop": 1
}]

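Each group of data represents one coordinate. Note also that not every coordinate has the "hdop" and "accuracy" keys.

I want to split the data in the last column into separate columns for "longitude", "latitude", "timestamp", "accuracy" and "hdop".

Any suggestions on how I should do this? I'm still very new to Python, so I'm finding my way as I go.

Here is what I have tried so far: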

import csv
import json
import ast
import pandas as pd

# Read data into a Pandas data frame
df = pd.read_csv('Trips Jan Week 1.csv')
df['route'] = pd.Series(df['route'],dtype="string")

# Extract the last column as a string
routes = df.iloc[:, -1].astype(str)

# Remove the surrounding brackets and whitespace before parsing the JSON
routes2 = routes.apply(lambda x: x.strip('[]').strip())
print(routes2)
print(type(routes2))

The result is as follows:

0       {"longitude":101.70393638087968,"latitude":3.1...
1       {"longitude":101.5249183,"latitude":3.0761391,...
2       {"longitude":101.70862944829587,"latitude":3.1...
3       {"longitude":101.705162,"latitude":3.1590512,...
4       {"longitude":101.71380749913092,"latitude":3.1...
                          ...                        
7928    {"longitude":101.7115741,"latitude":3.1961516,...
7929    {"longitude":101.71194960096184,"latitude":3.1... 
7930    {"longitude":101.71191491852223,"latitude":3.1...
7931    {"longitude":101.6748983,"latitude":3.1257983,...
7932    {"longitude":101.69488815920366,"latitude":3.0...
Name: route, Length: 7933, dtype: object
<class 'pandas.core.series.Series'>

So I would like each result to end up in a table like this:

longitude	latitude	timestamp	accuracy	hdop
101	3	2	0.5	1

Answer 1

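I can see two ways to do this: either apply pandas.Series to the single list item in the last column, or explicitly assign the values to new columns, reaching them by index and key: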

import pandas as pd

# preparing test data
route = [
    [{'longitude': 101, 'latitude': 3, 'timestamp': 2, 'accuracy': 0.5, 'hdop': 1}],
    [{'longitude': 102, 'latitude': 4, 'timestamp': 3, 'accuracy': 0.7, 'hdop': 2}],
    [{'longitude': 103, 'latitude': 5, 'timestamp': 4,                  'hdop': 3}],
    [{'longitude': 104, 'latitude': 6, 'timestamp': 5, 'accuracy': 0.1           }]
]

df = pd.DataFrame({'route': route})


# extract the route data by applying pandas.Series
route = df['route'].str[0].apply(pd.Series)
df = pd.concat([df, route], axis='columns')


# assign values by dictionary key
# (assign returns a new DataFrame, so keep the result)
df = df.assign(
    longitude=df['route'].str[0].str['longitude'],
    latitude=df['route'].str[0].str['latitude'],
    timestamp=df['route'].str[0].str['timestamp'],
    accuracy=df['route'].str[0].str['accuracy'],
    hdop=df['route'].str[0].str['hdop'],
)

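In the second case you can set or change the order of the columns as you like, but you need to know all of their names in advance. The first case is useful when the set of possible keys is not known.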
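One caveat: the test data above is built from real Python lists, whereas a column read with pd.read_csv arrives as text, so .str[0] would return the first character of the string rather than the first dictionary. Below is a minimal sketch of one way to handle that, assuming every cell is a valid JSON array with at least one coordinate. The trip_id column and the sample values are made up for illustration, and pd.json_normalize is swapped in for apply(pd.Series); it is not part of the original answer. It also copes with routes that hold more than one coordinate by producing one row per coordinate.

```python
import json

import pandas as pd

# Hypothetical stand-in for the CSV: each 'route' cell is a JSON string
raw = pd.DataFrame({
    "trip_id": [1, 2],
    "route": [
        '[{"longitude": 101, "latitude": 3, "timestamp": 2, "accuracy": 0.5, "hdop": 1}]',
        '[{"longitude": 102, "latitude": 4, "timestamp": 3},'
        ' {"longitude": 103, "latitude": 5, "timestamp": 4, "hdop": 3}]',
    ],
})

# Turn each JSON string into a Python list of dicts
parsed = raw.assign(route=raw["route"].apply(json.loads))

# One row per coordinate; reset the index so the rows line up for the join
points = parsed.explode("route").reset_index(drop=True)

# Expand each dict into columns; keys that are missing become NaN
coords = pd.json_normalize(points["route"].tolist())

result = points.drop(columns="route").join(coords)
print(result)
```

With the real file, raw would be replaced by the DataFrame returned from pd.read_csv. Rows with an empty or missing route would need to be filtered out first, since explode turns them into NaN, which json_normalize cannot expand.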
