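I need to write a nested for loop for a specific API. I start by creating a calendar DataFrame that supplies both the API parameters and the date column of the output DataFrame.
The calendar code seems to work. The Ms columns hold the millisecond timestamps the API requires. I have turned them into functions in the for loop below.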
import time
from datetime import datetime, timedelta, timezone
from pprint import pprint
import pandas as pd
import requests
calendar = pd.date_range(start=(datetime.today() - timedelta(days=6)).date(), end=datetime.today().date(), freq='d')
calendar = pd.DataFrame(calendar)
calendar.rename(columns={0:'Date'}, inplace=True)
calendar['startMs'] = calendar.apply(lambda x: int(round(datetime.combine(x['Date'], datetime.min.time()).astimezone(timezone.utc).timestamp() * 1000)), 1)
calendar['endMs'] = calendar.apply(lambda x: int(round(datetime.combine(x['Date'], datetime.max.time()).astimezone(timezone.utc).timestamp() * 1000)), 1)
Date | startMs | endMs |
---|---|---|
2024-05-29 | 1716955200000 | 1717041600000 |
2024-05-30 | 1717041600000 | 1717128000000 |
2024-05-31 | 1717128000000 | 1717214400000 |
2024-06-01 | 1717214400000 | 1717300800000 |
2024-06-02 | 1717300800000 | 1717387200000 |
2024-06-03 | 1717387200000 | 1717473600000 |
2024-06-04 | 1717473600000 | 1717560000000 |
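For reference, the day boundaries in the table can also be computed with a small helper instead of a row-wise `apply`. This is only a sketch: `day_bounds_ms` and `LOCAL_TZ` are hypothetical names, and the fixed UTC-4 offset is an assumption inferred from the values in the table (a fixed offset ignores daylight-saving changes).

```python
from datetime import date, datetime, time, timedelta, timezone

# Assumption: a fixed UTC-4 offset, inferred from the table values.
LOCAL_TZ = timezone(timedelta(hours=-4))


def day_bounds_ms(day: date, tz: timezone = LOCAL_TZ) -> tuple[int, int]:
    """Return (startMs, endMs) for one local calendar day as epoch milliseconds."""
    start = datetime.combine(day, time.min, tzinfo=tz)  # local midnight
    end = start + timedelta(days=1)                     # next local midnight
    return int(start.timestamp() * 1000), int(end.timestamp() * 1000)


print(day_bounds_ms(date(2024, 5, 29)))  # (1716955200000, 1717041600000)
```

Using the next midnight as the end bound reproduces the table, where each day's endMs equals the following day's startMs.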
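This is the loop I tried, but it does not work.
The time.sleep(1) call is there to keep the script under the API call limit (our rate is 5 calls per second). The goal is to pull 7 days of data for each driver and then ETL it to a server. The driver loop works if I pull a single day, but it fails when I try multiple days. For testing purposes I limited the drivers table to 20 records, and the output is always 20 records. I expected 140 records (20 drivers x 7 days). Today is 6/4, and the output I get shows 6/3 in the date column (in case that helps).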
for i in range(0,6):
    date = calendar['Date'][i].date()
    startMs = calendar['startMs'][i]
    endMs = calendar['endMs'][i]
    df = pd.DataFrame()
    for driverId in drivers['id']:
        response = requests.request('GET', url, headers=headers).json()
        url = f'https://api.samsara.com/v1/fleet/drivers/{driverId}/safety/score?startMs={startMs}&endMs={endMs}'
        df = df._append(response, ignore_index=True)
        time.sleep(1)
    df['Date'] = date
Here is the info for the drivers DataFrame:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20 entries, 0 to 19
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   id        20 non-null     object
 1   name      20 non-null     object
 2   username  20 non-null     object
 3   timezone  20 non-null     object
dtypes: object(4)
memory usage: 772.0+ bytes
This loop works on its own:
# Today
value = 0
date = calendar['Date'][value].date()
startMs = calendar['startMs'][value]
endMs = calendar['endMs'][value]
df0 = pd.DataFrame()
for driverId in drivers['id']:
    response = requests.request('GET', url, headers=headers).json()
    url = f'https://api.samsara.com/v1/fleet/drivers/{driverId}/safety/score?startMs={startMs}&endMs={endMs}'
    df0 = df0._append(response, ignore_index=True)
    time.sleep(1)
df0['Date'] = date
So does this part of the code:
for i in range(0,6):
    date = calendar['Date'][i].date()
    startMs = calendar['startMs'][i]
    endMs = calendar['endMs'][i]
Or, the way that was suggested:
for i in range(1):
    row = calendar.iloc[i]
    date = row['Date'].date()
    startMs = row['startMs']
    endMs = row['endMs']
I don't know how to make them work together.
When it works, the response output looks like this, except with real data:
for driverId in drivers['id']:
    response = requests.request('GET', url, headers=headers).json()
    url = f'https://api.samsara.com/v1/fleet/drivers/{driverId}/safety/score?startMs={startMs}&endMs={endMs}'
    pprint(response)
{'crashCount': 0,
'driverId': 1234,
'harshAccelCount': 0,
'harshBrakingCount': 0,
'harshEvents': [],
'harshTurningCount': 0,
'safetyScore': 100,
'safetyScoreRank': '1',
'timeOverSpeedLimitMs': 0,
'totalDistanceDrivenMeters': 0,
'totalHarshEventCount': 0,
'totalTimeDrivenMs': 0}
{'crashCount': 0,
'driverId': 1235,
'harshAccelCount': 0,
'harshBrakingCount': 0,
'harshEvents': [],
'harshTurningCount': 0,
'safetyScore': 100,
'safetyScoreRank': '1',
'timeOverSpeedLimitMs': 0,
'totalDistanceDrivenMeters': 0,
'totalHarshEventCount': 0,
'totalTimeDrivenMs': 0}
df._append is used to append rows, and in your case it looks like each key in the dictionary is being treated as a new row.
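Separately, the constant 20-row output with a single date would also be consistent with the result frame being re-created on every pass of the outer loop, assuming that is how the original loop is indented. The sketch below uses made-up dates and driver ids (no API calls) to show the difference between a container created inside the day loop and one created once before both loops.

```python
import pandas as pd

days = ["2024-06-02", "2024-06-03"]  # made-up dates
driver_ids = [1234, 1235, 1236]      # made-up driver ids

# Container re-created inside the outer loop: each day discards the previous one.
for day in days:
    day_rows = []
    for driver_id in driver_ids:
        day_rows.append({"driverId": driver_id, "Date": day})
reset_each_day = pd.DataFrame(day_rows)

# Container created once, before both loops: every day/driver pair is kept.
rows = []
for day in days:
    for driver_id in driver_ids:
        rows.append({"driverId": driver_id, "Date": day})
kept_all_days = pd.DataFrame(rows)

print(len(reset_each_day), reset_each_day["Date"].unique())  # only the last day survives
print(len(kept_all_days))                                    # days x drivers
```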
A more correct way to do it is:
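import pandas as pd

list_df = []
# visit every calendar row; range(0, 6) would skip the seventh day
for i_day in range(len(calendar)):
    row = calendar.iloc[i_day]
    date = row['Date'].date()
    startMs = row['startMs']
    endMs = row['endMs']
    for driverId in drivers['id']:
        # get json from the API (a placeholder dict is used here)
        resp = {'id': 22, 'name': 'John', 'username': 'John', 'timezone': 'utc'}
        # serialize data in the dataframe
        # `index = [0]` indicates that we have only single row
        df1 = pd.DataFrame(resp, index=[0])
        # collect the one-row frame; everything is concatenated after the loops
        list_df.append(df1)
# build the final dataframe once, with a fresh 0..n-1 index
df = pd.concat(list_df, ignore_index=True)
print(df)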
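Putting the pieces together, a minimal sketch of the full nested loop might look like the following. `collect_scores` and `fake_fetch` are hypothetical names, and the fake fetch stands in for the real request so the example runs offline. A real fetch would build the URL first and only then call `requests.get(url, headers=headers).json()`. The 0.2 s default pause is derived from the 5 calls per second limit mentioned in the question. Rows are built with `pd.DataFrame(rows)` from a list of dicts, which is one way to keep list-valued fields such as `harshEvents` in a single cell.

```python
import time

import pandas as pd


def collect_scores(calendar, driver_ids, fetch, pause_s=0.2):
    """Call fetch(driver_id, start_ms, end_ms) for every day/driver pair."""
    rows = []  # created once, so no day overwrites another
    for day in calendar.itertuples(index=False):
        for driver_id in driver_ids:
            record = dict(fetch(driver_id, day.startMs, day.endMs))
            record["Date"] = day.Date  # tag each row with its own day
            rows.append(record)
            time.sleep(pause_s)        # stay under the rate limit
    return pd.DataFrame(rows)


def fake_fetch(driver_id, start_ms, end_ms):
    # Hypothetical stand-in for the real API call; returns a trimmed response.
    return {"driverId": driver_id, "safetyScore": 100, "harshEvents": []}


# Seven-day calendar with the same columns as in the question.
dates = pd.date_range("2024-05-29", periods=7, freq="D")
calendar = pd.DataFrame({
    "Date": dates.date,
    "startMs": [1716955200000 + i * 86_400_000 for i in range(7)],
})
calendar["endMs"] = calendar["startMs"] + 86_400_000

df = collect_scores(calendar, range(1, 21), fake_fetch, pause_s=0)
print(df.shape)  # (140, 4): 20 drivers x 7 days
```

Passing the fetch function in as an argument keeps the looping logic testable without network access; swapping in the real request should not require changing the loops.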