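For the past year I've been tracking my gaming sessions, just to get data I care about and to learn Python. Now I'd like to know (and plot, but that's not important yet) which hour of the day (0 to 23) I play the most, across the whole tracking period and all games, averaged per day since tracking began.
Sample:
session_id | game_id | start_datetime | end_datetime |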
---|---|---|---|
001 | 74 | 2023-02-22 13:15:00 | 2023-02-22 15:30:00 |
002 | 127 | 2023-02-23 13:30:00 | 2023-02-23 13:45:00 |
003 | 74 | 2023-02-24 14:40:00 | 2023-02-24 15:00:00 |
In the end I'd like to see this information (the calculation column is not needed):
hour_of_day | sum_hours_played | avg_hours_played_per_day | calculation |
---|---|---|---|
13 | 1.00 | 0.33 | (0.75 + 0.25) / 3 days |
14 | 1.33 | 0.44 | (1.00 + 0.33) / 3 days |
15 | 0.50 | 0.17 | (0.50) / 3 days |
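In short, I don't just want to see whether I played during a given hour (played: 1, not played: 0); I want to know what fraction of that hour I actually played.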
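To make the distinction concrete, here is a minimal sketch of the quantity I mean, for one session and one clock hour only. The helper name `fraction_of_hour_played` is made up for illustration, and it does not aggregate across sessions or days.

```python
from datetime import datetime, timedelta


def fraction_of_hour_played(session_start, session_end, hour_start):
    """Return the fraction (0.0 to 1.0) of the clock hour starting at
    hour_start that is covered by the session. Hypothetical helper."""
    hour_end = hour_start + timedelta(hours=1)
    # Overlap of two intervals: earliest end minus latest start.
    overlap = min(session_end, hour_end) - max(session_start, hour_start)
    # A negative overlap means the session does not touch this hour at all.
    return max(overlap, timedelta(0)).total_seconds() / 3600


# Session 001 from the sample: 13:15 to 15:30 on 2023-02-22
start = datetime(2023, 2, 22, 13, 15)
end = datetime(2023, 2, 22, 15, 30)
for hour in (13, 14, 15, 16):
    print(hour, fraction_of_hour_played(start, end, datetime(2023, 2, 22, hour)))
```

A yes/no count would report 1 for each of the hours 13, 14 and 15 here, whereas the fractions are 0.75, 1.0 and 0.5.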
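I've seen a few approaches online, but almost all of them just count or sum single yes/no events per month or per day. They don't compute the fraction of a day or hour.
So I'd be glad for any hints.
Setup: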
import pandas as pd
# Load your data into a DataFrame
data = {
'session_id': [1, 2, 3],
'game_id': [74, 127, 74],
'start_datetime': ['2023-02-22 13:15:00', '2023-02-23 13:30:00', '2023-02-24 14:40:00'],
'end_datetime': ['2023-02-22 15:30:00', '2023-02-23 13:45:00', '2023-02-24 15:00:00']
}
df = pd.DataFrame(data)
# Convert the 'start_datetime' and 'end_datetime' columns to datetime objects
df['start_datetime'] = pd.to_datetime(df['start_datetime'])
df['end_datetime'] = pd.to_datetime(df['end_datetime'])
# Calculate the duration of each gaming session
df['duration'] = df['end_datetime'] - df['start_datetime']
# Initialize an empty dictionary to store the hours played
hours_played = {i: 0 for i in range(24)}
The trick is to split each session into hourly pieces:
# Break down each session into hours and sum the proportion of each hour played
for _, row in df.iterrows():
    start = row['start_datetime']
    end = row['end_datetime']
    # Loop over the clock hours the session touches
    while start < end:
        # Calculate the start and end of the hour currently considered
        hour_start = start.replace(minute=0, second=0, microsecond=0)
        hour_end = hour_start + pd.Timedelta(hours=1)
        # Take whichever ends first (the hour or the session) and subtract the start time
        played = min(hour_end, end) - start
        # Add the time played to the current value in the dictionary
        hours_played[start.hour] += played.total_seconds() / 3600
        # For the (possible) next iteration of the while loop, move the start to the end of the hour currently considered
        start = hour_end
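# Calculate the average hours played per day
# Count calendar days from the first to the last session (inclusive), ignoring the time of day
total_days = (df['end_datetime'].max().normalize() - df['start_datetime'].min().normalize()).days + 1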
avg_hours_played = {hour: hours / total_days for hour, hours in hours_played.items()}
# Create a DataFrame to display the results
results = pd.DataFrame(list(avg_hours_played.items()), columns=['hour_of_day', 'avg_hours_played_per_day'])
results['sum_hours_played'] = [hours_played[hour] for hour in results['hour_of_day']]
results = results[['hour_of_day', 'sum_hours_played', 'avg_hours_played_per_day']]
print(results)
I hope my comments are understandable.
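If the explicit while loop gets slow on a long history, one possible alternative is to list every (session, clock hour) pair first and then compute the overlaps column-wise. This is only a sketch under a few assumptions: the function name `hourly_play_summary` and the intermediate column names are made up here, the input is non-empty, every session ends at or after its start, and days are counted as calendar dates from the first to the last session.

```python
import pandas as pd


def hourly_play_summary(sessions: pd.DataFrame) -> pd.DataFrame:
    """Sum and per-day average of time played in each clock hour (0 to 23).

    Hypothetical helper; expects 'start_datetime' and 'end_datetime' columns.
    """
    one_hour = pd.Timedelta(hours=1)
    starts = pd.to_datetime(sessions["start_datetime"])
    ends = pd.to_datetime(sessions["end_datetime"])

    # One row per (session, clock hour the session may touch).
    pieces = [
        (start, end, bucket_start)
        for start, end in zip(starts, ends)
        for bucket_start in pd.date_range(start.floor("h"), end, freq=one_hour)
    ]
    s = pd.DataFrame(pieces, columns=["start", "end", "bucket_start"])
    bucket_end = s["bucket_start"] + one_hour

    # Overlap = min(session end, hour end) - max(session start, hour start).
    seg_start = s["start"].where(s["start"] > s["bucket_start"], s["bucket_start"])
    seg_end = s["end"].where(s["end"] < bucket_end, bucket_end)
    s["played_hours"] = (seg_end - seg_start).dt.total_seconds() / 3600
    s["hour_of_day"] = s["bucket_start"].dt.hour

    # Total per hour of day, with hours never played filled in as 0.
    hours = list(range(24))
    total = s.groupby("hour_of_day")["played_hours"].sum().reindex(hours, fill_value=0.0)

    # Calendar days from the first to the last session, inclusive.
    total_days = (ends.max().normalize() - starts.min().normalize()).days + 1

    return pd.DataFrame({
        "hour_of_day": hours,
        "sum_hours_played": total.to_numpy(),
        "avg_hours_played_per_day": total.to_numpy() / total_days,
    })


sample = pd.DataFrame({
    "start_datetime": ["2023-02-22 13:15:00", "2023-02-23 13:30:00", "2023-02-24 14:40:00"],
    "end_datetime": ["2023-02-22 15:30:00", "2023-02-23 13:45:00", "2023-02-24 15:00:00"],
})
summary = hourly_play_summary(sample)
# Show only the hours that were actually played
print(summary[summary["sum_hours_played"] > 0].round(2))
```

A session that ends exactly on the hour produces one extra row with zero overlap, which does not change the sums. Sessions that cross midnight are handled the same way as in the loop, because each piece is assigned to the hour of its own bucket.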