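I have an Excel file of raw data that lists transactions for facilities by month. Each row has a month/year, a facility ID, and the spend for that transaction. A facility can have multiple transactions in a month. I have managed to group the transactions by date and facility ID, with the total spend as the value. It looks like this.
I am trying to calculate the overall year-over-year change in TotalSpend for, say, 2024-05-01 (May 2024). Some facilities may be new, so they have no data for May 2023, or they may have dropped out or not reported yet, so they have no data for May 2024. In both cases I want to exclude those facilities from the calculation. I would also like a quarterly year-over-year change, but I assume I can do the same thing as for May, just over April, May, and June once the data is available.
Here is what I have tried so far, but it gives an error. I don't necessarily need this added as a column to the df; just
May 2024 5%
would be enough for my purposes.
Code (with some pseudocode where that makes sense):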
import pandas as pd
import datetime

def open_file(path, date_str, prev_date_str):
    df_raw = pd.DataFrame({'Date':["2024-05-01","2024-05-01","2024-05-01","2023-05-01","2024-05-01","2023-05-01","2023-05-01","2024-04-01","2022-05-01"],
                           'FacilityID': [6,6,5,5,1,6,6,4,6],
                           'TotalSpend': [100,200,5,5,90,190,150,500,200]
                           })
    df = df_raw.groupby(['Date','FacilityID'])['TotalSpend'].sum()
    #facilities = get complete list of facilities
    cur_month_vals = []
    prev_month_vals = []
    for facility in facilities:
        if df.loc[date_str][facility] and if df.loc[prev_date_str]:
            cur_month_vals.append(df.loc[date_str][facility].value)
            prev_month_vals.append(df.loc[prev_date_str][facility].value)

if __name__ == "__main__":
    df = open_file('MedMiner_Model - EW - MidJun2024_Send.xlsx', '2024-05-01', prev_date_str= '2023-05-01')
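To calculate the year-over-year change in TotalSpend for May 2024, you need to restrict the calculation to facilities that have data in both May 2024 and May 2023.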
import pandas as pd

def open_file(path, date_str, prev_date_str):
    df_raw = pd.read_excel(path, "Data")
    # Parse dates so the lookups below work whether Excel stored dates or text
    df_raw['Date'] = pd.to_datetime(df_raw['Date'])
    df = df_raw.groupby(['Date', 'FacilityID'])['TotalSpend'].sum()
    cur_date, prev_date = pd.Timestamp(date_str), pd.Timestamp(prev_date_str)
    months = df.index.levels[0]
    empty = pd.Series(dtype='float64')
    # Extract the current and previous month data (each indexed by FacilityID)
    cur_month_data = df.loc[cur_date] if cur_date in months else empty
    prev_month_data = df.loc[prev_date] if prev_date in months else empty
    # Find common facilities in both months
    common_facilities = cur_month_data.index.intersection(prev_month_data.index)
    # Sum spend over the common facilities only
    cur_total = cur_month_data.loc[common_facilities].sum()
    prev_total = prev_month_data.loc[common_facilities].sum()
    # Nothing to compare against: no common facilities, or zero prior-year spend
    if prev_total == 0:
        return float('nan')
    # Calculate year-over-year change as a percentage
    return (cur_total - prev_total) / prev_total * 100

if __name__ == "__main__":
    path = '/mnt/data/MedMiner_Model - EW - MidJun2024_Send.xlsx'
    yoy_change = open_file(path, '2024-05-01', '2023-05-01')
    print(f'May 2024 Year-Over-Year Change: {yoy_change:.2f}%')
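As a quick check of the matched-facility logic, the same calculation can be run on the sample data from the question instead of the Excel file. This is only a minimal sketch: `matched_yoy` is a hypothetical helper name, and it uses `pivot_table` plus `dropna` in place of the index intersection used above.

```python
import pandas as pd

# Sample data copied from the question.
df_raw = pd.DataFrame({
    'Date': ["2024-05-01", "2024-05-01", "2024-05-01", "2023-05-01", "2024-05-01",
             "2023-05-01", "2023-05-01", "2024-04-01", "2022-05-01"],
    'FacilityID': [6, 6, 5, 5, 1, 6, 6, 4, 6],
    'TotalSpend': [100, 200, 5, 5, 90, 190, 150, 500, 200],
})
df_raw['Date'] = pd.to_datetime(df_raw['Date'])


def matched_yoy(df_raw, date_str, prev_date_str):
    """Hypothetical helper: YoY % change over facilities present in both months."""
    # One row per facility, one column per month; months without data become NaN.
    wide = df_raw.pivot_table(index='FacilityID', columns='Date',
                              values='TotalSpend', aggfunc='sum')
    cur, prev = pd.Timestamp(date_str), pd.Timestamp(prev_date_str)
    if cur not in wide.columns or prev not in wide.columns:
        return float('nan')
    # Keep only facilities that reported in both months.
    matched = wide[[cur, prev]].dropna()
    prev_total = matched[prev].sum()
    if prev_total == 0:
        return float('nan')
    return (matched[cur].sum() - prev_total) / prev_total * 100


yoy = matched_yoy(df_raw, '2024-05-01', '2023-05-01')
print(f'May 2024 {yoy:.2f}%')
```

With the sample data, facilities 5 and 6 appear in both months (305 in May 2024 against 345 in May 2023), so this prints `May 2024 -11.59%`. Facility 1 is left out because it has no May 2023 data.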
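For the quarterly calculation, you can extend the same approach to cover the three months of each period (the current quarter and the same quarter of the previous year).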
def open_file_quarterly(path, cur_dates, prev_dates):
    df_raw = pd.read_excel(path, "Data")
    df_raw['Date'] = pd.to_datetime(df_raw['Date'])

    def facility_totals(dates):
        # Sum each facility's spend over the listed months; months with no data are skipped
        in_period = df_raw[df_raw['Date'].isin(pd.to_datetime(dates))]
        return in_period.groupby('FacilityID')['TotalSpend'].sum()

    cur_quarter_data = facility_totals(cur_dates)
    prev_quarter_data = facility_totals(prev_dates)
    # A facility counts as common if it has any data in both quarters
    common_facilities = cur_quarter_data.index.intersection(prev_quarter_data.index)
    cur_total = cur_quarter_data.loc[common_facilities].sum()
    prev_total = prev_quarter_data.loc[common_facilities].sum()
    # Nothing to compare against: no common facilities, or zero prior-year spend
    if prev_total == 0:
        return float('nan')
    return (cur_total - prev_total) / prev_total * 100

if __name__ == "__main__":
    path = '/mnt/data/MedMiner_Model - EW - MidJun2024_Send.xlsx'
    cur_quarter_dates = ['2024-04-01', '2024-05-01', '2024-06-01']
    prev_quarter_dates = ['2023-04-01', '2023-05-01', '2023-06-01']
    yoy_change_q2 = open_file_quarterly(path, cur_quarter_dates, prev_quarter_dates)
    print(f'Q2 2024 Year-Over-Year Change: {yoy_change_q2:.2f}%')
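If listing the three month dates by hand gets tedious, one possible variant is to label each row with its calendar quarter and compare quarter labels directly. The sketch below again uses the sample data from the question. `matched_quarter_yoy` is a hypothetical helper name, and it assumes a facility counts as matched when it has at least one transaction in each of the two quarters; a stricter month-by-month match would need a different filter.

```python
import pandas as pd

# Sample data copied from the question.
df_raw = pd.DataFrame({
    'Date': ["2024-05-01", "2024-05-01", "2024-05-01", "2023-05-01", "2024-05-01",
             "2023-05-01", "2023-05-01", "2024-04-01", "2022-05-01"],
    'FacilityID': [6, 6, 5, 5, 1, 6, 6, 4, 6],
    'TotalSpend': [100, 200, 5, 5, 90, 190, 150, 500, 200],
})
df_raw['Date'] = pd.to_datetime(df_raw['Date'])


def matched_quarter_yoy(df_raw, quarter):
    """Hypothetical helper: YoY % change for a calendar quarter such as '2024Q2'."""
    cur = str(pd.Period(quarter, freq='Q'))
    prev = str(pd.Period(quarter, freq='Q') - 4)  # same quarter, one year earlier
    # Label each row with its quarter as text, e.g. '2024Q2'.
    labelled = df_raw.assign(Quarter=df_raw['Date'].dt.to_period('Q').astype(str))
    # One row per facility, one column per quarter; quarters without data become NaN.
    wide = labelled.pivot_table(index='FacilityID', columns='Quarter',
                                values='TotalSpend', aggfunc='sum')
    if cur not in wide.columns or prev not in wide.columns:
        return float('nan')
    # Assumption: keep facilities with any data in both quarters.
    matched = wide[[cur, prev]].dropna()
    prev_total = matched[prev].sum()
    if prev_total == 0:
        return float('nan')
    return (matched[cur].sum() - prev_total) / prev_total * 100


q2 = matched_quarter_yoy(df_raw, '2024Q2')
print(f'Q2 2024 {q2:.2f}%')
```

On the sample data this prints `Q2 2024 -11.59%`: facility 4 has April 2024 spend but nothing in Q2 2023, so it is excluded along with facility 1, leaving facilities 5 and 6.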