Calculating the year-over-year change for a month in Pandas from Excel data

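I have an Excel file of raw data listing facility transactions by month. Each row has a month/year, a facility ID, and the spend for that transaction. A facility can have multiple transactions in a month. I have managed to group the transactions by date and facility ID, with the total spend as the value, giving one TotalSpend figure per Date and FacilityID.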

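I am trying to calculate the overall year-over-year change in TotalSpend for a given month, for example 2024-05-01 (May 2024). Some facilities may be new, so they have no data for May 2023; others may have dropped out or not yet reported, so they have no data for May 2024. In both cases I want to exclude those facilities from the calculation. I would also like a quarterly year-over-year change, but I assume I can do the same thing as for May, applied to April, May, and June once the data is available.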

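Here is what I have tried so far, but I am getting errors. I don't necessarily need to add the result as a column in the df; just

May 2024      5%

would be enough for my purposes.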

Code:

import pandas as pd

def open_file(path, date_str, prev_date_str):
    # Sample data standing in for the Excel file at `path`
    df_raw = pd.DataFrame({'Date':["2024-05-01","2024-05-01","2024-05-01","2023-05-01","2024-05-01","2023-05-01","2023-05-01","2024-04-01","2022-05-01"],
                     'FacilityID': [6,6,5,5,1,6,6,4,6],
                     'TotalSpend': [100,200,5,5,90,190,150,500,200]
})
    # Total spend per (Date, FacilityID) pair
    df = df_raw.groupby(['Date','FacilityID'])['TotalSpend'].sum()
    # Complete list of facilities
    facilities = df.index.get_level_values('FacilityID').unique()
    cur_month_vals = []
    prev_month_vals = []

    for facility in facilities:
        # Keep only facilities that have data in both months
        if (date_str, facility) in df.index and (prev_date_str, facility) in df.index:
            cur_month_vals.append(df.loc[(date_str, facility)])
            prev_month_vals.append(df.loc[(prev_date_str, facility)])
    return cur_month_vals, prev_month_vals

if __name__ == "__main__":
    cur_vals, prev_vals = open_file('MedMiner_Model - EW - MidJun2024_Send.xlsx', '2024-05-01', prev_date_str='2023-05-01')
python pandas
1 Answer

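To calculate the year-over-year change in TotalSpend for May 2024, you need to restrict the calculation to facilities that have data in both May 2024 and May 2023.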

import pandas as pd

def open_file(path, date_str, prev_date_str):
    # Read the raw transactions; the sheet name "Data" is an assumption
    df_raw = pd.read_excel(path, sheet_name="Data")
    # Normalize dates so the lookups work whether Excel stored dates or text
    df_raw['Date'] = pd.to_datetime(df_raw['Date'])
    df = df_raw.groupby(['Date', 'FacilityID'])['TotalSpend'].sum()

    cur_date, prev_date = pd.Timestamp(date_str), pd.Timestamp(prev_date_str)

    # Extract the current and previous month data (empty Series if a month is missing)
    cur_month_data = df.loc[cur_date] if cur_date in df.index.levels[0] else pd.Series(dtype='float64')
    prev_month_data = df.loc[prev_date] if prev_date in df.index.levels[0] else pd.Series(dtype='float64')

    # Find common facilities in both months
    common_facilities = cur_month_data.index.intersection(prev_month_data.index)

    # Total spend in each month, restricted to the common facilities
    cur_total = cur_month_data.loc[common_facilities].sum()
    prev_total = prev_month_data.loc[common_facilities].sum()

    # Calculate year-over-year change; undefined when there is no prior-year spend
    if prev_total == 0:
        return float('nan')
    return (cur_total - prev_total) / prev_total * 100

if __name__ == "__main__":
    path = '/mnt/data/MedMiner_Model - EW - MidJun2024_Send.xlsx'
    yoy_change = open_file(path, '2024-05-01', '2023-05-01')
    print(f'May 2024 Year-Over-Year Change: {yoy_change:.2f}%')
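
As a quick check that does not need the Excel file, the same calculation can be sketched with the sample data from the question. This is only a sketch under assumptions: the function name `yoy_change` is made up for illustration, it uses `pivot_table` instead of the `.loc` lookups above, and `Date` is left as plain strings as in the question's sample (with real Excel dates you would likely pass `pd.Timestamp` values instead).

```python
import pandas as pd

# Sample data copied from the question (not the real Excel file)
df_raw = pd.DataFrame({
    "Date": ["2024-05-01", "2024-05-01", "2024-05-01", "2023-05-01", "2024-05-01",
             "2023-05-01", "2023-05-01", "2024-04-01", "2022-05-01"],
    "FacilityID": [6, 6, 5, 5, 1, 6, 6, 4, 6],
    "TotalSpend": [100, 200, 5, 5, 90, 190, 150, 500, 200],
})

def yoy_change(df_raw, date_str, prev_date_str):
    """Hypothetical helper: YoY % change over facilities present in both months."""
    # One row per facility, one column per month; missing combinations become NaN
    wide = df_raw.pivot_table(index="FacilityID", columns="Date",
                              values="TotalSpend", aggfunc="sum")
    if date_str not in wide.columns or prev_date_str not in wide.columns:
        return float("nan")
    # dropna() keeps only facilities that reported in both months
    both = wide[[prev_date_str, date_str]].dropna()
    prev_total = both[prev_date_str].sum()
    cur_total = both[date_str].sum()
    if prev_total == 0:
        return float("nan")
    return (cur_total - prev_total) / prev_total * 100

may_yoy = yoy_change(df_raw, "2024-05-01", "2023-05-01")
print(f"May 2024 Year-Over-Year Change: {may_yoy:.2f}%")
```

With this sample, only facilities 5 and 6 appear in both months, so the totals compared are 305 (May 2024) and 345 (May 2023), and the change is about -11.59%.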

For the quarterly calculation, you can extend the same approach to cover three months in each period (current and previous year).

def open_file_quarterly(path, cur_dates, prev_dates):
    # Read the raw transactions; the sheet name "Data" is an assumption
    df_raw = pd.read_excel(path, sheet_name="Data")
    df_raw['Date'] = pd.to_datetime(df_raw['Date'])

    def quarter_totals(dates):
        # Total spend per facility over the given months
        in_quarter = df_raw['Date'].isin(pd.to_datetime(dates))
        return df_raw[in_quarter].groupby('FacilityID')['TotalSpend'].sum()

    cur_quarter_data = quarter_totals(cur_dates)
    prev_quarter_data = quarter_totals(prev_dates)

    # Keep only facilities that appear in both quarters
    common_facilities = cur_quarter_data.index.intersection(prev_quarter_data.index)
    cur_total = cur_quarter_data.loc[common_facilities].sum()
    prev_total = prev_quarter_data.loc[common_facilities].sum()

    # Year-over-year change; undefined when there is no prior-year spend
    if prev_total == 0:
        return float('nan')
    return (cur_total - prev_total) / prev_total * 100

if __name__ == "__main__":
    path = '/mnt/data/MedMiner_Model - EW - MidJun2024_Send.xlsx'
    cur_quarter_dates = ['2024-04-01', '2024-05-01', '2024-06-01']
    prev_quarter_dates = ['2023-04-01', '2023-05-01', '2023-06-01']
    yoy_change_q2 = open_file_quarterly(path, cur_quarter_dates, prev_quarter_dates)
    print(f'Q2 2024 Year-Over-Year Change: {yoy_change_q2:.2f}%')
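
The quarterly version can also be sketched without listing the months by hand, by labeling each row with its calendar quarter. The data below is made up purely for illustration, and `quarterly_yoy` is a hypothetical helper name; the sketch assumes calendar quarters are what you want and that `Date` can be parsed by `pd.to_datetime`.

```python
import pandas as pd

# Made-up data for illustration only (not from the question's file)
df_raw = pd.DataFrame({
    "Date": ["2024-04-01", "2024-05-01", "2024-06-01",
             "2023-04-01", "2023-05-01", "2023-06-01",
             "2024-04-01", "2023-05-01"],
    "FacilityID": [1, 1, 1, 1, 1, 1, 2, 3],
    "TotalSpend": [100, 110, 120, 90, 100, 110, 50, 70],
})

def quarterly_yoy(df_raw, quarter, prev_quarter):
    """Hypothetical helper: YoY % change over facilities present in both quarters."""
    df = df_raw.copy()
    # Label each row with its calendar quarter, e.g. "2024Q2"
    df["Quarter"] = pd.to_datetime(df["Date"]).dt.to_period("Q").astype(str)
    # One row per facility, one column per quarter; missing combinations become NaN
    wide = df.pivot_table(index="FacilityID", columns="Quarter",
                          values="TotalSpend", aggfunc="sum")
    if quarter not in wide.columns or prev_quarter not in wide.columns:
        return float("nan")
    # dropna() keeps only facilities with spend in both quarters
    both = wide[[prev_quarter, quarter]].dropna()
    prev_total = both[prev_quarter].sum()
    if prev_total == 0:
        return float("nan")
    return (both[quarter].sum() - prev_total) / prev_total * 100

q2_yoy = quarterly_yoy(df_raw, "2024Q2", "2023Q2")
print(f"Q2 2024 Year-Over-Year Change: {q2_yoy:.2f}%")
```

Here facility 2 only has 2024 data and facility 3 only has 2023 data, so both are excluded; facility 1 goes from 300 to 330, a 10% change. Note that a facility counts as present if it reported in any month of the quarter. If you would rather require all three months, you would need an extra filter on the number of distinct months per facility.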