Quarter-over-quarter spend with Pandas

Question (votes: 0, answers: 1)

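I have some transaction data that, after grouping by date and FacilityID, looks like the data below. I'm trying to calculate the quarter-over-quarter change, i.e. the sum of total spend across all facilities that have spend in all 3 months of the current quarter and in all 3 months of the same quarter of the previous year. So in this example, I only need the total spend for facility #1 for April to June 2024 compared with the total spend for facility #1 for April to June 2023 to get the change. Facility 2 should be excluded because it has no spend in April 2023 or April 2024.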


This is the code I have tried so far, but it also includes facility 2 in the calculation, when it should be excluded because it has no data for April 2024 or April 2023.

import pandas as pd
import datetime

def open_file(path, quarter_number, months):
    df_raw = pd.DataFrame({'Date':["2024-04-01","2024-05-01","2024-06-01", "2024-06-01","2024-05-01","2023-04-01","2023-05-01","2023-06-01","2024-05-01","2024-06-01","2023-05-01","2023-06-01", "2023-04-01","2024-05-01","2024-06-01"],
                         'FacilityID': [1,1,1,1,1,1,1,1,2,2,2,2,3,4,4],
                         'TotalSpend': [100,110,120,50,70,90,100,110,150,140,120,60,90,190,150]
    }).set_index('Date')
    df = df_raw.groupby(['Date', 'FacilityID'])['TotalSpend'].sum()
    print(df)

    cur_dates = []
    prev_dates = []

    for month in months:
        cur_date = datetime.date(2024, month, 1)
        prev_date = datetime.date(cur_date.year - 1, month, 1)
        cur_dates.append(cur_date.strftime('%Y-%m-%d'))
        prev_dates.append(prev_date.strftime('%Y-%m-%d'))

    cur_quarter_data = pd.concat(
        [df.loc[date] if date in df.index.levels[0] else pd.Series(dtype='float64') for date in cur_dates])

    prev_quarter_data = pd.concat(
        [df.loc[date] if date in df.index.levels[0] else pd.Series(dtype='float64') for date in prev_dates])

    common_facilities = cur_quarter_data.index.intersection(prev_quarter_data.index)


    cur_quarter_vals = cur_quarter_data.loc[common_facilities]
    prev_quarter_vals = prev_quarter_data.loc[common_facilities]

    yoy_change = (cur_quarter_vals.sum() - prev_quarter_vals.sum()) / prev_quarter_vals.sum() * 100
    return yoy_change

if __name__ == "__main__":
    change = open_file("path",2 ,[4,5,6])
    print(change)
python pandas
1 Answer
0 votes

Sample code

import pandas as pd
df = pd.DataFrame({'Date':["2024-04-01","2024-05-01","2024-06-01", "2024-06-01","2024-05-01","2023-04-01","2023-05-01","2023-06-01","2024-05-01","2024-06-01","2023-05-01","2023-06-01", "2023-04-01","2024-05-01","2024-06-01"], 'FacilityID': [1,1,1,1,1,1,1,1,2,2,2,2,3,4,4], 'TotalSpend': [100,110,120,50,70,90,100,110,150,140,120,60,90,190,150]})

df

          Date  FacilityID  TotalSpend
0   2024-04-01           1         100
1   2024-05-01           1         110
2   2024-06-01           1         120
3   2024-06-01           1          50  <-- duplicated date
4   2024-05-01           1          70  <-- duplicated date
5   2023-04-01           1          90
6   2023-05-01           1         100
7   2023-06-01           1         110
8   2024-05-01           2         150
9   2024-06-01           2         140
10  2023-05-01           2         120
11  2023-06-01           2          60
12  2023-04-01           3          90
13  2024-05-01           4         190
14  2024-06-01           4         150

Your sample has duplicated dates. I assume that is intentional, so I will combine (sum) them when computing the result.
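As a minimal sketch of what "combining" means here, the duplicated rows for facility 1 can be collapsed into one row per month by summing. The frame name `dup` and the choice of `groupby(...).sum()` are illustrative assumptions; the values are the facility 1 rows for May and June 2024 from the sample above.

```python
import pandas as pd

# Facility 1 rows for May and June 2024 from the sample, including the duplicated dates.
dup = pd.DataFrame({
    'Date': pd.to_datetime(['2024-05-01', '2024-06-01', '2024-06-01', '2024-05-01']),
    'FacilityID': [1, 1, 1, 1],
    'TotalSpend': [110, 120, 50, 70],
})

# One row per (FacilityID, Date); duplicated dates are summed.
monthly = dup.groupby(['FacilityID', 'Date'], as_index=False)['TotalSpend'].sum()
print(monthly)  # TotalSpend is 180 for May and 170 for June
```

Summing first means a month with several transactions still counts as a single month when checking whether a quarter is complete.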

Code

# Convert 'Date' column to datetime
df['Date'] = pd.to_datetime(df['Date'])

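# Resample twice per facility: monthly sums first (empty months become NaN),
# then the quarterly sum and the count of months that have data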
tmp = (df.groupby('FacilityID')
         .resample('MS', on='Date')['TotalSpend']
         .sum(min_count=1)
         .reset_index()
         .groupby('FacilityID')
         .resample('QS', on='Date')['TotalSpend']
         .agg(['sum', 'count'])
)

# Shift the dates forward by 12 months so each quarter lines up with the
# same quarter one year later, then reset the index
tmp_prev = (tmp.reset_index(level=0)
               .shift(freq='12MS')
               .reset_index()
)

# Merge the current and previous periods data, keeping only rows count == 3
out = (
    tmp.merge(tmp_prev, on=['Date', 'FacilityID'], how='left', suffixes=['_cur', '_prev'])
    [lambda x: x.pop('count_cur').eq(3) & x.pop('count_prev').eq(3)]
)

Output:

        Date  FacilityID  sum_cur  sum_prev
4 2024-04-01           1    450.0     300.0
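To get from this table to the percentage change the question asks for, the two sums can be plugged into the same formula the question uses. The sketch below is a self-contained cross-check under stated assumptions rather than the answer's method: it swaps the resample approach for plain year/quarter/month columns, and the names `q`, `prev`, `both`, `full`, and `yoy_change` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ["2024-04-01", "2024-05-01", "2024-06-01", "2024-06-01", "2024-05-01",
             "2023-04-01", "2023-05-01", "2023-06-01", "2024-05-01", "2024-06-01",
             "2023-05-01", "2023-06-01", "2023-04-01", "2024-05-01", "2024-06-01"],
    'FacilityID': [1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 4, 4],
    'TotalSpend': [100, 110, 120, 50, 70, 90, 100, 110, 150, 140, 120, 60, 90, 190, 150],
})
df['Date'] = pd.to_datetime(df['Date'])
df['Year'] = df['Date'].dt.year
df['Quarter'] = df['Date'].dt.quarter
df['Month'] = df['Date'].dt.month

# Per facility and quarter: total spend and number of distinct months with data.
q = (df.groupby(['FacilityID', 'Year', 'Quarter'], as_index=False)
       .agg(total=('TotalSpend', 'sum'), months=('Month', 'nunique')))

# Relabel each row with the following year so it matches the quarter it is compared to.
prev = q.assign(Year=q['Year'] + 1)

# Keep facilities with all 3 months present in both the current and the prior-year quarter.
both = q.merge(prev, on=['FacilityID', 'Year', 'Quarter'], suffixes=('_cur', '_prev'))
full = both[(both['months_cur'] == 3) & (both['months_prev'] == 3)]

# Same formula as in the question, applied to the qualifying facilities only.
yoy_change = (full['total_cur'].sum() - full['total_prev'].sum()) / full['total_prev'].sum() * 100
print(yoy_change)  # 50.0
```

Only facility 1 qualifies, with 450 in the current quarter against 300 a year earlier, which matches the output above and gives a 50% increase.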