Getting the intersection of subsets of a multi-index DataFrame in pandas


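I have a multi-index df with a month, then a facility ID and a TotalSpend value for each facility. I'm trying to sum TotalSpend for a quarter across all facilities that have data for all 3 months of that quarter and for all 3 months of the same quarter in the previous year.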


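In my sample data, I tried to take subsets of df for April, May, and June and then do an inner join, but when I try that I get an error saying it is not a df: df.loc[[date]] is giving me a Series instead. I basically want to check which facility IDs appear in all 3 months of the quarter and keep only those values.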

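Desired output:

The desired output is the sum of Q2 2024 spend for all facilities that have data for all 3 months of Q2 2024, followed by the sum of Q2 2023 spend for those same facilities. In this example that is only facility 1, so the Q2 2024 sum is 450 and the Q2 2023 sum is 300.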


Code:

import pandas as pd
import datetime

def open_file(path, quarter_number, months):
    df_raw = pd.DataFrame({'Date':["2024-04-01","2024-05-01","2024-06-01", "2024-06-01","2024-05-01","2023-04-01","2023-05-01","2023-06-01","2024-05-01","2024-06-01","2023-05-01","2023-06-01", "2023-04-01","2024-05-01","2024-06-01"],
                         'FacilityID': [1,1,1,1,1,1,1,1,2,2,2,2,3,4,4],
                         'TotalSpend': [100,110,120,50,70,90,100,110,150,140,120,60,90,190,150]
    }).set_index('Date')
    df = df_raw.groupby(['Date', 'FacilityID'])['TotalSpend'].sum()
    # print(df)

    cur_dates = []
    prev_dates = []

    for month in months:
        cur_date = datetime.date(2024, month, 1)
        prev_date = datetime.date(cur_date.year - 1, month, 1)
        cur_dates.append(cur_date.strftime('%Y-%m-%d'))
        prev_dates.append(prev_date.strftime('%Y-%m-%d'))

    #this is where i'm having issues
    cur_data =df.loc[[cur_dates[1]]].join(df.loc[[cur_dates[1]]], on='FacilityID' ,join = "inner")
    prev_data = df.loc[prev_dates[0]:prev_dates[-1]]

    # print(cur_data)
    # print(prev_data)

if __name__ == "__main__":
    change = open_file("path",2 ,[4,5,6])
    print(change)
python pandas dataframe
1 Answer

I hope I've understood your question correctly:

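# `df_raw` is the same as in your question (still indexed by 'Date'):

# Parse the string index into datetimes and record each row's month.
df_raw.index = pd.to_datetime(df_raw.index)
df_raw["month"] = df_raw.index.month

# Group by calendar quarter and facility: sum the spend and count distinct months.
df_raw = df_raw.groupby(
    [
        pd.PeriodIndex(df_raw.index, freq="Q"),
        "FacilityID",
    ]
).agg({"TotalSpend": "sum", "month": "nunique"})

# Keep only facilities that have data for all 3 months of the quarter.
df_raw = df_raw[df_raw.month == 3]

print(df_raw["TotalSpend"].reset_index())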

Prints:

     Date  FacilityID  TotalSpend
0  2023Q2           1         300
1  2024Q2           1         450
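
If you would rather take the literal intersection of facility IDs described in the question, here is a minimal sketch of one way to do it. It is an alternative to the groupby above, not a rewrite of it. It assumes `Date` is kept as a regular column rather than the index, and that a facility must appear in every month of the quarter in both the current and the previous year. `quarter_totals` is a hypothetical helper name used only for illustration.

```python
import pandas as pd

# Same sample data as in the question, with 'Date' left as a column (assumption).
df_raw = pd.DataFrame({
    "Date": ["2024-04-01", "2024-05-01", "2024-06-01", "2024-06-01", "2024-05-01",
             "2023-04-01", "2023-05-01", "2023-06-01", "2024-05-01", "2024-06-01",
             "2023-05-01", "2023-06-01", "2023-04-01", "2024-05-01", "2024-06-01"],
    "FacilityID": [1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 4, 4],
    "TotalSpend": [100, 110, 120, 50, 70, 90, 100, 110, 150, 140, 120, 60, 90, 190, 150],
})
df_raw["Date"] = pd.to_datetime(df_raw["Date"])


def quarter_totals(df, year, months):
    """Hypothetical helper: sum spend for facilities present in every listed
    month of `year` and of `year - 1`."""
    cur_dates = [pd.Timestamp(year=year, month=m, day=1) for m in months]
    prev_dates = [pd.Timestamp(year=year - 1, month=m, day=1) for m in months]

    # One set of facility IDs per required month, then intersect them all.
    id_sets = [set(df.loc[df["Date"] == d, "FacilityID"])
               for d in cur_dates + prev_dates]
    common_ids = set.intersection(*id_sets)

    # Keep only the common facilities and total each year's quarter.
    kept = df[df["FacilityID"].isin(common_ids)]
    cur_total = kept.loc[kept["Date"].isin(cur_dates), "TotalSpend"].sum()
    prev_total = kept.loc[kept["Date"].isin(prev_dates), "TotalSpend"].sum()
    return sorted(int(i) for i in common_ids), int(cur_total), int(prev_total)


ids, cur_total, prev_total = quarter_totals(df_raw, 2024, [4, 5, 6])
print(ids, cur_total, prev_total)  # [1] 450 300
```

Unlike the groupby version, which filters each quarter independently, this sketch drops a facility from both years if it is missing any month in either year. With the sample data the two approaches give the same result.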