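I have a multi-index df containing a month, then the facility ID and the TotalSpend value for each facility. I'm trying to sum TotalSpend over all facilities for a quarter, covering all 3 months of that quarter, and also for all 3 months of the same quarter in the previous year.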
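In my sample data, I try to take the April, May and June subsets from the df and then do an inner join. When I do, I get an error: what `df.loc[[date]]` gives me is not a DataFrame but a Series. Basically, I want to find the facility IDs that appear in all 3 months of the quarter and keep only those values.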
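A minimal sketch of the "keep only facility IDs present in all 3 months" step, assuming the question's sample data. Because `groupby(...)['TotalSpend'].sum()` returns a Series, `df.loc[[date]]` is also a Series and there is no `.join()` to call on it. Instead of joining the monthly subsets, this collects the FacilityID set for each month with `Series.xs` and intersects the sets, which is what an inner join on FacilityID would give. The variable names (`s`, `cur_months`, `common_ids`) are illustrative.

```python
import pandas as pd

# Sample data from the question
df_raw = pd.DataFrame({
    "Date": ["2024-04-01", "2024-05-01", "2024-06-01", "2024-06-01", "2024-05-01",
             "2023-04-01", "2023-05-01", "2023-06-01", "2024-05-01", "2024-06-01",
             "2023-05-01", "2023-06-01", "2023-04-01", "2024-05-01", "2024-06-01"],
    "FacilityID": [1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 4, 4],
    "TotalSpend": [100, 110, 120, 50, 70, 90, 100, 110, 150, 140, 120, 60, 90, 190, 150],
}).set_index("Date")

# groupby(...)["TotalSpend"].sum() is a Series with a (Date, FacilityID) MultiIndex,
# so s.loc[[date]] is a Series too, and Series has no .join() method
s = df_raw.groupby(["Date", "FacilityID"])["TotalSpend"].sum()
print(type(s.loc[["2024-05-01"]]).__name__)  # Series

cur_months = ["2024-04-01", "2024-05-01", "2024-06-01"]

# FacilityIDs present in each month; xs() selects one Date and drops that level
id_sets = [set(s.xs(m, level="Date").index) for m in cur_months]
common_ids = set.intersection(*id_sets)  # plays the role of the inner join
print(common_ids)  # {1}

# Sum the quarter's spend for those facilities only
dates = s.index.get_level_values("Date")
facilities = s.index.get_level_values("FacilityID")
cur_total = int(s[dates.isin(cur_months) & facilities.isin(common_ids)].sum())
print(cur_total)  # 450
```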
Desired output:
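The desired output is the Q2 2024 spend summed over all facilities that have data for all 3 months of Q2 2024, followed by the Q2 2023 spend summed over those same facilities. In this example that is only facility 1, so the Q2 2024 sum is 450 and the Q2 2023 sum is 300.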
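One way to compute exactly this output, sketched with plain boolean masks: pick the facilities that reported in every month of the current-year quarter, then sum both the current-year and the prior-year quarter over those same facilities. `quarter_totals` is a hypothetical helper name, and the sketch assumes the `Date` strings are month starts that `pd.to_datetime` can parse. Note that, following the wording above, the prior-year sum does not require those facilities to have all 3 months in the prior year as well; apply the same month-count check to `prev` if you need that.

```python
import pandas as pd

# Sample data from the question (Date kept as a column here)
df = pd.DataFrame({
    "Date": ["2024-04-01", "2024-05-01", "2024-06-01", "2024-06-01", "2024-05-01",
             "2023-04-01", "2023-05-01", "2023-06-01", "2024-05-01", "2024-06-01",
             "2023-05-01", "2023-06-01", "2023-04-01", "2024-05-01", "2024-06-01"],
    "FacilityID": [1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 4, 4],
    "TotalSpend": [100, 110, 120, 50, 70, 90, 100, 110, 150, 140, 120, 60, 90, 190, 150],
})
df["Date"] = pd.to_datetime(df["Date"])


def quarter_totals(df, year, months):
    """Hypothetical helper: total TotalSpend for `year` and `year - 1`, restricted
    to facilities that have data in every month of `months` in `year`."""
    in_months = df["Date"].dt.month.isin(months)
    cur = df[in_months & (df["Date"].dt.year == year)]
    prev = df[in_months & (df["Date"].dt.year == year - 1)]

    # Number of distinct months each facility reported in the current-year quarter
    months_seen = cur["Date"].dt.month.groupby(cur["FacilityID"]).nunique()
    keep = months_seen[months_seen == len(set(months))].index

    # Sum both years over the same set of qualifying facilities
    cur_total = cur.loc[cur["FacilityID"].isin(keep), "TotalSpend"].sum()
    prev_total = prev.loc[prev["FacilityID"].isin(keep), "TotalSpend"].sum()
    return int(cur_total), int(prev_total)


print(quarter_totals(df, 2024, [4, 5, 6]))  # (450, 300)
```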
Code:
import pandas as pd
import datetime

def open_file(path, quarter_number, months):
    df_raw = pd.DataFrame({'Date': ["2024-04-01", "2024-05-01", "2024-06-01", "2024-06-01", "2024-05-01",
                                    "2023-04-01", "2023-05-01", "2023-06-01", "2024-05-01", "2024-06-01",
                                    "2023-05-01", "2023-06-01", "2023-04-01", "2024-05-01", "2024-06-01"],
                           'FacilityID': [1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 4, 4],
                           'TotalSpend': [100, 110, 120, 50, 70, 90, 100, 110, 150, 140, 120, 60, 90, 190, 150]
                           }).set_index('Date')
    df = df_raw.groupby(['Date', 'FacilityID'])['TotalSpend'].sum()
    # print(df)
    cur_dates = []
    prev_dates = []
    for month in months:
        cur_date = datetime.date(2024, month, 1)
        prev_date = datetime.date(cur_date.year - 1, month, 1)
        cur_dates.append(cur_date.strftime('%Y-%m-%d'))
        prev_dates.append(prev_date.strftime('%Y-%m-%d'))
    # this is where I'm having issues: df (and so df.loc[[...]]) is a Series,
    # and a Series has no .join() method
    cur_data = df.loc[[cur_dates[1]]].join(df.loc[[cur_dates[1]]], on='FacilityID', join="inner")
    prev_data = df.loc[prev_dates[0]:prev_dates[-1]]
    # print(cur_data)
    # print(prev_data)

if __name__ == "__main__":
    change = open_file("path", 2, [4, 5, 6])
    print(change)
I hope I understood your question correctly:
# `df_raw` is the same as in your question:
df_raw.index = pd.to_datetime(df_raw.index)
df_raw["month"] = df_raw.index.month

df_raw = df_raw.groupby(
    [
        pd.PeriodIndex(df_raw.index, freq="Q"),
        "FacilityID",
    ]
).agg({"TotalSpend": "sum", "month": "nunique"})

df_raw = df_raw[df_raw.month == 3]
print(df_raw["TotalSpend"].reset_index())
Prints:
Date FacilityID TotalSpend
0 2023Q2 1 300
1 2024Q2 1 450
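If you need a single total per quarter (summed over all qualifying facilities) rather than one row per facility, the grouped result above can be reduced one step further. A sketch, assuming the same sample data; it uses `DatetimeIndex.to_period("Q")` in place of the `pd.PeriodIndex(...)` call, and `change` is simply the difference between the two quarterly totals. One caveat: this filters each quarter independently, so the 2023Q2 total counts facilities with all 3 months in 2023Q2, which is not necessarily the same set that qualified in 2024Q2 (in this sample both sets are just facility 1).

```python
import pandas as pd

# Sample data from the question
df_raw = pd.DataFrame({
    "Date": ["2024-04-01", "2024-05-01", "2024-06-01", "2024-06-01", "2024-05-01",
             "2023-04-01", "2023-05-01", "2023-06-01", "2024-05-01", "2024-06-01",
             "2023-05-01", "2023-06-01", "2023-04-01", "2024-05-01", "2024-06-01"],
    "FacilityID": [1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 4, 4],
    "TotalSpend": [100, 110, 120, 50, 70, 90, 100, 110, 150, 140, 120, 60, 90, 190, 150],
}).set_index("Date")
df_raw.index = pd.to_datetime(df_raw.index)

# Same grouping as the answer: one row per (quarter, facility) with the
# summed spend and the number of distinct months that facility reported
quarter = df_raw.index.to_period("Q").rename("Quarter")
grouped = (
    df_raw.assign(month=df_raw.index.month)
    .groupby([quarter, "FacilityID"])
    .agg(TotalSpend=("TotalSpend", "sum"), months=("month", "nunique"))
)

# Keep (quarter, facility) rows with all 3 months, then total per quarter
complete = grouped[grouped["months"] == 3]
per_quarter = complete["TotalSpend"].groupby(level="Quarter").sum()

change = (per_quarter.loc[pd.Period("2024Q2", freq="Q")]
          - per_quarter.loc[pd.Period("2023Q2", freq="Q")])
print(per_quarter)
print(change)  # 150
```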