How can I efficiently expand a single-index DataFrame into a MultiIndex DataFrame? (Python, pandas)

Question  Votes: 0  Answers: 1
import pandas as pd
concordance_region = pd.DataFrame(
    {
        "country 1": pd.Series([1, 0], index=["region a", "region b"]),
        "country 2": pd.Series([0, 1], index=["region a", "region b"]),
        "country 3": pd.Series([0, 1], index=["region a", "region b"]),
    }
)
display(concordance_region)
country_index = concordance_region.columns
region_index = concordance_region.index
sector_index = ['sector a','sector b']
country_sector = pd.MultiIndex.from_product([country_index, sector_index], names=["country", "sector"])
region_sector = pd.MultiIndex.from_product([region_index, sector_index], names=["region", "sector"])
concordance_region_expanded = pd.DataFrame([[1,0,0,0,0,0],[0,1,0,0,0,0],[0,0,1,0,1,0],[0,0,0,1,0,1]], index=region_sector, columns=country_sector)
display(concordance_region_expanded)

I would like to produce the expansion above without hard-coding the numbers.

One option is:

# Start from an empty frame; cells that are never set stay NaN until the fillna below.
concordance_region_expanded = pd.DataFrame(index=region_sector, columns=country_sector)
for region in region_index:
    for sector_1 in sector_index:
        for country in country_index:
            for sector_2 in sector_index:
                if sector_1 == sector_2 and concordance_region.loc[region, country] == 1:
                    concordance_region_expanded.loc[(region, sector_1), (country, sector_2)] = 1
concordance_region_expanded = concordance_region_expanded.fillna(value=0).infer_objects(copy=False)
concordance_region_expanded

But I think the code above is neither efficient nor elegant.

Is there a better way to do this?

python pandas dataframe expansion hardcode
1 Answer
0 votes

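Summary of the logic: convert the MultiIndexes to DataFrames, cross join them, map concordance_region onto the result to get the integers, apply a boolean check for your condition, and unstack: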

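# Turn both MultiIndexes into plain DataFrames so they can be cross joined.
cc = country_sector.to_frame(index=False)
rr = region_sector.to_frame(index=False)
# Long form of concordance_region: one value per (region, country) pair.
# future_stack=True requires pandas 2.1 or later.
mapped = (concordance_region
          .stack(future_stack=True)
          .rename_axis(index=['region', 'country'])
          .rename('integers')
         )
(cc
 .merge(rr, how='cross')  # 'sector' becomes sector_x (from cc) and sector_y (from rr)
 .set_index(['region', 'country'])
 .join(mapped)
 # keep the mapped value only where the two sectors match, otherwise 0
 .assign(integers=lambda df: df.integers.where(df.sector_x.eq(df.sector_y)).fillna(0).astype(int))
 .set_index(['sector_x', 'sector_y'], append=True)
 .unstack(['country', 'sector_x'])
 .droplevel(axis='columns', level=0)  # drop the leftover 'integers' column level
)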

country           country 1          country 2          country 3
sector_x           sector a sector b  sector a sector b  sector a sector b
region   sector_y
region a sector a         1        0         0        0         0        0
         sector b         0        1         0        0         0        0
region b sector a         0        0         1        0         1        0
         sector b         0        0         0        1         0        1
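
As a side note, and not part of the answer above: the target layout looks like a Kronecker product of the concordance matrix with an identity matrix whose size is the number of sectors, because each 0/1 entry is replaced by a block that is nonzero only where the two sectors match. The sketch below assumes that both MultiIndexes are built with from_product in (region, sector) and (country, sector) order, as in the question; the name expanded_kron is only illustrative.

```python
import numpy as np
import pandas as pd

concordance_region = pd.DataFrame(
    {
        "country 1": pd.Series([1, 0], index=["region a", "region b"]),
        "country 2": pd.Series([0, 1], index=["region a", "region b"]),
        "country 3": pd.Series([0, 1], index=["region a", "region b"]),
    }
)
sector_index = ["sector a", "sector b"]

# Same index construction as in the question.
region_sector = pd.MultiIndex.from_product(
    [concordance_region.index, sector_index], names=["region", "sector"]
)
country_sector = pd.MultiIndex.from_product(
    [concordance_region.columns, sector_index], names=["country", "sector"]
)

# np.kron replaces every entry a of the concordance matrix with the block
# a * identity, i.e. the value is kept only where row and column sector match.
identity = np.eye(len(sector_index), dtype=int)
expanded_kron = pd.DataFrame(
    np.kron(concordance_region.to_numpy(), identity),
    index=region_sector,
    columns=country_sector,
)
print(expanded_kron)
```

This avoids the cross join entirely, but it relies on the row and column order produced by from_product; if the indexes are ordered differently, the blocks would no longer line up with the labels.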