Set level values in a MultiIndex

Question (votes: 0, answers: 3)

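How do I set the values of one level of a Series' MultiIndex? Can I replace the values with a dictionary, or only with a list of values as long as the Series?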

Here is an example DataFrame:

     sector from_country to_country           0
0  Textiles          FRA        AUS   47.502096
1  Textiles          FRA        USA  431.890710
2  Textiles          GBR        AUS   83.500590
3  Textiles          GBR        USA  324.836158
4      Wood          FRA        AUS   27.515607
5      Wood          FRA        USA  276.501148
6      Wood          GBR        AUS    1.406096
7      Wood          GBR        USA    8.996177

Now set the index:

df = df.set_index(['sector', 'from_country', 'to_country']).squeeze()
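
To make the example reproducible, the sample frame can be rebuilt from the table above roughly like this (a sketch; the value column is named 0 to match the printed output):

```python
import pandas as pd

# Sample data copied from the table in the question
df = pd.DataFrame(
    {
        "sector": ["Textiles"] * 4 + ["Wood"] * 4,
        "from_country": ["FRA", "FRA", "GBR", "GBR"] * 2,
        "to_country": ["AUS", "USA"] * 4,
        0: [47.502096, 431.890710, 83.500590, 324.836158,
            27.515607, 276.501148, 1.406096, 8.996177],
    }
)

# Move the three key columns into a MultiIndex; squeeze() turns the
# remaining single-column DataFrame into a Series named 0
df = df.set_index(["sector", "from_country", "to_country"]).squeeze()
print(df)
```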

For example, suppose I want to change the values according to the following key/value pairs:

In [69]: replace_dict = {'FRA':'France', 'GBR':'UK'}
In [70]: new_vals = [replace_dict[x] for x in df.index.get_level_values('from_country')]
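
Note that get_level_values() returns one label per row, so new_vals has as many entries as the Series. A small sketch of a vectorised equivalent, assuming unmapped labels should be kept rather than raise a KeyError (the index here is rebuilt with from_product purely for illustration):

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [["Textiles", "Wood"], ["FRA", "GBR"], ["AUS", "USA"]],
    names=["sector", "from_country", "to_country"],
)
replace_dict = {"FRA": "France", "GBR": "UK"}

# get_level_values() returns one label per row (8 here), not the
# 2 unique labels stored in index.levels
row_labels = index.get_level_values("from_country")

# Vectorised equivalent of the list comprehension; dict.get(x, x)
# leaves unmapped labels as they are instead of raising KeyError
new_vals = row_labels.map(lambda x: replace_dict.get(x, x))
print(list(new_vals))
```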

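I would like something like this (set_level_values is a hypothetical method; pandas has no such method) to produce: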

In [68]: df.index.set_level_values(new_vals, level='from_country')
Out[68]: 
sector    from_country  to_country
Textiles  France        AUS            47.502096
                        USA           431.890710
          UK            AUS            83.500590
                        USA           324.836158
Wood      France        AUS            27.515607
                        USA           276.501148
          UK            AUS             1.406096
                        USA             8.996177

I'm currently doing this, but it seems clumsy to me:

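def set_index_values(df_or_series, new_values, level):
    """
    Replace the MultiIndex level `level` with `new_values`

    `new_values` must be the same length as `df_or_series`
    """
    levels = df_or_series.index.names
    # Move the level out of the index into an ordinary column
    retval = df_or_series.reset_index(level)
    # Overwrite that column with the new per-row values
    retval[level] = new_values
    # Put the column back into the index, restore the original level order,
    # sort by the full index and turn the single-column frame back into a Series
    retval = retval.set_index(level, append=True).reorder_levels(levels).sort_index().squeeze()
    return retval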
python pandas
3 Answers
13 votes

It's a bit hacky, but you can do this with .index.set_levels:

In [11]: df1.index.levels[1]
Out[11]: Index(['FRA', 'GBR'], dtype='object', name='from_country')

In [12]: df1.index.levels[1].map(replace_dict.get)
Out[12]: array(['France', 'UK'], dtype=object)

In [13]: df1.index = df1.index.set_levels(df1.index.levels[1].map(replace_dict.get), level="from_country")

In [14]: df1
Out[14]:
sector    from_country  to_country
Textiles  France        AUS            47.502096
                        USA           431.890710
          UK            AUS            83.500590
                        USA           324.836158
Wood      France        AUS            27.515607
                        USA           276.501148
          UK            AUS             1.406096
                        USA             8.996177
Name: 0, dtype: float64
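
The same set_levels approach on a self-contained index, as a sketch (placeholder values 0 to 7 instead of the question's data). The mapping runs over the unique labels stored in index.levels, not over every row, and in pandas 2.x the level argument has to be passed by keyword:

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [["Textiles", "Wood"], ["FRA", "GBR"], ["AUS", "USA"]],
    names=["sector", "from_country", "to_country"],
)
s = pd.Series(range(8), index=index, dtype=float)

replace_dict = {"FRA": "France", "GBR": "UK"}

# levels[1] holds only the unique labels of from_country (2 entries),
# so the mapping runs on 2 values rather than on all 8 rows
old_level = s.index.levels[1]
# dict.get(x, x) keeps labels that are missing from replace_dict,
# whereas replace_dict.get alone would turn them into None
new_level = old_level.map(lambda x: replace_dict.get(x, x))

# Pass level by keyword rather than positionally
s.index = s.index.set_levels(new_level, level="from_country")
print(s)
```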

Note: there is a way to get the level number from the level name, but I can't remember it.
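
One public way to do this, sketched below: MultiIndex.names is a list subclass, so the ordinary list.index() method returns a level's position from its name.

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [["Textiles"], ["FRA", "GBR"], ["AUS", "USA"]],
    names=["sector", "from_country", "to_country"],
)

# index.names behaves like a list, so list.index() gives the position
level_num = index.names.index("from_country")
print(level_num)
print(index.levels[level_num])
```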


1 vote

To add to Andy Hayden's answer: df.index.set_levels has a level parameter, which I had to set to the level I wanted before the code would run.
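
The level parameter also accepts a list, in which case set_levels expects one replacement array per listed level. A sketch that renames two levels in one call (the to_country mapping is made up for illustration):

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [["Textiles", "Wood"], ["FRA", "GBR"], ["AUS", "USA"]],
    names=["sector", "from_country", "to_country"],
)

from_map = {"FRA": "France", "GBR": "UK"}
# Hypothetical mapping, only for this example
to_map = {"AUS": "Australia", "USA": "United States"}

# When `level` is a list, `levels` must be a list of the same length:
# one replacement array per level, each aligned with that level's
# unique labels (index.levels[i]), not with the rows
new_index = index.set_levels(
    [index.levels[1].map(from_map), index.levels[2].map(to_map)],
    level=["from_country", "to_country"],
)
print(new_index)
```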


0 votes

I think I took this function from somewhere, but I can't find where, so apologies to the original author.

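You pass it the MultiIndex whose values you want to change, the name of the level to change, and the new values. The values must be the same length as the MultiIndex.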

import pandas as pd


def set_level_values(midx, level, values):
    """
    Replace the values of one level of a pandas MultiIndex with an iterable
    of values of the same length as the index.

    Unlike MultiIndex.set_levels, this allows duplicate values.

    Parameters
    ----------
    midx: pd.MultiIndex
        Multilevel index or columns of a pandas DataFrame to change a level in.
    level: str or int
        Name or position of the level to change.
    values: iterable
        Values to replace the original level values, one per entry in `midx`.

    Returns
    -------
    pd.MultiIndex
        The multilevel index/columns with the values of the given level replaced.
    """
    # One array per level, each as long as the index (keeps each level's dtype)
    full_levels = [midx.get_level_values(i) for i in range(midx.nlevels)]
    names = midx.names
    # Translate a level name into its position
    if isinstance(level, str):
        if level not in names:
            raise ValueError(f'No level {level} in MultiIndex')
        level = names.index(level)
    # Materialise the iterable so that generators also work with len()
    values = list(values)
    if len(full_levels[level]) != len(values):
        raise ValueError('Values must be of the same size as original level')
    full_levels[level] = values
    return pd.MultiIndex.from_arrays(full_levels, names=names)
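
A quick check of the point about duplicates, as a sketch: set_levels rejects a new level whose labels are not unique, while rebuilding the index from per-row arrays (the approach set_level_values above takes) simply repeats the value. Mapping both countries to "Europe" is an invented example.

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [["Textiles"], ["FRA", "GBR"], ["AUS", "USA"]],
    names=["sector", "from_country", "to_country"],
)

# set_levels() needs the new level labels to be unique,
# so mapping FRA and GBR to the same label fails
try:
    index.set_levels(["Europe", "Europe"], level="from_country")
    set_levels_failed = False
except ValueError as err:
    set_levels_failed = True
    print("set_levels:", err)

# Rebuilding from per-row arrays has no such restriction:
# duplicates are just repeated row values
arrays = [index.get_level_values(name) for name in index.names]
arrays[index.names.index("from_country")] = ["Europe"] * len(index)
merged = pd.MultiIndex.from_arrays(arrays, names=index.names)
print(merged)
```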
© www.soinside.com 2019 - 2024. All rights reserved.