Python: normalize a hierarchical tree into a pandas DataFrame


I have the following DataFrame, which has a tree-like structure. The hierarchy is between 2 and 4 levels deep:

df = 
    Hierarchy ID  Value
0   1.0           T1
1   1.1           T2
2   1.1.1         T3 
3   1.1.2         T4
4   1.2           T5
5   1.2.1         T6
6   2.0           T7
7   2.1           T8
8   2.1.1         T9
9   2.1.1.1       T10
10  2.1.2         T11
...

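I want to normalize it into a filterable pandas DataFrame like the one below. The "N.0" level should always be the top level (Level_1), and the remaining levels should follow from it.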

df = 
    Level_1   Level_2  Level_3  Level_4
    T1        T2       T3       NaN
    T1        T2       T4       NaN
    T1        T5       T6       NaN
    T7        T8       T9       T10
    T7        T8       T11      NaN
...

I don't know how to approach this. Can anyone help?

python pandas
1 Answer

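This seems to do the trick. Getting from the DataFrame to the mapping dict, and from the print() output back to a DataFrame, is left as a separate exercise.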

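This assumes that every "path" is present in the mapping, i.e. each node's ancestors also appear as keys. If one is missing, mapping.get() returns None for that level.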
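To make that assumption checkable, here is a minimal sketch of a helper that lists hierarchy IDs whose parent is absent. The name `missing_parents` is hypothetical and not part of the answer; it only assumes IDs are dot-separated strings in which a trailing ".0" marks the root of a branch.

```python
def missing_parents(keys):
    """Return the hierarchy IDs whose direct parent is not among `keys`."""

    def normalize(key):
        parts = key.split(".")
        # "1.0" is the root of branch 1, so drop trailing zero parts.
        while len(parts) > 1 and parts[-1] == "0":
            parts.pop()
        return tuple(parts)

    present = {normalize(k) for k in keys}
    return sorted(
        ".".join(path)
        for path in present
        # Roots (length 1) have no parent to look for.
        if len(path) > 1 and path[:-1] not in present
    )


print(missing_parents(["1.0", "1.1", "1.1.1"]))  # []
print(missing_parents(["1.0", "1.1.1"]))         # ['1.1.1']
```

An empty result means every non-root node has its parent in the data, which is what the code that follows relies on.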

mapping = {
    "1.0": "T1",
    "1.1": "T2",
    "1.1.1": "T3",
    "1.1.2": "T4",
    "1.2": "T5",
    "1.2.1": "T6",
    "2.0": "T7",
    "2.1": "T8",
    "2.1.1": "T9",
    "2.1.1.1": "T10",
    "2.1.2": "T11",
}


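def key_to_tuple(key):
    # "1.1.2" -> (1, 1, 2); trailing zeros are dropped, so "1.0" -> (1,)
    bits = [int(p) for p in key.split(".")]
    # Keep at least one element so a key such as "0" cannot empty the list
    while len(bits) > 1 and bits[-1] == 0:
        bits.pop()
    return tuple(bits)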


mapping = {key_to_tuple(key): value for key, value in mapping.items()}

for key in mapping:
    # A leaf node is a key that is not a proper prefix of any other key
    if not any(k != key and k[: len(key)] == key for k in mapping):
        # Look up the value of every prefix of the key, root first
        path = [mapping.get(key[: x + 1]) for x in range(len(key))]
        print(path)

This prints:

['T1', 'T2', 'T3']
['T1', 'T2', 'T4']
['T1', 'T5', 'T6']
['T7', 'T8', 'T9', 'T10']
['T7', 'T8', 'T11']
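For the part left as an exercise, the following is one possible end-to-end sketch rather than a definitive implementation. It assumes the columns are named "Hierarchy ID" and "Value" as in the question and that the IDs are stored as strings (a float column would lose the distinction between IDs such as "1.1" and "1.10"). The names `to_path` and `normalize_hierarchy` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "Hierarchy ID": ["1.0", "1.1", "1.1.1", "1.1.2", "1.2", "1.2.1",
                     "2.0", "2.1", "2.1.1", "2.1.1.1", "2.1.2"],
    "Value": ["T1", "T2", "T3", "T4", "T5", "T6",
              "T7", "T8", "T9", "T10", "T11"],
})


def to_path(hierarchy_id):
    """Turn "2.1.1" into (2, 1, 1) and "2.0" into (2,)."""
    parts = [int(p) for p in str(hierarchy_id).split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def normalize_hierarchy(frame):
    """Return one row per leaf node, with one column per hierarchy level."""
    mapping = {
        to_path(k): v for k, v in zip(frame["Hierarchy ID"], frame["Value"])
    }
    rows = []
    for key in mapping:
        # A leaf is a key that is not a proper prefix of any other key.
        is_leaf = not any(k != key and k[:len(key)] == key for k in mapping)
        if is_leaf:
            rows.append([mapping.get(key[:i + 1]) for i in range(len(key))])
    depth = max((len(row) for row in rows), default=0)
    # Pad shorter paths so that every row has the same number of levels.
    rows = [row + [None] * (depth - len(row)) for row in rows]
    columns = [f"Level_{i + 1}" for i in range(depth)]
    return pd.DataFrame(rows, columns=columns)


result = normalize_hierarchy(df)
print(result)
```

The result can then be filtered like any other DataFrame; for example, `result[result["Level_2"] == "T8"]` selects the two rows under T8. The leaf check compares every key with every other key, which is fine for small trees but grows quadratically with the number of rows.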