I have the following dataframe, which has a tree structure. The hierarchy is between 2 and 4 levels deep:
df =
Hierarchy ID Value
0 1.0 T1
1 1.1 T2
2 1.1.1 T3
3 1.1.2 T4
4 1.2 T5
5 1.2.1 T6
6 2.0 T7
7 2.1 T8
8 2.1.1 T9
9 2.1.1.1 T10
10 2.1.2 T11
...
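I would like to normalize it into a filterable pandas dataframe like the one below. The "N.0" entries should always be the main level 1, and the remaining levels should follow from it.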
df =
Level_1 Level_2 Level_3 Level_4
T1 T2 T3 NaN
T1 T2 T4 NaN
T1 T5 T6 NaN
T7 T8 T9 T10
T7 T8 T11 NaN
...
I don't know how to approach this. Any help?
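This seems to do the trick. Going from the dataframe to the mapping dict, and from the print() output back to a dataframe, is left as a separate exercise.
This assumes every "path" is present in the mapping, i.e. each node's ancestors all have their own entries.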
mapping = {
    "1.0": "T1",
    "1.1": "T2",
    "1.1.1": "T3",
    "1.1.2": "T4",
    "1.2": "T5",
    "1.2.1": "T6",
    "2.0": "T7",
    "2.1": "T8",
    "2.1.1": "T9",
    "2.1.1.1": "T10",
    "2.1.2": "T11",
}
def key_to_tuple(key):
    # "1.0" -> (1,), "1.1.2" -> (1, 1, 2); trailing zeros are dropped
    bits = [int(p) for p in key.split(".")]
    while bits[-1] == 0:
        bits.pop()
    return tuple(bits)
mapping = {key_to_tuple(key): value for key, value in mapping.items()}
for key, value in mapping.items():
    # Figure out if this is a leaf node (no other key starts with this key)
    if not any(k for k in mapping if k[: len(key)] == key and k != key):
        # Look up the value of every prefix of the key, root first
        path = [mapping.get(key[: x + 1]) for x in range(len(key))]
        print(path)
This prints:
['T1', 'T2', 'T3']
['T1', 'T2', 'T4']
['T1', 'T5', 'T6']
['T7', 'T8', 'T9', 'T10']
['T7', 'T8', 'T11']
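For the part left as an exercise, here is a minimal sketch of the round trip through pandas, reusing the same prefix idea. It assumes the columns are named "Hierarchy ID" and "Value" and that the IDs are stored as strings; the helper name flatten_hierarchy and its parameters are made up for illustration and are not part of pandas.

```python
import pandas as pd

# Sample data copied from the question; the column names are an assumption.
df = pd.DataFrame({
    "Hierarchy ID": ["1.0", "1.1", "1.1.1", "1.1.2", "1.2", "1.2.1",
                     "2.0", "2.1", "2.1.1", "2.1.1.1", "2.1.2"],
    "Value": ["T1", "T2", "T3", "T4", "T5", "T6",
              "T7", "T8", "T9", "T10", "T11"],
})


def key_to_tuple(key):
    # "1.0" -> (1,), "1.1.2" -> (1, 1, 2); trailing zeros are dropped
    bits = [int(p) for p in str(key).split(".")]
    while len(bits) > 1 and bits[-1] == 0:
        bits.pop()
    return tuple(bits)


def flatten_hierarchy(frame, id_col="Hierarchy ID", value_col="Value"):
    # Hypothetical helper: one output row per leaf, one column per level.
    mapping = {key_to_tuple(k): v
               for k, v in zip(frame[id_col], frame[value_col])}
    # Every proper prefix of a key is a parent, so it cannot be a leaf.
    parents = {key[:i] for key in mapping for i in range(1, len(key))}
    rows = [
        [mapping.get(key[: i + 1]) for i in range(len(key))]
        for key in mapping
        if key not in parents
    ]
    depth = max(len(row) for row in rows)
    # Pad shorter paths with None so every row has the same width.
    rows = [row + [None] * (depth - len(row)) for row in rows]
    columns = [f"Level_{i + 1}" for i in range(depth)]
    return pd.DataFrame(rows, columns=columns)


result = flatten_hierarchy(df)
print(result)
```

The result has one row per leaf and columns Level_1 to Level_4, with missing values where a path is shorter than the deepest one. As with the code above, an ancestor that is absent from the data shows up as a missing value in its column rather than raising an error.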