I have a small pandas DataFrame with only a few rows and three columns:
import pandas as pd
df_size = pd.DataFrame([[0.510,0.450,0.540],
[0.899,0.820,1.150],
[1.745,1.587,2.020],
[2.020,1.745,2.405],
], columns=['diameter_mean', 'diameter_min','diameter_max'])
A second DataFrame contains a (longer) lookup table:
df_lookup = pd.DataFrame([[0.450,0.021548],
[0.510,0.021791],
[0.540,0.022038],
[0.565,0.022289],
[0.695,0.022545],
[0.720,0.034321],
[0.770,1.292340],
[0.820,1.296070],
[0.899,1.302340],
[1.150,2.311770],
[1.361,3.325140],
[1.587,4.144621],
[1.745,3.498933],
[2.020,3.512665],
[2.405,3.610773],
], columns=['diameter', 'SMS'])
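This means that for any entry in the lookup table there may be a matching data point in df_size.
Based on df_lookup['diameter'], I want to automatically look up the corresponding SMS values for all three columns df_size['diameter_mean'], df_size['diameter_min'] and df_size['diameter_max'], and append the values found to df_size as three new columns 'SMS', 'SMS_min' and 'SMS_max'.
I tried to create the three new columns with a merge, but, as expected, this raises a ValueError: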
df_size['SMS'] = df_size.merge(df_lookup, left_on='diameter_mean', right_on='diameter')
df_size['SMS_min'] = df_size.merge(df_lookup, left_on='diameter_min', right_on='diameter')
df_size['SMS_max'] = df_size.merge(df_lookup, left_on='diameter_max', right_on='diameter')
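The error occurs because each merge returns multiple columns, and all three lines try to assign them to a single column.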
Alternatively, I tried a solution using apply and map, but it seems I am missing something (the example here is for the diameter_mean column only):
df_size['SMS'].apply(lambda df_lookup.SMS: df_lookup['diameter'][(df_size['diameter_mean'])].values[0])
This results in a KeyError.
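For reference, the map idea can be sketched as follows. This is only a minimal sketch, and it assumes that every diameter in df_size appears exactly (same float value) in df_lookup['diameter'] and that the lookup diameters are unique. The names sms_by_diameter and targets are introduced here for illustration and are not part of the original attempt.

```python
import pandas as pd

df_size = pd.DataFrame([[0.510, 0.450, 0.540],
                        [0.899, 0.820, 1.150],
                        [1.745, 1.587, 2.020],
                        [2.020, 1.745, 2.405],
                        ], columns=['diameter_mean', 'diameter_min', 'diameter_max'])

df_lookup = pd.DataFrame([[0.450, 0.021548], [0.510, 0.021791], [0.540, 0.022038],
                          [0.565, 0.022289], [0.695, 0.022545], [0.720, 0.034321],
                          [0.770, 1.292340], [0.820, 1.296070], [0.899, 1.302340],
                          [1.150, 2.311770], [1.361, 3.325140], [1.587, 4.144621],
                          [1.745, 3.498933], [2.020, 3.512665], [2.405, 3.610773],
                          ], columns=['diameter', 'SMS'])

# Series indexed by diameter; map() then looks each value up by exact match.
# Assumption: the diameters in df_lookup are unique.
sms_by_diameter = df_lookup.set_index('diameter')['SMS']

# New column name -> source column in df_size (illustrative mapping)
targets = {'SMS': 'diameter_mean',
           'SMS_min': 'diameter_min',
           'SMS_max': 'diameter_max'}

for new_col, src_col in targets.items():
    # Diameters without an exact match in the lookup table become NaN
    df_size[new_col] = df_size[src_col].map(sms_by_diameter)

print(df_size)
```

Because this is an exact-match lookup on floats, it only works when the values are bit-for-bit identical in both frames; measured or computed diameters would more likely need a nearest-match approach.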
The target df_size should look like this:
df_size
'diameter_mean' 'diameter_min' 'diameter_max' 'SMS' 'SMS_min' 'SMS_max'
0.510 0.450 0.540 0.021791 0.021548 0.022038
0.899 0.820 1.150 1.302340 1.296070 2.311770
1.745 1.587 2.020 3.498933 4.144621 3.512665
2.020 1.745 2.405 3.512665 3.498933 3.610773
By the way, do both DataFrames need to be strictly monotonic with respect to the lookup parameter (= diameter)?
One option is to use merge_asof:
tmp = df_size.reset_index()
merges = {'SMS': ('diameter_mean', 'nearest'),
'SMS_min': ('diameter_min', 'forward'),
'SMS_max': ('diameter_max', 'backward'),
}
for k, (c, d) in merges.items():
df_size[k] = pd.merge_asof(
tmp.sort_values(by=c), df_lookup,
left_on=c, right_on='diameter',
direction=d
).set_index('index')['SMS']
Output:
diameter_mean diameter_min diameter_max SMS SMS_min SMS_max
0 0.510 0.450 0.540 0.021791 0.021548 0.022038
1 0.899 0.820 1.150 1.302340 1.296070 2.311770
2 1.745 1.587 2.020 3.498933 4.144621 3.512665
3 2.020 1.745 2.405 3.512665 3.498933 3.610773
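On the monotonicity question: merge_asof requires the key columns of both frames to be sorted in ascending order, which is why the loop above sorts tmp by the key column before each merge. The SMS values themselves do not need to be monotonic. The sketch below illustrates this with a small, deliberately shuffled lookup table; the frames left and lookup_unsorted are hypothetical and exist only for this illustration.

```python
import pandas as pd

left = pd.DataFrame({'diameter_mean': [0.510, 0.899, 1.745]})

# Hypothetical lookup table whose diameters are NOT sorted
lookup_unsorted = pd.DataFrame({'diameter': [1.745, 0.510, 0.899],
                                'SMS': [3.498933, 0.021791, 1.302340]})

# With an unsorted right key, merge_asof refuses to run
try:
    pd.merge_asof(left, lookup_unsorted,
                  left_on='diameter_mean', right_on='diameter')
    raised = False
except ValueError as exc:
    raised = True
    print('merge_asof failed:', exc)

# Sorting the lookup table by its key first makes the merge work
result = pd.merge_asof(left, lookup_unsorted.sort_values('diameter'),
                       left_on='diameter_mean', right_on='diameter',
                       direction='nearest')
print(result)
```

So a one-off sort_values on the key column is enough. If df_lookup is already stored in ascending diameter order, as in the question, only the df_size side needs sorting per column.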