Using a pandas DataFrame as a lookup table for another DataFrame of different length


I have a small pandas DataFrame with only a few rows and three columns:

import pandas as pd
df_size = pd.DataFrame([[0.510,0.450,0.540],   
                        [0.899,0.820,1.150],   
                        [1.745,1.587,2.020],   
                        [2.020,1.745,2.405],   
                       ], columns=['diameter_mean', 'diameter_min','diameter_max'])

The second DataFrame contains a (longer) lookup table:

df_lookup = pd.DataFrame([[0.450,0.021548],
                          [0.510,0.021791],
                          [0.540,0.022038],
                          [0.565,0.022289],
                          [0.695,0.022545],
                          [0.720,0.034321],
                          [0.770,1.292340],
                          [0.820,1.296070],
                          [0.899,1.302340],
                          [1.150,2.311770],
                          [1.361,3.325140],
                          [1.587,4.144621],
                          [1.745,3.498933],
                          [2.020,3.512665],
                          [2.405,3.610773],
                        ], columns=['diameter', 'SMS'])

That is, any lookup-table entry may correspond to a data point in df_size.

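Based on df_lookup['diameter'], I want to automatically look up the corresponding SMS values for all three columns df_size['diameter_mean'], df_size['diameter_min'] and df_size['diameter_max'], and append the values found to the data frame df_size as three new columns ['SMS'], ['SMS_min'] and ['SMS_max'].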

I tried to create the three new columns with merge, but this raises a ValueError, as expected:

df_size['SMS'] = df_size.merge(df_lookup, left_on='diameter_mean', right_on='diameter')
df_size['SMS_min'] = df_size.merge(df_lookup, left_on='diameter_min', right_on='diameter')
df_size['SMS_max'] = df_size.merge(df_lookup, left_on='diameter_max', right_on='diameter')

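The reason is that each of the three lines tries to assign multiple columns (the whole merged DataFrame) to a single column.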
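To make the failure concrete, here is a minimal sketch on a shortened copy of the data (two rows of df_size and six lookup rows). It assumes a reasonably recent pandas version; the names merged and assignment_failed are illustrative only. It shows that merge returns a full DataFrame, and that selecting just the SMS column from a left merge yields one value per row when every key has exactly one match.

```python
import pandas as pd

# Shortened copies of the two frames from the question
df_size = pd.DataFrame([[0.510, 0.450, 0.540],
                        [0.899, 0.820, 1.150]],
                       columns=['diameter_mean', 'diameter_min', 'diameter_max'])
df_lookup = pd.DataFrame([[0.450, 0.021548], [0.510, 0.021791], [0.540, 0.022038],
                          [0.820, 1.296070], [0.899, 1.302340], [1.150, 2.311770]],
                         columns=['diameter', 'SMS'])

# merge returns a DataFrame holding the columns of both inputs
merged = df_size.merge(df_lookup, left_on='diameter_mean',
                       right_on='diameter', how='left')
print(merged.columns.tolist())

# Storing that whole frame in one column is what triggers the ValueError
try:
    df_size['SMS'] = merged
    assignment_failed = False
except ValueError:
    assignment_failed = True

# Selecting only the SMS column gives one value per row of df_size.
# how='left' keeps the row order of df_size; to_numpy() sidesteps index alignment.
df_size['SMS'] = merged['SMS'].to_numpy()
print(df_size)
```

This only works for exact key matches with unique lookup keys; a duplicated diameter in df_lookup would produce extra rows and break the one-to-one assignment.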

Alternatively, I tried solutions using apply and map, but it seems I am missing something (shown here for the diameter_mean column only):

df_size['SMS'].apply(lambda df_lookup.SMS: df_lookup['diameter'][(df_size['diameter_mean'])].values[0])

This results in a KeyError.

The target df_size should look like this:

df_size
'diameter_mean' 'diameter_min' 'diameter_max' 'SMS'     'SMS_min'    'SMS_max'
0.510           0.450           0.540         0.021791  0.021548     0.022038
0.899           0.820           1.150         1.302340  1.296070     2.311770
1.745           1.587           2.020         3.498933  4.144621     3.512665
2.020           1.745           2.405         3.512665  3.498933     3.610773

By the way, do both DataFrames need to be strictly monotonic with respect to the lookup parameter (= diameter)?

python pandas dataframe lookup-tables
1 Answer

You can use a series of merge_asof calls:

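# Keep the original row labels in an 'index' column so results can be realigned
tmp = df_size.reset_index()

# new column -> (key column in df_size, merge_asof matching direction)
merges = {'SMS': ('diameter_mean', 'nearest'),
          'SMS_min': ('diameter_min', 'forward'),
          'SMS_max': ('diameter_max', 'backward'),
         }

for k, (c, d) in merges.items():
    # merge_asof requires both frames to be sorted by their key columns;
    # df_lookup is already sorted by 'diameter'
    df_size[k] = pd.merge_asof(
                      tmp.sort_values(by=c), df_lookup,
                      left_on=c, right_on='diameter',
                      direction=d
                 ).set_index('index')['SMS']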

Output:

   diameter_mean  diameter_min  diameter_max       SMS   SMS_min   SMS_max
0          0.510         0.450         0.540  0.021791  0.021548  0.022038
1          0.899         0.820         1.150  1.302340  1.296070  2.311770
2          1.745         1.587         2.020  3.498933  4.144621  3.512665
3          2.020         1.745         2.405  3.512665  3.498933  3.610773
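
When every diameter in df_size appears exactly in df_lookup, as in the example data, an exact-match lookup with Series.map may be a simpler option. The sketch below is not part of the answer above; the names shuffled, sms_by_diameter and result are illustrative. It assumes that the diameter values in the lookup table are unique and that matching floats are bit-for-bit equal; a value without a match would come back as NaN. The lookup table is shuffled on purpose to show that row order does not matter for this approach.

```python
import pandas as pd

df_size = pd.DataFrame([[0.510, 0.450, 0.540],
                        [0.899, 0.820, 1.150],
                        [1.745, 1.587, 2.020],
                        [2.020, 1.745, 2.405]],
                       columns=['diameter_mean', 'diameter_min', 'diameter_max'])

df_lookup = pd.DataFrame([[0.450, 0.021548], [0.510, 0.021791], [0.540, 0.022038],
                          [0.565, 0.022289], [0.695, 0.022545], [0.720, 0.034321],
                          [0.770, 1.292340], [0.820, 1.296070], [0.899, 1.302340],
                          [1.150, 2.311770], [1.361, 3.325140], [1.587, 4.144621],
                          [1.745, 3.498933], [2.020, 3.512665], [2.405, 3.610773]],
                         columns=['diameter', 'SMS'])

# Shuffle the lookup rows: an exact-match mapping does not rely on sort order
shuffled = df_lookup.sample(frac=1, random_state=0)

# Series indexed by diameter, used as a dictionary-like lookup
sms_by_diameter = shuffled.set_index('diameter')['SMS']

# new column -> source column in df_size
targets = {'SMS': 'diameter_mean',
           'SMS_min': 'diameter_min',
           'SMS_max': 'diameter_max'}

result = df_size.copy()
for new_col, src_col in targets.items():
    # map replaces each diameter by the SMS value stored under that key
    result[new_col] = result[src_col].map(sms_by_diameter)

print(result)
```

On the side question about monotonic behaviour: merge_asof needs both frames sorted by their key columns, which is why the answer sorts tmp and relies on df_lookup already being sorted by diameter. The map variant does not need any ordering, only unique lookup keys, but it cannot fall back to the nearest, next or previous diameter the way merge_asof can.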