Replace a row with another row at a specific index in a DataFrame and change a cell value

Question (votes: 0, answers: 1)

I have a CSV sample like this:

                 keys                       key_regex    datatype detailed_datatype precedence  val_regex     val_regex_2  val_regex_3  max_words  alpha_char_check
0      billingAddress      original_billing_key_regex  alphabetic           address    primary        NaN             NaN          NaN        NaN               NaN
1     deliveryAddress     original_delivery_key_regex  alphabetic           address    primary        NaN             NaN          NaN        NaN               NaN
2         notifyParty     original_notify_party_regex  alphabetic        alphabetic    primary        NaN             NaN          NaN        NaN               NaN
3       originAddress   original_seller_address_regex  alphabetic           address    primary        NaN             NaN          NaN        NaN               NaN
4   billingAddressAlt   alternative_billing_key_regex  alphabetic           address   tertiary        NaN             NaN          NaN        NaN               NaN
5  deliveryAddressAlt  alternative_delivery_key_regex  alphabetic           address   tertiary        NaN             NaN          NaN        5.0               1.0
6    originAddressAlt    alternative_seller_key_regex  alphabetic           address   tertiary        NaN  sample_val_re1          NaN        NaN               0.0

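I am trying to do the following for each key of tertiary_row_replacement_dict:

- Replace the row whose keys value is that key with the row whose keys value is the corresponding dictionary value.
- Rename the keys value of the copied row back to the original key.
- Change its precedence from 'tertiary' to 'primary'.

The index positions must stay the same as before.

The expected output is this:

              keys                       key_regex    datatype detailed_datatype precedence  val_regex     val_regex_2  val_regex_3  max_words  alpha_char_check
0   billingAddress   alternative_billing_key_regex  alphabetic           address    primary        NaN             NaN          NaN        NaN               NaN
1  deliveryAddress  alternative_delivery_key_regex  alphabetic           address    primary        NaN             NaN          NaN        5.0               1.0
2      notifyParty     original_notify_party_regex  alphabetic        alphabetic    primary        NaN             NaN          NaN        NaN               NaN
3    originAddress    alternative_seller_key_regex  alphabetic           address    primary        NaN  sample_val_re1          NaN        NaN               0.0

There are 3 original CSVs. Each of them is large and has many similar cases, i.e. keys with primary precedence and alternate keys with tertiary precedence. I use a dictionary of the keys like this: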

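tertiary_row_replacement_dict = {
    "originAddress": "originAddressAlt",
    "deliveryAddress": "deliveryAddressAlt",
    # "totalAmount": "totalAmountAlt",
    "billingAddress": "billingAddressAlt",
    # ....
}

Given that the keys and corresponding values of this dictionary are always present in the CSV, I have this code:

for k, new_k in tertiary_row_replacement_dict.items():
    # Index label of the alternative (tertiary) row
    t2 = df.loc[df['keys'] == new_k].index[0]
    # Overwrite the primary row with the alternative row, promoting 'tertiary' to 'primary'
    df.loc[df.loc[df['keys'] == k].index[0]] = [i if i != 'tertiary' else 'primary' for i in df.loc[t2]]
    # Rename the copied key back to the primary key and drop the alternative row
    df = df.replace([new_k, 'tertiary'], [k, 'primary']).drop([t2])

It does what I want. On the test CSV alone it takes about 0.034 seconds. It is probably not the best or most optimized way to handle a case that only replaces rows and changes cell values.

Is there a faster alternative, given the prior knowledge of which row replaces which? Using the dictionary is not mandatory; it could be a list of tuples or a list of lists if that trades better for speed.

python python-3.x pandas dataframe python-3.7
1 Answer
1 vote

You can use replace to map the tertiary keys to their primary keys, and groupby().first() to fill in the information:

# Map each alternative key back to its primary key, e.g. "billingAddressAlt" -> "billingAddress"
inverse_dict = {v: k for k, v in tertiary_row_replacement_dict.items()}
# Group each primary row with its alternative row; keep the first non-null value per column
(df.groupby(df['keys'].replace(inverse_dict))
   .first()
   .reset_index(drop=True)
)

Output: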
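The sketch below is a minimal, self-contained way to try these ideas on the question's sample data, which is rebuilt by hand. It assumes every value in keys is unique. The names is_alt, label_of and target are illustrative only.

Part 1 groups on a reassigned keys column rather than a separate Series, so the merged key stays visible as the index. One caveat applies here: GroupBy.first() returns the first non-null value of each column in row order. Because the primary row comes first, its key_regex wins, not the alternative one shown in the expected output.

Part 2 is a swapped-in, loop-free technique that is not from the answer. It relabels each alternative row with the index label of the primary row it replaces, then rebuilds the frame with pd.concat. Like the question's loop, this replaces whole rows and keeps the original index positions. Whether it is faster on the real CSVs has not been measured.

```python
import numpy as np
import pandas as pd

# The question's sample, rebuilt by hand (NaN cells become np.nan).
df = pd.DataFrame({
    "keys": ["billingAddress", "deliveryAddress", "notifyParty", "originAddress",
             "billingAddressAlt", "deliveryAddressAlt", "originAddressAlt"],
    "key_regex": ["original_billing_key_regex", "original_delivery_key_regex",
                  "original_notify_party_regex", "original_seller_address_regex",
                  "alternative_billing_key_regex", "alternative_delivery_key_regex",
                  "alternative_seller_key_regex"],
    "datatype": ["alphabetic"] * 7,
    "detailed_datatype": ["address", "address", "alphabetic", "address",
                          "address", "address", "address"],
    "precedence": ["primary"] * 4 + ["tertiary"] * 3,
    "val_regex": [np.nan] * 7,
    "val_regex_2": [np.nan] * 6 + ["sample_val_re1"],
    "val_regex_3": [np.nan] * 7,
    "max_words": [np.nan] * 5 + [5.0, np.nan],
    "alpha_char_check": [np.nan] * 5 + [1.0, 0.0],
})

tertiary_row_replacement_dict = {
    "originAddress": "originAddressAlt",
    "deliveryAddress": "deliveryAddressAlt",
    "billingAddress": "billingAddressAlt",
}
inverse_dict = {v: k for k, v in tertiary_row_replacement_dict.items()}

# 1) The answer's idea: collapse each primary/alternative pair with groupby().first().
#    first() keeps the first non-null value per column in row order, so the primary
#    row wins wherever both rows have a value (e.g. key_regex).
grouped = (df.assign(keys=df["keys"].replace(inverse_dict))
             .groupby("keys", sort=False)
             .first())

# 2) Loop-free whole-row replacement that keeps the original index positions.
is_alt = df["keys"].isin(list(inverse_dict))                # rows to promote
alt_rows = df[is_alt]
primary_keys = alt_rows["keys"].map(inverse_dict)            # "billingAddressAlt" -> "billingAddress"
label_of = pd.Series(df.index, index=df["keys"])             # key -> index label of its row
target = primary_keys.map(label_of)                          # label of the primary row each alt row replaces

# Alternative rows, renamed and promoted, relabelled with the primary rows' index labels
replacement = (alt_rows.assign(keys=primary_keys, precedence="primary")
                       .set_axis(target.to_numpy()))
# Drop both the replaced primary rows and the alternative rows, then add the replacements back
out = (pd.concat([df.drop(index=alt_rows.index.union(replacement.index)), replacement])
         .sort_index())
print(out)
```

For this sample, out matches the expected output above: index 0 to 3, the alternative key_regex values, and precedence 'primary' throughout. Because whole rows are copied, a NaN in an alternative row stays NaN. By contrast, groupby().first() would fill that cell from the primary row.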
© www.soinside.com 2019 - 2024. All rights reserved.