Why does pandas.merge_asof produce a wrong result in my example?


I'm trying to merge two tables using pandas.merge_asof.

The first table, administrators_system_with_schemes_sort:

|   salon_id |   staff_id | date                |
|-----------:|-----------:|:--------------------|
|     872646 |    2715596 | 2024-10-02 00:00:00 |
|     872646 |    2715596 | 2024-10-03 00:00:00 |
|     872646 |    2715596 | 2024-10-06 00:00:00 |
|     872646 |    2715596 | 2024-10-07 00:00:00 |
|     872646 |    2715596 | 2024-10-10 00:00:00 |
|     872646 |    2715596 | 2024-10-11 00:00:00 |
|     872646 |    2715596 | 2024-10-14 00:00:00 |
|     872646 |    2715596 | 2024-10-15 00:00:00 |

The second table, bonus_and_penalty_for_staff_id_administrators_sort:

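|   salon_id |   staff_id | date                |   bonus |   penalty |
|-----------:|-----------:|:--------------------|--------:|----------:|
|     872646 |    2715596 | 2024-10-12 00:00:00 |    4070 |         0 |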

My code:

```python
import pandas as pd

astype_dict = {
    'salon_id': 'int64', 'staff_id': 'int64'
    , 'date': 'datetime64[ns]'
}

administrators_system_with_schemes['date'] = [pd.to_datetime(date).date() for date in administrators_system_with_schemes['date']]
bonus_and_penalty_for_staff_id_administrators['date'] = [pd.to_datetime(date).date() for date in bonus_and_penalty_for_staff_id_administrators['date']]

administrators_system_with_schemes_sort = (
    administrators_system_with_schemes.copy()
    .astype(astype_dict)
    .sort_values(by='date')
)

bonus_and_penalty_for_staff_id_administrators_sort = (
    bonus_and_penalty_for_staff_id_administrators.copy()
    .astype(astype_dict)
    .sort_values(by='date')
)

administrators_system_with_schemes_with_additional_bonus_penalty = (
    pd.merge_asof(
        left = administrators_system_with_schemes_sort
        , right = bonus_and_penalty_for_staff_id_administrators_sort
        , on = ['date']
        , by = ['salon_id', 'staff_id']
        , suffixes=['', '_y']
        , direction='nearest'
))
```
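The two list comprehensions above turn every value into a Python `datetime.date`, so the `date` column has `object` dtype until `.astype(astype_dict)` converts it back to `datetime64[ns]`. A minimal sketch of a vectorised alternative, on made-up sample strings: call `pd.to_datetime` on the whole column, and add `.dt.normalize()` only if the time of day has to be dropped.

```python
import pandas as pd

# Made-up sample values in the same format as the question's tables
raw = pd.Series(['2024-10-02 00:00:00', '2024-10-12 13:45:00'])

# Per-element conversion, as in the question: Python datetime.date objects
as_dates = pd.Series([pd.to_datetime(d).date() for d in raw])
print(as_dates.dtype)  # object

# Vectorised conversion: a datetime64 column; normalize() sets the time to midnight
vectorised = pd.to_datetime(raw).dt.normalize()
print(vectorised.dtype)
print(vectorised.tolist())
```

Either way the column ends up as datetime64 after `astype`; the vectorised form simply avoids the round trip through Python objects.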

Result:

|   salon_id |   staff_id | date                |   bonus |   penalty |
|-----------:|-----------:|:--------------------|--------:|----------:|
|     872646 |    2715596 | 2024-10-02 00:00:00 |       0 |         0 |
|     872646 |    2715596 | 2024-10-03 00:00:00 |       0 |         0 |
|     872646 |    2715596 | 2024-10-06 00:00:00 |       0 |         0 |
|     872646 |    2715596 | 2024-10-07 00:00:00 |       0 |         0 |
|     872646 |    2715596 | 2024-10-10 00:00:00 |       0 |         0 |
|     872646 |    2715596 | 2024-10-11 00:00:00 |       0 |         0 |
|     872646 |    2715596 | 2024-10-14 00:00:00 |       0 |         0 |
|     872646 |    2715596 | 2024-10-15 00:00:00 |       0 |         0 |
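For comparison, here is a minimal sketch that rebuilds only the columns shown in the two input tables and runs the same `merge_asof` call. Under that assumption (neither table has any other columns), `direction='nearest'` matches each of the eight left rows to the single bonus row dated 2024-10-12, so `bonus` should be 4070 on every row rather than 0.

```python
import pandas as pd

# Only the columns shown in the question's two input tables
left = pd.DataFrame({
    'salon_id': [872646] * 8,
    'staff_id': [2715596] * 8,
    'date': pd.to_datetime([
        '2024-10-02', '2024-10-03', '2024-10-06', '2024-10-07',
        '2024-10-10', '2024-10-11', '2024-10-14', '2024-10-15',
    ]),
})
right = pd.DataFrame({
    'salon_id': [872646],
    'staff_id': [2715596],
    'date': pd.to_datetime(['2024-10-12']),
    'bonus': [4070],
    'penalty': [0],
})

# Same parameters as in the question: sorted on 'date', grouped by salon and staff
merged = pd.merge_asof(
    left.sort_values('date'),
    right.sort_values('date'),
    on='date',
    by=['salon_id', 'staff_id'],
    suffixes=('', '_y'),
    direction='nearest',
)
print(merged)
```

If this sketch gives 4070 but the real data gives 0, the difference is probably in the real DataFrames (extra columns, mismatched IDs) rather than in the `merge_asof` call itself.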

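The result is wrong, because the second table does contain a suitable value to match. I have tried many ways of changing the data types, but I still get the same wrong result.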
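One possible explanation, and it is only an assumption because the question does not show every column: the `suffixes=['', '_y']` argument suggests the left table may already have `bonus` and `penalty` columns. If those are filled with 0, the merge keeps the left table's zeros under the plain names and puts the matched values into `bonus_y` and `penalty_y`. The sketch below uses hypothetical zero-filled columns to show the effect, and one way around it (dropping the stale columns before merging).

```python
import pandas as pd

# Hypothetical: the left table already carries zero-filled bonus/penalty columns
left = pd.DataFrame({
    'salon_id': [872646] * 3,
    'staff_id': [2715596] * 3,
    'date': pd.to_datetime(['2024-10-10', '2024-10-11', '2024-10-14']),
    'bonus': [0, 0, 0],
    'penalty': [0, 0, 0],
})
right = pd.DataFrame({
    'salon_id': [872646],
    'staff_id': [2715596],
    'date': pd.to_datetime(['2024-10-12']),
    'bonus': [4070],
    'penalty': [0],
})

merged = pd.merge_asof(
    left, right, on='date', by=['salon_id', 'staff_id'],
    suffixes=('', '_y'), direction='nearest',
)
# 'bonus' holds the left table's 0; the matched value lands in 'bonus_y'
print(merged[['date', 'bonus', 'bonus_y']])

# Dropping the overlapping columns first leaves only the right table's values
fixed = pd.merge_asof(
    left.drop(columns=['bonus', 'penalty']), right,
    on='date', by=['salon_id', 'staff_id'], direction='nearest',
)
print(fixed)
```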

Any ideas how to fix this?

Thanks.

Pandas version: 2.1.4 (the same problem occurs on 2.2.3). Python version: 3.11.7.

Tags: python, pandas, dataframe

Answer:
```python
import pandas as pd


class DataMerger:
    def __init__(self, admins_df, bonuses_df):
        self.admins_df = admins_df
        self.bonuses_df = bonuses_df
        self.astype_dict = {
            'salon_id': 'int64',
            'staff_id': 'int64',
            'date': 'datetime64[ns]'
        }

    def preprocess_data(self):
        # Convert the "date" columns to datetime directly (no per-element .date() round trip)
        self.admins_df['date'] = pd.to_datetime(self.admins_df['date'])
        self.bonuses_df['date'] = pd.to_datetime(self.bonuses_df['date'])

        # Enforce the dtypes and sort by the merge key, which merge_asof requires
        self.admins_df = self.admins_df.astype(
            self.astype_dict).sort_values(by='date')
        self.bonuses_df = self.bonuses_df.astype(
            self.astype_dict).sort_values(by='date')

    def merge_data(self):
        # Merge each admin row with the bonus row for the same salon and staff member
        merged_df = pd.merge_asof(
            left=self.admins_df,
            right=self.bonuses_df,
            on='date',
            by=['salon_id', 'staff_id'],
            suffixes=('', '_y'),
            direction='backward'  # backward: take the closest bonus row on or before each date
        )

        # Rows without a match get NaN; fill bonus and penalty with 0
        merged_df[['bonus', 'penalty']] = merged_df[[
            'bonus', 'penalty']].fillna(0).astype(int)
        return merged_df


if __name__ == "__main__":
    # Sample data standing in for administrators_system_with_schemes and bonus_and_penalty_for_staff_id_administrators
    administrators_system_with_schemes = pd.DataFrame({
        'salon_id': [872646] * 8,
        'staff_id': [2715596] * 8,
        'date': [
            '2024-10-02', '2024-10-03', '2024-10-06', '2024-10-07',
            '2024-10-10', '2024-10-11', '2024-10-14', '2024-10-15'
        ]
    })

    bonus_and_penalty_for_staff_id_administrators = pd.DataFrame({
        'salon_id': [872646],
        'staff_id': [2715596],
        'date': ['2024-10-12'],
        'bonus': [4070],
        'penalty': [0]
    })

    # Create an instance of the class, preprocess, and print the merged result
    merger = DataMerger(administrators_system_with_schemes,
                        bonus_and_penalty_for_staff_id_administrators)
    merger.preprocess_data()
    print(merger.merge_data())
```
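This answer switches from `direction='nearest'` to `direction='backward'`, which changes which rows receive the bonus. A minimal sketch on the question's sample data compares the three directions `merge_asof` supports, filling unmatched rows with 0 as the answer does: with `backward` only rows dated after 2024-10-12 pick up the bonus, with `forward` only rows dated before it, and with `nearest` every row.

```python
import pandas as pd

left = pd.DataFrame({
    'salon_id': [872646] * 8,
    'staff_id': [2715596] * 8,
    'date': pd.to_datetime([
        '2024-10-02', '2024-10-03', '2024-10-06', '2024-10-07',
        '2024-10-10', '2024-10-11', '2024-10-14', '2024-10-15',
    ]),
})
right = pd.DataFrame({
    'salon_id': [872646],
    'staff_id': [2715596],
    'date': pd.to_datetime(['2024-10-12']),
    'bonus': [4070],
    'penalty': [0],
})

results = {}
for direction in ('backward', 'forward', 'nearest'):
    merged = pd.merge_asof(
        left, right, on='date', by=['salon_id', 'staff_id'],
        direction=direction,
    )
    # Unmatched rows come back as NaN; replace with 0 like the answer does
    results[direction] = merged['bonus'].fillna(0).astype(int).tolist()
    print(direction, results[direction])
```

Which direction is right depends on whether a bonus should apply to days before its date, after it, or to whichever day is closest; the question's original code asked for the closest.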
