I have two pandas DataFrames that look like this:
df1
Records each student's mock exam scores and the dates of the mock exams:
ID Mock_Date Student_ID Mock_score
1 14/3/2020 792 213
2 9/5/2020 792 437
3 17/8/2020 792 435
4 4/1/2022 14598 112312
5 29/12/2022 14350 4325
6 3/10/2019 621 523
7 12/8/2020 621 876
8 5/5/2022 621 4324
9 6/9/2022 621 5432
10 6/3/2022 455 34
df2
Records each student's actual exam scores and the dates of the exams:
Student_ID Date Score
324 14/2/2019 543
792 14/2/2019 9785
792 3/11/2019 7690
621 3/11/2019 324
12 16/3/2020 34234
792 16/3/2020 4235
14598 16/3/2020 975
792 9/5/2020 427
792 17/8/2020 876
621 17/8/2020 986
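I want to merge df1 into df2 using the following logic: for each row in df2 (one student's actual exam score), take the row in df1 for the same student with the latest mock exam date before the actual exam date (i.e. the closest date preceding the exam). If no such row exists, fill in NaN. So the desired output looks like this: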
Student_ID Date Score Mock_Date Mock_score
324 14/2/2019 543 NaN NaN
792 14/2/2019 9785 NaN NaN
792 3/11/2019 7690 NaN NaN
621 3/11/2019 324 3/10/2019 523 #last occurrence before 3/11 is 3/10
12 16/3/2020 34234 NaN NaN
792 16/3/2020 4235 14/3/2020 213 #last occurrence before 16/3 is 14/3
14598 16/3/2020 975 NaN NaN
792 9/5/2020 427 14/3/2020 213 #last occurrence before 9/5 is 14/3
792 17/8/2020 876 9/5/2020 437 #last occurrence before 17/8 is 9/5
621 17/8/2020 986 12/8/2020 876
I don't even know where to start. Thanks in advance.
Use pandas.merge_asof with direction="backward" (sort the DataFrames first):
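import pandas as pd

# Parse the day-first date strings so both key columns are real datetimes
df1["Mock_Date"] = pd.to_datetime(df1["Mock_Date"], format="%d/%m/%Y")
df2["Date"] = pd.to_datetime(df2["Date"], format="%d/%m/%Y")
# merge_asof requires both frames to be sorted by their key columns
df1 = df1.sort_values(by="Mock_Date")
df2 = df2.sort_values(by="Date")
out = pd.merge_asof(
    df2,
    df1,
    by="Student_ID",
    left_on="Date",
    right_on="Mock_Date",
    direction="backward",
    # match only mocks strictly before the exam date, as the question asks
    allow_exact_matches=False,
)
print(out)
Prints:
Student_ID Date Score ID Mock_Date Mock_score
0 324 2019-02-14 543 NaN NaT NaN
1 792 2019-02-14 9785 NaN NaT NaN
2 792 2019-11-03 7690 NaN NaT NaN
3 621 2019-11-03 324 6.0 2019-10-03 523.0
4 12 2020-03-16 34234 NaN NaT NaN
5 792 2020-03-16 4235 1.0 2020-03-14 213.0
6 14598 2020-03-16 975 NaN NaT NaN
7 792 2020-05-09 427 1.0 2020-03-14 213.0
8 792 2020-08-17 876 2.0 2020-05-09 437.0
9 621 2020-08-17 986 7.0 2020-08-12 876.0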
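One detail worth checking is how same-day mocks are treated. By default merge_asof also matches equal keys (allow_exact_matches=True), so a mock taken on the exam date itself counts as a match, whereas the desired output in the question only accepts mocks strictly before the exam. Below is a minimal, self-contained sketch that rebuilds the sample data from the question and compares both settings. The helper name merge_latest_mock and its allow_same_day parameter are made up for illustration and are not part of pandas.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "Mock_Date": ["14/3/2020", "9/5/2020", "17/8/2020", "4/1/2022", "29/12/2022",
                  "3/10/2019", "12/8/2020", "5/5/2022", "6/9/2022", "6/3/2022"],
    "Student_ID": [792, 792, 792, 14598, 14350, 621, 621, 621, 621, 455],
    "Mock_score": [213, 437, 435, 112312, 4325, 523, 876, 4324, 5432, 34],
})
df2 = pd.DataFrame({
    "Student_ID": [324, 792, 792, 621, 12, 792, 14598, 792, 792, 621],
    "Date": ["14/2/2019", "14/2/2019", "3/11/2019", "3/11/2019", "16/3/2020",
             "16/3/2020", "16/3/2020", "9/5/2020", "17/8/2020", "17/8/2020"],
    "Score": [543, 9785, 7690, 324, 34234, 4235, 975, 427, 876, 986],
})

# The date strings are day-first, so parse them with an explicit format.
df1["Mock_Date"] = pd.to_datetime(df1["Mock_Date"], format="%d/%m/%Y")
df2["Date"] = pd.to_datetime(df2["Date"], format="%d/%m/%Y")


def merge_latest_mock(exams, mocks, allow_same_day):
    """Attach to each exam row the student's most recent mock (hypothetical helper)."""
    return pd.merge_asof(
        exams.sort_values("Date", kind="stable"),       # keys must be sorted
        mocks.sort_values("Mock_Date", kind="stable"),
        by="Student_ID",
        left_on="Date",
        right_on="Mock_Date",
        direction="backward",
        allow_exact_matches=allow_same_day,
    ).drop(columns="ID")  # the mock's row ID is not part of the desired output


strict = merge_latest_mock(df2, df1, allow_same_day=False)
inclusive = merge_latest_mock(df2, df1, allow_same_day=True)

# Student 792's exam on 9/5/2020: strict picks the 14/3/2020 mock (score 213),
# inclusive picks the same-day 9/5/2020 mock (score 437).
mask = (strict["Student_ID"] == 792) & (strict["Date"] == "2020-05-09")
print(strict.loc[mask, ["Mock_Date", "Mock_score"]])
print(inclusive.loc[mask, ["Mock_Date", "Mock_score"]])
```

If the dates should be shown in the original d/m/Y style afterwards, the datetime columns can be formatted with .dt.strftime once the merge is done; keeping them as datetimes during the merge is what makes the ordering correct.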