Collapse a Pandas DataFrame to keep all columns, with the order of the stored columns determined by the groupby/pivot column

Question

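I have a Pandas DataFrame containing multiple findings (medical history) per person. I want to collapse each person's medical history onto a single row while preserving order, but at the appointment-date level: for each appointment, all findings/examination results from that person's past should appear in wide format.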

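I'm not sure how best to do this, because all the `groupby` approaches require me to supply an `agg`, which then merges all the columns into one by concatenation, rather than creating new, separate columns for each past appointment.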
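
To make the problem concrete, here is a minimal sketch of the concatenating behaviour described above. The two-row frame is hypothetical, made up for illustration, and is not the data from the question.

```python
import pandas as pd

# Hypothetical two-appointment history for a single patient.
df = pd.DataFrame({
    "patientId": [1, 1],
    "apptDate": ["2023-11-14", "2024-07-08"],
    "result": ["Short Term", "Long Term"],
})

# Aggregating with a string join yields one row per patient, but every
# appointment ends up inside the same cell instead of its own column.
collapsed = df.groupby("patientId", as_index=False).agg({"result": ", ".join})
print(collapsed)
```

The history survives, but only as a single concatenated string, which is not the wide layout being asked for.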

Some columns will not be pivoted or used as the `groupby` index (`patientId, apptDate, age, bmi`).

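One thing to consider is how best to order the generated medical-history `mh_` columns, so that records fill the lowest-numbered generated columns first (`mh_result1`).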
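
One possible way to get a stable numbering for those columns, sketched here under the assumption that "lowest-numbered first" means oldest appointment first, is to sort chronologically and count within each patient. The column name `slot` is a hypothetical helper, not something from the question.

```python
import pandas as pd

# Patient IDs and appointment dates taken from the example table.
df = pd.DataFrame({
    "patientId": [1, 1, 1, 2, 2],
    "apptDate": ["2024-07-08", "2024-02-01", "2023-11-14",
                 "2023-12-23", "2022-12-11"],
})

# Sort chronologically (ISO date strings sort correctly as text),
# then number each patient's appointments 1, 2, 3, ...
ordered = df.sort_values(["patientId", "apptDate"])
ordered["slot"] = ordered.groupby("patientId").cumcount() + 1
print(ordered)
```

If the most recent past appointment should come first instead, sorting with `ascending=False` on `apptDate` before counting would reverse the numbering.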

Original DF

| patientId | apptDate   | age | bmi | examinationId | result     | category       | comment                                     |
|-----------|------------|-----|-----|---------------|------------|----------------|---------------------------------------------|
| 1         | 2024-07-08 | 45  | 22  | 45            | Long Term  | Cardiovascular | Cardiovascular defect finding, fup required |
| 1         | 2024-02-01 | 45  | 22  | 33            | None       | None           | None                                        |
| 1         | 2023-11-14 | 45  | 22  | 12            | Short Term | Respiratory    | Shortness of breath, med prescribed         |
| 2         | 2023-12-23 | 32  | 12  | 18            | Short Term | Gastro         | Recorded malnutrition                       |
| 2         | 2022-12-11 | 32  | 13  | 21            | Short Term | Gastro         | None                                        |
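
For anyone who wants to reproduce this, the table above could be built roughly as follows. This is a sketch that assumes the `None` cells are real missing values rather than the string "None", and that `apptDate` is kept as ISO-formatted strings.

```python
import pandas as pd

df = pd.DataFrame({
    "patientId": [1, 1, 1, 2, 2],
    "apptDate": ["2024-07-08", "2024-02-01", "2023-11-14",
                 "2023-12-23", "2022-12-11"],
    "age": [45, 45, 45, 32, 32],
    "bmi": [22, 22, 22, 12, 13],
    "examinationId": [45, 33, 12, 18, 21],
    # None is assumed to mean "no value recorded"
    "result": ["Long Term", None, "Short Term", "Short Term", "Short Term"],
    "category": ["Cardiovascular", None, "Respiratory", "Gastro", "Gastro"],
    "comment": [
        "Cardiovascular defect finding, fup required",
        None,
        "Shortness of breath, med prescribed",
        "Recorded malnutrition",
        None,
    ],
})
print(df)
```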

Desired DF

| patientId | apptDate   | age | bmi | examinationId | result     | category       | comment                                     | mh_result1 | mh_category1 | mh_comment1                         | mh_result2 | mh_category2 | mh_comment2  |
|-----------|------------|-----|-----|---------------|------------|----------------|---------------------------------------------|------------|--------------|-------------------------------------|------------|--------------|--------------|
| 1         | 2024-07-08 | 45  | 22  | 45            | Long Term  | Cardiovascular | Cardiovascular defect finding, fup required | Short Term | Respiratory  | Shortness of breath, med prescribed | None       | None         | None         |
| 1         | 2024-02-01 | 45  | 22  | 33            | None       | None           | None                                        | Short Term | Respiratory  | Shortness of breath, med prescribed | None       | None         | None         |
| 1         | 2023-11-14 | 45  | 22  | 12            | Short Term | Respiratory    | Shortness of breath, med prescribed         | None       | None         | None                                | None       | None         | None         |
| 2         | 2023-12-23 | 32  | 12  | 18            | Short Term | Gastro         | Recorded malnutrition                       | Short Term | Gastro       | None                                | None       | None         | None         |
| 2         | 2022-12-11 | 32  | 13  | 21            | Short Term | Gastro         | None                                        | None       | None         | None                                | None       | None         | None         |
python-3.x pandas dataframe group-by
1 Answer

You can `pivot`, then `merge`:

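# Build one set of (result, category, comment) columns per appointment slot,
# then carry each patient's earlier appointments forward as history.
tmp = (df
   .sort_values(by='apptDate')
   # number each patient's appointments chronologically: 1 = oldest
   .assign(col=lambda x: x.groupby('patientId').cumcount().add(1))
   .pivot(index=['patientId', 'apptDate'], columns='col', values=['result', 'category', 'comment'])
   # group the columns by slot number (result1, category1, comment1, ...)
   .sort_index(level=1, axis=1, sort_remaining=False)
   # ffill spreads a past appointment to later rows; shift excludes the current one
   .groupby(level='patientId').transform(lambda x: x.ffill().shift())
)

# flatten the (name, slot) MultiIndex into mh_<name><slot>
tmp.columns = tmp.columns.map(lambda x: f'mh_{x[0]}{x[1]}')

# attach the history columns to the original rows
out = df.merge(tmp, left_on=['patientId', 'apptDate'], right_index=True, how='left')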

Output:

   patientId    apptDate  age  bmi  examinationId      result        category                                      comment  mh_result1 mh_category1                          mh_comment1 mh_result2 mh_category2 mh_comment2 mh_result3 mh_category3 mh_comment3
0          1  2024-07-08   45   22             45   Long Term  Cardiovascular  Cardiovascular defect finding, fup required  Short Term  Respiratory  Shortness of breath, med prescribed        NaN          NaN         NaN       None         None        None
1          1  2024-02-01   45   22             33         NaN             NaN                                          NaN  Short Term  Respiratory  Shortness of breath, med prescribed        NaN          NaN         NaN       None         None        None
2          1  2023-11-14   45   22             12  Short Term     Respiratory          Shortness of breath, med prescribed        None         None                                 None        NaN          NaN         NaN       None         None        None
3          2  2023-12-23   32   12             18  Short Term          Gastro                        Recorded malnutrition  Short Term       Gastro                                  NaN       None         None        None        NaN          NaN         NaN
4          2  2022-12-11   32   13             21  Short Term          Gastro                                          NaN        None         None                                  NaN       None         None        None        NaN          NaN         NaN
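
The output above includes `mh_` columns that are empty for every row (for example `mh_result3`). If those are unwanted, one option is to drop them afterwards. The sketch below uses a small hypothetical stand-in for `out` rather than the real result, and only relies on `isna`, `all` and `drop`.

```python
import pandas as pd

# Hypothetical stand-in for the merged result, with one empty history column.
out = pd.DataFrame({
    "patientId": [1, 1],
    "mh_result1": ["Short Term", None],
    "mh_result2": [None, None],
})

# Find history columns with no values at all and drop only those,
# leaving the non-history columns untouched.
mh_cols = [c for c in out.columns if c.startswith("mh_")]
empty = [c for c in mh_cols if out[c].isna().all()]
trimmed = out.drop(columns=empty)
print(trimmed)
```

Restricting the check to the `mh_` prefix avoids accidentally removing an original column that happens to be entirely missing.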