Collapse a Pandas DataFrame keeping all columns, but group by / pivot specific columns so that they are stored in order

Question

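I have a Pandas DataFrame that contains multiple findings (medical history) per person. I want to collapse each person's medical history onto one row per appointment date while preserving the order, so that every appointment row carries all of that person's earlier findings/examination results in wide format.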

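I'm not sure how best to do this, because all the `groupby` methods require me to supply an `agg`, which then merges all the columns into one by concatenation rather than creating new, separate columns for a given past appointment.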

Some columns (`patientId, apptDate, age, bmi`) would not be pivoted, or would be used as the `groupby` index.

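One thing to consider is how best to order the generated medical-history `mh_` columns, so that records fill the lower-numbered columns first (`mh_result1`, and so on).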

Original DF

| patientId | apptDate   | age | bmi | examinationId | result     | category       | comment                                     |
|-----------|------------|-----|-----|---------------|------------|----------------|---------------------------------------------|
| 1         | 2024-07-08 | 45  | 22  | 45            | Long Term  | Cardiovascular | Cardiovascular defect finding, fup required |
| 1         | 2024-02-01 | 45  | 22  | 33            | None       | None           | None                                        |
| 1         | 2023-11-14 | 45  | 22  | 12            | Short Term | Respiratory    | Shortness of breath, med prescribed         |
| 2         | 2023-12-23 | 32  | 12  | 18            | Short Term | Gastro         | Recorded malnutrition                       |
| 2         | 2022-12-11 | 32  | 13  | 21            | Short Term | Gastro         | None                                        |
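For anyone who wants to reproduce the problem, the table above can be built as a DataFrame roughly as follows. This is a sketch under two assumptions that the question does not state: `apptDate` is kept as ISO-formatted strings (which sort chronologically), and the `None` cells are real missing values rather than the literal string "None".

```python
import pandas as pd

# Sample data copied from the "Original DF" table.
# Assumption: None marks a missing value, and apptDate stays a plain string.
df = pd.DataFrame({
    "patientId": [1, 1, 1, 2, 2],
    "apptDate": ["2024-07-08", "2024-02-01", "2023-11-14",
                 "2023-12-23", "2022-12-11"],
    "age": [45, 45, 45, 32, 32],
    "bmi": [22, 22, 22, 12, 13],
    "examinationId": [45, 33, 12, 18, 21],
    "result": ["Long Term", None, "Short Term", "Short Term", "Short Term"],
    "category": ["Cardiovascular", None, "Respiratory", "Gastro", "Gastro"],
    "comment": [
        "Cardiovascular defect finding, fup required",
        None,
        "Shortness of breath, med prescribed",
        "Recorded malnutrition",
        None,
    ],
})

print(df)
```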

Desired DF

| patientId | apptDate   | age | bmi | examinationId | result     | category       | comment                                     | mh_result1 | mh_category1 | mh_comment1                         | mh_result2 | mh_category2 | mh_comment2  |
|-----------|------------|-----|-----|---------------|------------|----------------|---------------------------------------------|------------|--------------|-------------------------------------|------------|--------------|--------------|
| 1         | 2024-07-08 | 45  | 22  | 45            | Long Term  | Cardiovascular | Cardiovascular defect finding, fup required | Short Term | Respiratory  | Shortness of breath, med prescribed | None       | None         | None         |
| 1         | 2024-02-01 | 45  | 22  | 33            | None       | None           | None                                        | Short Term | Respiratory  | Shortness of breath, med prescribed | None       | None         | None         |
| 1         | 2023-11-14 | 45  | 22  | 12            | Short Term | Respiratory    | Shortness of breath, med prescribed         | None       | None         | None                                | None       | None         | None         |
| 2         | 2023-12-23 | 32  | 12  | 18            | Short Term | Gastro         | Recorded malnutrition                       | Short Term | Gastro       | None                                | None       | None         | None         |
| 2         | 2022-12-11 | 32  | 13  | 21            | Short Term | Gastro         | None                                        | None       | None         | None                                | None       | None         | None         |

Answer 1

You can `pivot`, then `merge`:

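tmp = (df
   # oldest appointment first, so slot 1 is each patient's earliest record
   .sort_values(by='apptDate')
   # number each patient's appointments 1, 2, 3, ...
   .assign(col=lambda x: x.groupby('patientId').cumcount().add(1))
   .pivot(index=['patientId', 'apptDate'], columns='col', values=['result', 'category', 'comment'])
   # group the columns by slot number: result1, category1, comment1, result2, ...
   .sort_index(level=1, axis=1, sort_remaining=False)
   # carry findings forward, then shift so a row only sees earlier appointments
   .groupby(level='patientId').transform(lambda x: x.ffill().shift())
)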

tmp.columns = tmp.columns.map(lambda x: f'mh_{x[0]}{x[1]}')

out = df.merge(tmp, left_on=['patientId', 'apptDate'], right_index=True, how='left')
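The key step is `ffill().shift()` inside each patient group. A toy illustration of what it does to one patient's pivoted block may help; the frame `wide`, its labels and its values are made up for this sketch and do not come from the question's data.

```python
import pandas as pd

# Hypothetical stand-in for one patient's pivoted block, oldest visit first.
# Slot k holds the value recorded at the patient's k-th visit.
wide = pd.DataFrame(
    {1: ["A", None, None], 2: [None, "B", None], 3: [None, None, "C"]},
    index=["visit1", "visit2", "visit3"],
)

# ffill carries each finding forward to all later visits;
# shift moves everything down one row, so a visit only sees
# findings from strictly earlier visits.
history = wide.ffill().shift()

print(history)
```

The first visit ends up with no history, and the last slot stays empty because the latest visit is never part of anyone's past.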

Printing `out` gives:

   patientId    apptDate  age  bmi  examinationId      result        category                                      comment  mh_result1 mh_category1                          mh_comment1 mh_result2 mh_category2 mh_comment2 mh_result3 mh_category3 mh_comment3
0          1  2024-07-08   45   22             45   Long Term  Cardiovascular  Cardiovascular defect finding, fup required  Short Term  Respiratory  Shortness of breath, med prescribed        NaN          NaN         NaN       None         None        None
1          1  2024-02-01   45   22             33         NaN             NaN                                          NaN  Short Term  Respiratory  Shortness of breath, med prescribed        NaN          NaN         NaN       None         None        None
2          1  2023-11-14   45   22             12  Short Term     Respiratory          Shortness of breath, med prescribed        None         None                                 None        NaN          NaN         NaN       None         None        None
3          2  2023-12-23   32   12             18  Short Term          Gastro                        Recorded malnutrition  Short Term       Gastro                                  NaN       None         None        None        NaN          NaN         NaN
4          2  2022-12-11   32   13             21  Short Term          Gastro                                          NaN        None         None                                  NaN       None         None        None        NaN          NaN         NaN
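In this output the highest-numbered slot (`mh_result3`, `mh_category3`, `mh_comment3`) is empty in every row, because `shift()` pushes each patient's latest appointment out of its own history. If those columns are unwanted, as in the desired table, they could be dropped along these lines. This is only a sketch: it rebuilds the sample data under the assumption that `None` cells are missing values, and the names `values`, `n_slots`, `empty_cols` and `trimmed` are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "patientId": [1, 1, 1, 2, 2],
    "apptDate": ["2024-07-08", "2024-02-01", "2023-11-14",
                 "2023-12-23", "2022-12-11"],
    "age": [45, 45, 45, 32, 32],
    "bmi": [22, 22, 22, 12, 13],
    "examinationId": [45, 33, 12, 18, 21],
    "result": ["Long Term", None, "Short Term", "Short Term", "Short Term"],
    "category": ["Cardiovascular", None, "Respiratory", "Gastro", "Gastro"],
    "comment": ["Cardiovascular defect finding, fup required", None,
                "Shortness of breath, med prescribed",
                "Recorded malnutrition", None],
})
values = ["result", "category", "comment"]

# Same pivot-then-merge pipeline as in the answer.
tmp = (df
   .sort_values(by="apptDate")
   .assign(col=lambda x: x.groupby("patientId").cumcount().add(1))
   .pivot(index=["patientId", "apptDate"], columns="col", values=values)
   .sort_index(level=1, axis=1, sort_remaining=False)
   .groupby(level="patientId").transform(lambda x: x.ffill().shift())
)
tmp.columns = tmp.columns.map(lambda x: f"mh_{x[0]}{x[1]}")
out = df.merge(tmp, left_on=["patientId", "apptDate"],
               right_index=True, how="left")

# The highest slot number equals the largest appointment count of any
# patient; after the shift that slot is never populated.
n_slots = df.groupby("patientId").size().max()
empty_cols = [f"mh_{v}{n_slots}" for v in values]
trimmed = out.drop(columns=empty_cols)

print(trimmed.columns.tolist())
```

Dropping by slot number rather than with `dropna(axis=1, how="all")` keeps columns such as `mh_result2`, which happen to be empty in this small sample but can be filled for other data.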
