I have the following DataFrame in pandas:
Index Job Title Year Salary
0 POL - POL 47 HQ Police Chief 2022 243000
1 POL - POL 47 HQ Police Chief 2023 258000
2 CAT - CAT 30 County Attorney 2022 220000
3 CAT - CAT 30 County Attorney 2022 236000
4 CAT - CAT 30 County Attorney 2023 258000
5 DOT - DOT 50 Director 2022 228000
6 DOT - DOT 50 Director 2023 244000
7 POL - HQ Police Chief 2019 239566
8 POL - HQ Police Chief 2020 225000
9 POL - HQ Police Chief 2021 236000
10 IGR - IGR 20 Director 2022 232000
11 IGR - IGR 20 Director 2023 232000
12 FIN - FIN 32 Director 2022 220000
13 FIN - FIN 32 Director 2023 236000
14 PRO - PRO 35 Procurement Director 2022 220000
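I am trying to find a way to use the unique "Year" values as the index and the unique "Job Title" values as the columns. The values from the "Salary" column would then fill the cells where a given "Year" and "Job Title" intersect. If there is no value for an intersection, the cell should contain 0 or NaN.
It should look something like this: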
      POL - HQ Police Chief  FIN - FIN 32 Director  PRO - PRO 35 Procurement Director
2019                 239566                    NaN                                NaN
2020                 225000                    NaN                                NaN
2021                 236000                    NaN                                NaN
2022                    NaN                 220000                             220000
2023                    NaN                 236000                                NaN
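Thanks for your help!
I have been experimenting with setting the index and transposing, but so far I have only managed to produce unusable DataFrames full of duplicated or redundant data.
Try pd.crosstab like this: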
pd.crosstab(df['Year'], df['Job Title'], df['Salary'], aggfunc='sum')
Output:
Job Title CAT - CAT 30 County Attorney DOT - DOT 50 Director FIN - FIN 32 Director IGR - IGR 20 Director POL - HQ Police Chief POL - POL 47 HQ Police Chief PRO - PRO 35 Procurement Director
Year
2019 NaN NaN NaN NaN 239566.0 NaN NaN
2020 NaN NaN NaN NaN 225000.0 NaN NaN
2021 NaN NaN NaN NaN 236000.0 NaN NaN
2022 456000.0 228000.0 220000.0 232000.0 NaN 243000.0 220000.0
2023 258000.0 244000.0 236000.0 232000.0 NaN 258000.0 NaN
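For anyone who wants to reproduce this, here is a minimal, self-contained sketch. It uses DataFrame.pivot_table rather than pd.crosstab (a swapped-in but closely related call), and the DataFrame is rebuilt by hand from a subset of the rows in the question, so the variable names and the reduced data are assumptions for illustration only. Passing fill_value=0 gives zeros instead of NaN for missing intersections. Note that 2022 has two "CAT - CAT 30 County Attorney" rows, which is why aggfunc='sum' yields 456000 for that cell.

```python
import pandas as pd

# Hand-built subset of the question's rows (illustrative, not the full data).
df = pd.DataFrame({
    "Job Title": [
        "POL - HQ Police Chief",
        "POL - HQ Police Chief",
        "POL - HQ Police Chief",
        "CAT - CAT 30 County Attorney",
        "CAT - CAT 30 County Attorney",
        "CAT - CAT 30 County Attorney",
        "FIN - FIN 32 Director",
        "FIN - FIN 32 Director",
        "PRO - PRO 35 Procurement Director",
    ],
    "Year": [2019, 2020, 2021, 2022, 2022, 2023, 2022, 2023, 2022],
    "Salary": [239566, 225000, 236000, 220000, 236000, 258000,
               220000, 236000, 220000],
})

# Years become the index, job titles become the columns, and salaries
# fill the cells. Duplicate (Year, Job Title) pairs are summed, and
# missing intersections are filled with 0 instead of NaN.
result = df.pivot_table(
    index="Year",
    columns="Job Title",
    values="Salary",
    aggfunc="sum",
    fill_value=0,
)

print(result)
```

If summing duplicate rows is not what you want, a different aggfunc such as 'max' or 'first' could be used instead. Which one is appropriate depends on what the duplicate rows actually mean in your data.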