Sorting a pandas DataFrame on multiple columns, with ASC/DESC chosen by a condition

Question

I have a pandas DataFrame:

             city  crime  pollution  ...  housing  weather  Similarity
0      Atlanta GA    101         97  ...      119       60        2.75
2    Baltimore MD    103        126  ...      103       80        2.50
4  San Antonio TX     98        126  ...      100       65        2.50
1        Akron OH    100        119  ...       97       70        1.75
3   Charleston SC    106        127  ...       82       75        1.50

I want to sort first by Similarity, descending. I have this code for that:

df.sort_values('Similarity', ascending=False)

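But if there is a tie, I want to sort by a list of user input. Of the 8 columns in my data, the user can add up to 5 to a preference list, which is stored in the order the columns were added: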

userPrefs = [] 
# ex: userPrefs = ['crime','education','weather','pollution']

So if there is a tie (e.g. between the rows labeled 2 and 4), I need the code to sort based on that list. I have this code for that:

(df.sort_values(userPrefs[0], ascending=False)
   .sort_values(userPrefs[1], ascending=False)  # -- etc...
   .sort_values('Similarity', ascending=False))

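The problem I'm facing is that, depending on which columns are added to userPrefs, the sort may need to be ascending or descending. So if 'crime' == userPrefs[0], I want it ASC (lowest crime is best); but if 'education' == userPrefs[0], I want to sort DESC (highest education is best).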

How can I sort based on a condition? I was thinking:

ascList = ['crime','housing','pollution'] # the lower, the better
descList = ['education'] # the higher, the better

df.sort_values(userPrefs[0], ascending= if x in ascList)

...but that doesn't work, and I'm not clear on lambda functions. Thanks!

Answer 1

You can pass a list of ascending values to sort_values, one per column in by:

userPrefs = ['crime','weather','pollution']
ascList = ['crime','housing','pollution'] # the lower, the better

df.sort_values(by=['Similarity']+userPrefs, ascending=[False]+[u in ascList for u in userPrefs])

Output for the sample data:

             city  crime  pollution  housing  weather  Similarity
0      Atlanta GA    101         97      119       60        2.75
4  San Antonio TX     98        126      100       65        2.50
2    Baltimore MD    103        126      103       80        2.50
1        Akron OH    100        119       97       70        1.75
3   Charleston SC    106        127       82       75        1.50
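
For anyone who wants to try this without the original data, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the columns shown in the output above (the columns elided in the question are left out), and the names by, ascending and result are only illustrative.

```python
import pandas as pd

# Sample rows copied from the question; the index labels match the table.
df = pd.DataFrame(
    {
        "city": ["Atlanta GA", "Baltimore MD", "San Antonio TX",
                 "Akron OH", "Charleston SC"],
        "crime": [101, 103, 98, 100, 106],
        "pollution": [97, 126, 126, 119, 127],
        "housing": [119, 103, 100, 97, 82],
        "weather": [60, 80, 65, 70, 75],
        "Similarity": [2.75, 2.50, 2.50, 1.75, 1.50],
    },
    index=[0, 2, 4, 1, 3],
)

userPrefs = ["crime", "weather", "pollution"]
ascList = ["crime", "housing", "pollution"]  # the lower, the better

# One sort key per column, and one ascending flag per key, in the same order.
by = ["Similarity"] + userPrefs
ascending = [False] + [u in ascList for u in userPrefs]

result = df.sort_values(by=by, ascending=ascending)
print(ascending)  # [False, True, False, True]
print(result)
```

Note that any column missing from ascList (here 'weather') ends up sorted descending, because `u in ascList` evaluates to False for it.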

Answer 2

You don't need a lambda function; build the by= and ascending= arguments of .sort_values() on the fly:

# use sets here:
ascList = {"crime", "housing", "pollution"}
descList = {"education"}

# example userPrefs:
userPrefs = ["crime", "weather", "pollution"]


df = df.sort_values(
    by=["Similarity", *userPrefs], ascending=[False, *[p in ascList for p in userPrefs]]
)

print(df)

Prints:

             city  crime  pollution  housing  weather  Similarity
0      Atlanta GA    101         97      119       60        2.75
4  San Antonio TX     98        126      100       65        2.50
2    Baltimore MD    103        126      103       80        2.50
1        Akron OH    100        119       97       70        1.75
3   Charleston SC    106        127       82       75        1.50
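
In the code above descList is never consulted: every preference that is not in ascList is sorted descending, including columns like 'weather' that appear in neither set. If you would rather have that be explicit, one possible variation is a single dict of directions that fails loudly on unknown columns. This is only a sketch; ASCENDING_BY_COLUMN and sort_by_prefs are hypothetical names, and the direction chosen for 'weather' is an assumption, not something stated in the question.

```python
import pandas as pd

# Hypothetical mapping: True = ascending (lower is better),
# False = descending (higher is better).
ASCENDING_BY_COLUMN = {
    "crime": True,
    "housing": True,
    "pollution": True,
    "education": False,
    "weather": False,  # assumption: a higher weather score is better
}


def sort_by_prefs(df, user_prefs, primary="Similarity"):
    """Sort by `primary` descending, breaking ties with user_prefs in order."""
    unknown = [p for p in user_prefs if p not in ASCENDING_BY_COLUMN]
    if unknown:
        raise KeyError(f"no sort direction defined for: {unknown}")
    by = [primary, *user_prefs]
    ascending = [False, *(ASCENDING_BY_COLUMN[p] for p in user_prefs)]
    return df.sort_values(by=by, ascending=ascending)


# Sample rows copied from the question.
df = pd.DataFrame(
    {
        "city": ["Atlanta GA", "Baltimore MD", "San Antonio TX",
                 "Akron OH", "Charleston SC"],
        "crime": [101, 103, 98, 100, 106],
        "pollution": [97, 126, 126, 119, 127],
        "housing": [119, 103, 100, 97, 82],
        "weather": [60, 80, 65, 70, 75],
        "Similarity": [2.75, 2.50, 2.50, 1.75, 1.50],
    },
    index=[0, 2, 4, 1, 3],
)

# Baltimore and San Antonio tie on Similarity and on pollution,
# so the second preference (weather, descending) decides their order.
ranked = sort_by_prefs(df, ["pollution", "weather"])
print(ranked)
```

With these preferences Baltimore MD (weather 80) comes before San Antonio TX (weather 65), the opposite of the order produced when 'crime' is the first preference.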
