How to sort a pandas DataFrame by a binned column


I created the following pandas DataFrame:

import pandas as pd
ds = {'col1' : ['(-9999999, 550.0]','(13700.0, 23700.0]','(23700.0, 414580.0]','(4000.0, 8000.0]',
                '(414580.0, 9999999]','(550.0, 4000.0]','(8000.0, 13700.0]'],
      'col2' : [905317.3,   606156.5,   586349.6,   665779.1,   0,  803824.4,   628475.2]}

df = pd.DataFrame(data=ds)

The DataFrame looks like this:

print(df)
                  col1      col2
0    (-9999999, 550.0]  905317.3
1   (13700.0, 23700.0]  606156.5
2  (23700.0, 414580.0]  586349.6
3     (4000.0, 8000.0]  665779.1
4  (414580.0, 9999999]       0.0
5      (550.0, 4000.0]  803824.4
6    (8000.0, 13700.0]  628475.2
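A quick check of why a plain sort does not help here (a sketch, assuming `col1` holds ordinary strings as in the constructor above): without a `key`, `sort_values` compares the labels character by character, and for this particular data the lexicographic order happens to be exactly the current row order.

```python
import pandas as pd

ds = {'col1': ['(-9999999, 550.0]', '(13700.0, 23700.0]', '(23700.0, 414580.0]',
               '(4000.0, 8000.0]', '(414580.0, 9999999]', '(550.0, 4000.0]',
               '(8000.0, 13700.0]'],
      'col2': [905317.3, 606156.5, 586349.6, 665779.1, 0, 803824.4, 628475.2]}
df = pd.DataFrame(data=ds)

# Without a key, the strings are compared character by character,
# so '(13700.0, ...' sorts before '(4000.0, ...' because '1' < '4'.
plain = df.sort_values('col1')
print(plain.index.tolist())  # [0, 1, 2, 3, 4, 5, 6]: nothing moves
```

The numeric lower bound of each interval therefore has to be extracted and used as the sort key.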

I need to sort the DataFrame by column col1 in ascending order. The resulting DataFrame should look like this:

                  col1      col2
0    (-9999999, 550.0]  905317.3
1      (550.0, 4000.0]  803824.4
2     (4000.0, 8000.0]  665779.1
3    (8000.0, 13700.0]  628475.2
4   (13700.0, 23700.0]  606156.5
5  (23700.0, 414580.0]  586349.6
6  (414580.0, 9999999]       0.0

Can anyone help me?

2 Answers
Answer 1 (1 vote)

Code:

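# Sort by each interval's lower bound: capture the text between "(" and the first ","
# and convert it to float so the comparison is numeric instead of lexicographic
out = df.sort_values(
    'col1', key=lambda x: x.str.extract(r'\(([^,]+),')[0].astype('float')
)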

Output:

                  col1      col2
0    (-9999999, 550.0]  905317.3
5      (550.0, 4000.0]  803824.4
3     (4000.0, 8000.0]  665779.1
6    (8000.0, 13700.0]  628475.2
1   (13700.0, 23700.0]  606156.5
2  (23700.0, 414580.0]  586349.6
4  (414580.0, 9999999]       0.0
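The output above keeps the original row labels (0, 5, 3, ...), while the expected result in the question is numbered 0 to 6. A minimal sketch of the same approach with the key pulled out into a named function (`lower_bound` is a hypothetical helper name, not from the answer) and `reset_index(drop=True)` added to renumber the rows. The key function receives the whole `col1` Series and must return a Series of the same length.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['(-9999999, 550.0]', '(13700.0, 23700.0]', '(23700.0, 414580.0]',
             '(4000.0, 8000.0]', '(414580.0, 9999999]', '(550.0, 4000.0]',
             '(8000.0, 13700.0]'],
    'col2': [905317.3, 606156.5, 586349.6, 665779.1, 0, 803824.4, 628475.2],
})

# Hypothetical helper: same regex as the answer, returning each lower bound as a float
def lower_bound(s: pd.Series) -> pd.Series:
    return s.str.extract(r'\(([^,]+),')[0].astype(float)

print(lower_bound(df['col1']).tolist())
# [-9999999.0, 13700.0, 23700.0, 4000.0, 414580.0, 550.0, 8000.0]

# reset_index(drop=True) discards the old labels and numbers the rows 0..6
out = df.sort_values('col1', key=lower_bound).reset_index(drop=True)
print(out)
```

Passing `ignore_index=True` to `sort_values` instead of calling `reset_index(drop=True)` should give the same renumbered result.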

Answer 2 (0 votes)

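If 'col1' holds string values, you can use a regular expression to extract the lower bound of each interval and sort by it. You can do this with a custom key passed to sort_values: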
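# Capture an optionally negative decimal number between "(" and "," and sort on it as a float
df = df.sort_values(
    "col1", key=lambda x: x.str.extract(r"\((-?\d*\.?\d*),")[0].astype(float)
)

Output:
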
                  col1      col2
0    (-9999999, 550.0]  905317.3
5      (550.0, 4000.0]  803824.4
3     (4000.0, 8000.0]  665779.1
6    (8000.0, 13700.0]  628475.2
1   (13700.0, 23700.0]  606156.5
2  (23700.0, 414580.0]  586349.6
4  (414580.0, 9999999]       0.0
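As a side note: labels like these look like the output of pd.cut converted to strings (an assumption; the question does not say how col1 was built). If the binned column is kept as the ordered Categorical that pd.cut returns, sort_values orders it by bin position and no regex key is needed. A minimal sketch with hypothetical raw values chosen so that each one falls into a different bin:

```python
import pandas as pd

# Hypothetical raw values; the question does not show the data the bins came from
values = pd.Series([100, 15000, 50000, 5000, 500000, 1000, 9000])
bins = [-9999999, 550, 4000, 8000, 13700, 23700, 414580, 9999999]

# pd.cut returns an ordered Categorical whose categories are Intervals in bin order
binned = pd.cut(values, bins=bins)
df = pd.DataFrame({
    'col1': binned,
    'col2': [905317.3, 606156.5, 586349.6, 665779.1, 0, 803824.4, 628475.2],
})

# A categorical column is sorted by its category order, not by the label text
out = df.sort_values('col1', ignore_index=True)
print(out)
```

Converting the bins to strings (for example with astype(str)) throws this ordering away, which is what makes the key-based approaches above necessary.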