I created the following pandas DataFrame:
import pandas as pd
ds = {'col1' : ['(-9999999, 550.0]','(13700.0, 23700.0]','(23700.0, 414580.0]','(4000.0, 8000.0]','(414580.0, 9999999]','(550.0, 4000.0]','(8000.0, 13700.0]'],
'col2' : [905317.3, 606156.5, 586349.6, 665779.1, 0, 803824.4, 628475.2]}
df = pd.DataFrame(data=ds)
The DataFrame looks like this:
print(df)
col1 col2
0 (-9999999, 550.0] 905317.3
1 (13700.0, 23700.0] 606156.5
2 (23700.0, 414580.0] 586349.6
3 (4000.0, 8000.0] 665779.1
4 (414580.0, 9999999] 0.0
5 (550.0, 4000.0] 803824.4
6 (8000.0, 13700.0] 628475.2
I need to sort the DataFrame by column col1 in ascending order of the intervals. The resulting DataFrame should look like this:
col1 col2
0 (-9999999, 550.0] 905317.3
1 (550.0, 4000.0] 803824.4
2 (4000.0, 8000.0] 665779.1
3 (8000.0, 13700.0] 628475.2
4 (13700.0, 23700.0] 606156.5
5 (23700.0, 414580.0] 586349.6
6 (414580.0, 9999999] 0.0
Can anyone help me?
Code:
out = df.sort_values(
'col1', key=lambda x: x.str.extract(r'\(([^,]+),')[0].astype('float')
)
Output:
col1 col2
0 (-9999999, 550.0] 905317.3
5 (550.0, 4000.0] 803824.4
3 (4000.0, 8000.0] 665779.1
6 (8000.0, 13700.0] 628475.2
1 (13700.0, 23700.0] 606156.5
2 (23700.0, 414580.0] 586349.6
4 (414580.0, 9999999] 0.0
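To see what the key function in the snippet above computes, here is a small sketch on three of the strings from the question. The names sample, lower and order are only illustrative and are not part of the original answer.

```python
import pandas as pd

# Three of the interval strings from the question
sample = pd.Series(['(-9999999, 550.0]', '(13700.0, 23700.0]', '(550.0, 4000.0]'])

# Capture everything between "(" and the first comma, then convert to float
lower = sample.str.extract(r'\(([^,]+),')[0].astype('float')

# Sorting by these numeric lower bounds yields the row order
order = lower.sort_values().index.tolist()

print(lower.tolist())
print(order)
```

Since sort_values compares these floats rather than the raw strings, '(550.0, 4000.0]' ends up ahead of '(13700.0, 23700.0]', which plain string sorting would not give.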
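If 'col1' holds string values, you can use a regular expression to extract the lower bound of each interval and sort by it. You can do this by passing a custom key to sort_values: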
df = df.sort_values(
"col1", key=lambda x: x.str.extract(r"\((-?\d*\.?\d*),")[0].astype(float)
)
col1 col2
0 (-9999999, 550.0] 905317.3
5 (550.0, 4000.0] 803824.4
3 (4000.0, 8000.0] 665779.1
6 (8000.0, 13700.0] 628475.2
1 (13700.0, 23700.0] 606156.5
2 (23700.0, 414580.0] 586349.6
4 (414580.0, 9999999] 0.0
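The output above keeps the original row labels (0, 5, 3, ...), while the expected result in the question is numbered 0 through 6. A minimal sketch of one way to get that, assuming a pandas version where sort_values accepts both key and ignore_index (1.1 or later, as far as I know); the helper name lower_bound is hypothetical:

```python
import pandas as pd

ds = {
    'col1': ['(-9999999, 550.0]', '(13700.0, 23700.0]', '(23700.0, 414580.0]',
             '(4000.0, 8000.0]', '(414580.0, 9999999]', '(550.0, 4000.0]',
             '(8000.0, 13700.0]'],
    'col2': [905317.3, 606156.5, 586349.6, 665779.1, 0, 803824.4, 628475.2],
}
df = pd.DataFrame(data=ds)


def lower_bound(col):
    # Hypothetical helper: pull the number after "(" and convert it to float
    return col.str.extract(r"\((-?\d*\.?\d*),")[0].astype(float)


# ignore_index=True renumbers the sorted rows 0..n-1
out = df.sort_values("col1", key=lower_bound, ignore_index=True)
print(out)
```

Calling reset_index(drop=True) on the sorted result should be an equivalent alternative.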