Sorting strings by multiple delimited numbers

Question

I have a list of paths, which I have reduced to similar but simpler strings:

paths = ['apple10/banana2/carrot1', 'apple10/banana1/carrot2', 'apple2/banana1', 'apple2/banana2', 'apple1/banana1', 'apple1/banana2', 'apple10/banana1/carrot1']

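These paths need to be sorted numerically. The first number (the apple) takes priority in the ordering, followed by the second.

An additional complication, perhaps an obvious one, is that some paths have a third directory level where the data lives, while others do not.

A minimal example of the path structure looks like this: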

parent 
|-----apple1 
          |------banana1 
                   |----- data*
          |------banana2 
                   |----- data*
|-----apple2
          |------banana1 
                   |----- data*
          |------banana2 
                   |----- data*
|-----apple10
          |------banana1 
                   |-----carrot1
                            |-----data*
                   |-----carrot2
                            |-----data*
          |------banana2 
                   |----- carrot1
                             |-----data*

The desired output is:

paths = ['apple1/banana1', 'apple1/banana2', 'apple2/banana1', 'apple2/banana2', 'apple10/banana1/carrot1', 'apple10/banana1/carrot2','apple10/banana2/carrot1']

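I am struggling to work out how to do this. Plain sorting will not work, particularly since the numbers go into double digits and 10 would appear before 2.

I have seen another answer, "How to correctly sort a string with a number inside?", which works for a single number in each string of a list, but I was not able to adapt it to my problem.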
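
The problem with plain sorting described above can be reproduced in a few lines. This is only an illustrative sketch using the list from the question; the variable name `lexicographic` is made up for the example.

```python
paths = ['apple10/banana2/carrot1', 'apple10/banana1/carrot2', 'apple2/banana1',
         'apple2/banana2', 'apple1/banana1', 'apple1/banana2', 'apple10/banana1/carrot1']

# Plain sorting compares strings character by character,
# so '1' < '2' puts every 'apple10/...' path ahead of 'apple2/...'.
lexicographic = sorted(paths)
print(lexicographic)
# ['apple1/banana1', 'apple1/banana2', 'apple10/banana1/carrot1',
#  'apple10/banana1/carrot2', 'apple10/banana2/carrot1',
#  'apple2/banana1', 'apple2/banana2']
```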

Answer 1

Try `sorted` with a custom key that uses `re` to extract all the numbers from the path:

>>> import re
>>> sorted(paths, key=lambda x: list(map(int, re.findall(r"\d+", x))))
['apple1/banana1',
 'apple1/banana2',
 'apple2/banana1',
 'apple2/banana2',
 'apple10/banana1/carrot1',
 'apple10/banana1/carrot2',
 'apple10/banana2/carrot1']
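
To see why this key works, it may help to look at the values it produces. The sketch below wraps the same expression in a function called `numeric_key`, a name invented here for illustration. It also shows one caveat that follows from the approach: the names are ignored entirely, so two paths that differ only in their words receive the same key.

```python
import re

def numeric_key(path):
    """Return every run of digits in the path as a list of ints."""
    return [int(n) for n in re.findall(r"\d+", path)]

print(numeric_key('apple10/banana1/carrot2'))  # [10, 1, 2]
print(numeric_key('apple2/banana1'))           # [2, 1]

# Lists compare element by element, so [2, 1] < [10, 1, 2] because 2 < 10.
print(numeric_key('apple2/banana1') < numeric_key('apple10/banana1/carrot2'))  # True

# Caveat: the words are dropped, so these two different paths get equal keys.
print(numeric_key('apple1/banana2') == numeric_key('cherry1/banana2'))  # True
```

If the directory names can differ and should influence the order, a key that keeps the words as well is probably the safer choice.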

Answer 2

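In addition to @not_speshal's answer:

Building on the answer to the question you linked, if the first word in the path is not necessarily "apple", you can do the following: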

import re

def atoi(text):
    return int(text) if text.isdigit() else text

def word_and_num_as_tuple(text):
    return tuple( atoi(c) for c in re.split(r'(\d+)', text) )

def path_as_sortable_tuple(path, sep='/'):
    return tuple( word_and_num_as_tuple(word_in_path) for word_in_path in path.split(sep) )

paths = [
    'apple10/banana2/carrot1',
    'apple10/banana1/carrot2',
    'apple2/banana1',
    'apple2/banana2',
    'apple1/banana1',
    'apple1/banana2',
    'apple10/banana1/carrot1'
]


paths.sort(key=path_as_sortable_tuple)
print(paths)

# And, of course, as a lambda one-liner:
paths.sort(key=lambda path: tuple(tuple(int(char_seq) if char_seq.isdigit() else char_seq for char_seq in re.split(r'(\d+)', subpath)) for subpath in path.split('/')))

It does exactly what @MarcinCuprjak suggested, but automatically.
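
As a rough illustration of what this key looks like, the sketch below uses a compact variant named `path_key` (a hypothetical name; it is meant to behave like `path_as_sortable_tuple` above) and applies it to a made-up list in which the directory names differ. Note the trailing empty strings: `re.split` with a capturing group always returns the text after the last match, even when that text is empty.

```python
import re

def path_key(path, sep='/'):
    # Hypothetical compact variant of path_as_sortable_tuple from this answer.
    return tuple(
        tuple(int(piece) if piece.isdigit() else piece
              for piece in re.split(r'(\d+)', component))
        for component in path.split(sep)
    )

print(path_key('apple10/banana2'))
# (('apple', 10, ''), ('banana', 2, ''))

# Made-up sample data: words are compared as text, numbers as integers.
mixed = ['pear2/kiwi10', 'apple10/fig1', 'pear2/kiwi9', 'apple9/fig3']
print(sorted(mixed, key=path_key))
# ['apple9/fig3', 'apple10/fig1', 'pear2/kiwi9', 'pear2/kiwi10']
```

Because the split pieces always alternate between text and digits, strings are compared with strings and integers with integers at each position.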

Answer 3

If you can represent the data as tuples instead of strings, things get easier:

paths = [('apple', 10, 'banana', 2, 'carrot', 1),
         ('apple', 10, 'banana', 1, 'carrot', 2),
         ('apple', 2, 'banana', 1),
         ('apple', 2, 'banana', 2),
         ('apple', 1, 'banana', 1),
         ('apple', 1, 'banana', 2),
         ('apple', 10, 'banana', 1, 'carrot', 1)
         ]

paths.sort(key=lambda item: (len(item), item))
print(paths)

The output is what you wanted, I think:

[('apple', 1, 'banana', 1), ('apple', 1, 'banana', 2), ('apple', 2, 'banana', 1), ('apple', 2, 'banana', 2), ('apple', 10, 'banana', 1, 'carrot', 1), ('apple', 10, 'banana', 1, 'carrot', 2), ('apple', 10, 'banana', 2, 'carrot', 1)]
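
If the data only exists as strings, one possible way to get to this tuple form is sketched below. The helper `path_to_tuple` is hypothetical and assumes every path component is a name followed by a number, as in the question.

```python
import re

def path_to_tuple(path):
    # Hypothetical helper: 'apple10/banana2' -> ('apple', 10, 'banana', 2)
    # Assumes every component is a name followed directly by a number.
    parts = []
    for name, number in re.findall(r'([^\d/]+)(\d+)', path):
        parts.extend((name, int(number)))
    return tuple(parts)

paths = ['apple10/banana1/carrot1', 'apple2/banana1', 'apple1/banana2']
as_tuples = [path_to_tuple(p) for p in paths]
as_tuples.sort(key=lambda item: (len(item), item))
print(as_tuples)
# [('apple', 1, 'banana', 2), ('apple', 2, 'banana', 1), ('apple', 10, 'banana', 1, 'carrot', 1)]
```

One design point worth keeping in mind: with `len(item)` first in the key, every shorter tuple sorts before every longer one. That matches the question's data, where only `apple10` has a third level. If a deeper path under a lower-numbered apple should come before a shallower path under a higher-numbered one, sorting on `item` alone would be the variant to try.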

Answer 4

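Use the following tools:

- `itertools.groupby` with `str.isdigit` to group the characters into consecutive runs of digits or non-digits;
- `''.join` to form a word from each group of characters;
- a generator expression to iterate over the groups and filter out the non-digit ones;
- `int` to convert a word to an integer if it comes from a group of digits.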

Combine these tools into a `tuple` key for `sorted`:

from itertools import groupby

paths = ['apple10/banana2/carrot1', 'apple10/banana1/carrot2', 'apple2/banana1', 'apple2/banana2', 'apple1/banana1', 'apple1/banana2', 'apple10/banana1/carrot1']

sorted(paths,
       key=lambda s: tuple(int(''.join(group))
                           for are_digits,group in groupby(s, key=str.isdigit)
                           if are_digits))
# ['apple1/banana1', 'apple1/banana2', 'apple2/banana1', 'apple2/banana2', 'apple10/banana1/carrot1', 'apple10/banana1/carrot2', 'apple10/banana2/carrot1']
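
The steps listed above can be followed one at a time on a single path. This is only a sketch to show the intermediate values; the names `runs` and `key` are made up for the example.

```python
from itertools import groupby

s = 'apple10/banana2'

# Step 1: split the characters into runs of digits and non-digits,
# joining each run back into a word.
runs = [(are_digits, ''.join(group))
        for are_digits, group in groupby(s, key=str.isdigit)]
print(runs)
# [(False, 'apple'), (True, '10'), (False, '/banana'), (True, '2')]

# Step 2: keep only the digit runs and convert them to integers.
key = tuple(int(word) for are_digits, word in runs if are_digits)
print(key)
# (10, 2)
```

As with the regular-expression version, the non-digit runs are discarded, so the resulting key depends on the numbers only.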
