Determine the prefix from a set of (similar) strings

Question

I have a set of strings, e.g.

my_prefix_what_ever
my_prefix_what_so_ever
my_prefix_doesnt_matter

I simply want to find the longest common portion of these strings, here the prefix. For the strings above, the result should be

my_prefix_

The strings

my_prefix_what_ever
my_prefix_what_so_ever
my_doesnt_matter

should result in the prefix

my_

Is there a relatively painless way in Python to determine the prefix (without having to iterate over each character manually)?

PS: I'm using Python 2.6.3.

Answer 1

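Never rewrite what is already provided to you: os.path.commonprefix does exactly this:

Return the longest path prefix (taken character-by-character) that is a prefix of all paths in list. If list is empty, return the empty string (''). Note that this may return invalid paths because it works a character at a time.

For comparison with the other answers, here's the code: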

# Return the longest prefix of all list elements.
def commonprefix(m):
    "Given a list of pathnames, returns the longest common leading component"
    if not m: return ''
    s1 = min(m)
    s2 = max(m)
    for i, c in enumerate(s1):
        if c != s2[i]:
            return s1[:i]
    return s1
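
As a quick check, the sketch below calls os.path.commonprefix on the two lists from the question, plus an empty list. The variable names are illustrative only; the function compares plain characters, so it works on arbitrary strings and not just paths.

```python
import os.path

first = ["my_prefix_what_ever", "my_prefix_what_so_ever", "my_prefix_doesnt_matter"]
second = ["my_prefix_what_ever", "my_prefix_what_so_ever", "my_doesnt_matter"]

prefix_first = os.path.commonprefix(first)    # 'my_prefix_'
prefix_second = os.path.commonprefix(second)  # 'my_'
prefix_empty = os.path.commonprefix([])       # ''
```

Comparing only min(m) and max(m) is enough because every other string sorts between those two, so any prefix shared by the smallest and largest string is shared by all of them.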

Answer 2

Ned Batchelder is probably right. But for the fun of it, here's a more efficient version of phimuemue's answer using itertools.

import itertools

strings = ['my_prefix_what_ever', 
           'my_prefix_what_so_ever', 
           'my_prefix_doesnt_matter']

def all_same(x):
    return all(x[0] == y for y in x)

char_tuples = itertools.izip(*strings)
prefix_tuples = itertools.takewhile(all_same, char_tuples)
''.join(x[0] for x in prefix_tuples)

As an affront to readability, here's a one-line version :)

>>> from itertools import takewhile, izip
>>> ''.join(c[0] for c in takewhile(lambda x: all(x[0] == y for y in x), izip(*strings)))
'my_prefix_'
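
itertools.izip exists only in Python 2. For readers on Python 3, where the built-in zip is already lazy, the same idea might look like the sketch below. The function name common_prefix_py3 is made up for this illustration and is not part of the original answer.

```python
from itertools import takewhile

def common_prefix_py3(strings):
    # zip(*strings) yields one tuple of characters per position and stops
    # at the end of the shortest string.
    columns = zip(*strings)
    # Keep columns only while every character equals the first one.
    uniform = takewhile(lambda col: all(col[0] == ch for ch in col), columns)
    return ''.join(col[0] for col in uniform)

strings = ['my_prefix_what_ever',
           'my_prefix_what_so_ever',
           'my_prefix_doesnt_matter']
result = common_prefix_py3(strings)  # 'my_prefix_'
```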

Answer 3

Here's my solution:

a = ["my_prefix_what_ever", "my_prefix_what_so_ever", "my_prefix_doesnt_matter"]

prefix_len = len(a[0])
for x in a[1 : ]:
    prefix_len = min(prefix_len, len(x))
    while not x.startswith(a[0][ : prefix_len]):
        prefix_len -= 1

prefix = a[0][ : prefix_len]

Answer 4

The following is a working, but probably quite inefficient, solution.

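import itertools

a = ["my_prefix_what_ever", "my_prefix_what_so_ever", "my_prefix_doesnt_matter"]
b = zip(*a)
# Stop at the first position where the strings differ; a plain filter would
# also keep matching characters that appear after a mismatch.
c = [x[0] for x in itertools.takewhile(lambda x: x == (x[0],) * len(x), b)]
result = "".join(c)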

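For small sets of strings, the above is no problem at all. But for larger sets, I would personally write another, manual solution that checks each character one after the other and stops as soon as there is a difference.

Algorithmically this is the same procedure, but it may let you avoid constructing the list c.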
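
The manual approach described above could be sketched as follows. This is only one possible reading of the idea, not the answerer's code, and the function name common_prefix_manual is hypothetical. It builds no intermediate list and returns as soon as the first mismatch is found.

```python
def common_prefix_manual(strings):
    if not strings:
        return ''
    first = strings[0]
    # Walk the first string character by character.
    for i, ch in enumerate(first):
        for other in strings[1:]:
            # Stop when another string is too short or differs here.
            if i >= len(other) or other[i] != ch:
                return first[:i]
    # Every character of the first string matched in all other strings.
    return first

a = ["my_prefix_what_ever", "my_prefix_what_so_ever", "my_prefix_doesnt_matter"]
result = common_prefix_manual(a)  # 'my_prefix_'
```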

Answer 5

Just out of curiosity, I figured out yet another way to do this:

def common_prefix(strings):

    if len(strings) == 1:#rule out trivial case
        return strings[0]

    prefix = strings[0]

    for string in strings[1:]:
        while string[:len(prefix)] != prefix and prefix:
            prefix = prefix[:len(prefix)-1]
        if not prefix:
            break

    return prefix

strings = ["my_prefix_what_ever","my_prefix_what_so_ever","my_prefix_doesnt_matter"]

print common_prefix(strings)
#Prints "my_prefix_"

As Ned pointed out, it's probably better to use os.path.commonprefix, which is a pretty elegant function.

Answer 6

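The second line applies the reduce function to each character position across the input strings. It produces a list of N+1 elements, where N is the length of the shortest input string.

Each element in lot is either (a) the input character, if all input strings match at that position, or (b) None. lot.index(None) is the position of the first None in lot: the length of the common prefix. out is that common prefix.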

val = ["axc", "abc", "abc"]
lot = [reduce(lambda a, b: a if a == b else None, x) for x in zip(*val)] + [None]
out = val[0][:lot.index(None)]
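
To make the intermediate values concrete, here is the same snippet with the contents of each step shown in comments. The explicit import and the columns variable are additions for this illustration: reduce is a builtin in Python 2 but has to be imported from functools in Python 3.

```python
from functools import reduce  # builtin in Python 2, import needed in Python 3

val = ["axc", "abc", "abc"]
columns = list(zip(*val))
# columns == [('a', 'a', 'a'), ('x', 'b', 'b'), ('c', 'c', 'c')]
lot = [reduce(lambda a, b: a if a == b else None, x) for x in columns] + [None]
# lot == ['a', None, 'c', None]
out = val[0][:lot.index(None)]
# out == 'a'
```

The trailing [None] guarantees that lot.index(None) finds something even when every position matches.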

Answer 7

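Here's another way of doing this with minimal code, using OrderedDict (note that collections.OrderedDict requires Python 2.7 or later).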

import collections
import itertools

def commonprefix(instrings):
    """ Common prefix of a list of input strings using OrderedDict """

    d = collections.OrderedDict()

    for instring in instrings:
        for idx,char in enumerate(instring):
            # Make sure index is added into key
            d[(char, idx)] = d.get((char,idx), 0) + 1

    # Return prefix of keys while value == length(instrings)
    return ''.join([k[0] for k in itertools.takewhile(lambda x: d[x] == len(instrings), d)])

Answer 8

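Here is a simple, clean solution. The idea is to use the zip() function to line up all the characters: a list of 1st characters, a list of 2nd characters, ..., a list of nth characters. Then iterate over each list to check whether it contains only one distinct value.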

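a = ["my_prefix_what_ever", "my_prefix_what_so_ever", "my_prefix_doesnt_matter"]

# One boolean per character position: True when every string agrees there.
# (Named matches rather than list to avoid shadowing the builtin.)
matches = [all(x[i] == x[i+1] for i in range(len(x)-1)) for x in zip(*a)]

# Cut at the first mismatch, or keep everything if there is none.
print a[0][:matches.index(False) if False in matches else len(matches)]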

Output: my_prefix_
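
The check that a column "contains only one value" can also be written with a set. The sketch below is a variation on the idea described above rather than the answerer's code, and the name common_prefix_columns is hypothetical.

```python
def common_prefix_columns(strings):
    prefix = []
    # Each column holds the characters found at one position in every string.
    for column in zip(*strings):
        # A column with more than one distinct character ends the prefix.
        if len(set(column)) != 1:
            break
        prefix.append(column[0])
    return ''.join(prefix)

a = ["my_prefix_what_ever", "my_prefix_what_so_ever", "my_prefix_doesnt_matter"]
result = common_prefix_columns(a)  # 'my_prefix_'
```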
