Concatenating tuples with sum()

Question

From this post I learned that you can concatenate tuples with sum():

>>> tuples = (('hello',), ('these', 'are'), ('my', 'tuples!'))
>>> sum(tuples, ())
('hello', 'these', 'are', 'my', 'tuples!')

That looks pretty nice. But why does this work? And is it optimal, or is there something in itertools that would be preferable to this construct?

Answer 1

In Python, the addition operator concatenates tuples:

('a', 'b')+('c', 'd')
Out[34]: ('a', 'b', 'c', 'd')

From the docstring of sum:

Return the sum of a 'start' value (default: 0) plus an iterable of numbers

This means that sum doesn't start with the first element of the iterable, but with an initial value that is passed through the start= argument.

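By default, sum is used with numbers, so the default start value is 0. Summing an iterable of tuples therefore requires starting with an empty tuple.

() is an empty tuple: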

type(())
Out[36]: tuple

Hence the concatenation works.
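A minimal check of the start-value explanation, using only built-ins: with the default start of 0, the first step is 0 + ('hello',), which raises TypeError; passing () as start turns every step into tuple + tuple. The exact error text in the comment is what CPython prints and may differ on other implementations.

```python
tuples = (('hello',), ('these', 'are'), ('my', 'tuples!'))

# Default start=0: the first addition is 0 + ('hello',), which is not defined.
default_start_error = None
try:
    sum(tuples)
except TypeError as exc:
    # e.g. "unsupported operand type(s) for +: 'int' and 'tuple'" on CPython
    default_start_error = exc

# With start=() every step is tuple + tuple, i.e. concatenation.
concatenated = sum(tuples, ())
print(concatenated)  # ('hello', 'these', 'are', 'my', 'tuples!')
```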

As for performance, here is a comparison:

import itertools as it

%timeit sum(tuples, ())
The slowest run took 9.40 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 285 ns per loop

%timeit tuple(it.chain.from_iterable(tuples))
The slowest run took 5.00 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 625 ns per loop

Now with t2 of size 10000:

%timeit sum(t2, ())
10 loops, best of 3: 188 ms per loop

%timeit tuple(it.chain.from_iterable(t2))
1000 loops, best of 3: 526 µs per loop

So if your list of tuples is small, you don't need to worry. If it is medium-sized or larger, you should use itertools.
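The %timeit lines above need IPython. A rough, portable equivalent with the standard timeit module might look like this; the 10000 one-element tuples in large are an assumption, since the contents of t2 are not shown. Absolute numbers will vary by machine and Python version, so only the trend between small and large inputs is meaningful.

```python
import itertools
import timeit

def with_sum(tuples):
    # Repeated tuple + tuple: each step copies everything accumulated so far.
    return sum(tuples, ())

def with_chain(tuples):
    # Walks every inner tuple once and builds a single result tuple.
    return tuple(itertools.chain.from_iterable(tuples))

small = (('hello',), ('these', 'are'), ('my', 'tuples!'))
# Hypothetical larger input: 10000 one-element tuples (stand-in for t2).
large = tuple((i,) for i in range(10000))

for name, data, number in [("small", small, 10000), ("large", large, 3)]:
    for func in (with_sum, with_chain):
        # The lambda is consumed immediately by timeit, so the loop variables are bound correctly.
        seconds = timeit.timeit(lambda: func(data), number=number)
        print(f"{name:5} {func.__name__:10} {seconds / number * 1e6:12.1f} us per call")
```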

Answer 2

It works because addition is overloaded (for tuples) to return the concatenated tuple:

>>> () + ('hello',) + ('these', 'are') + ('my', 'tuples!')
('hello', 'these', 'are', 'my', 'tuples!')

That's basically what sum is doing: you give it an empty tuple as the initial value, and it adds the tuples to it.

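However, this is generally a bad idea, because adding tuples creates a new tuple, so you create several intermediate tuples only to copy them into the concatenated tuple: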

()
('hello',)
('hello', 'these', 'are')
('hello', 'these', 'are', 'my', 'tuples!')

This implementation has quadratic runtime behavior. The quadratic runtime can be avoided by avoiding the intermediate tuples.
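The intermediate tuples listed above can be reproduced with itertools.accumulate, and a rough cost model makes the quadratic claim concrete: if each + builds a new tuple of the combined length, the total work is the sum of the lengths of all partial results. This is a sketch under that assumption; it ignores CPython's shortcut of returning the other operand when one side of + is an empty tuple, and the initial= argument of accumulate requires Python 3.8+.

```python
import itertools
import operator

def intermediate_tuples(tuples):
    # The partial results sum(tuples, ()) builds, starting from ().
    return list(itertools.accumulate(tuples, operator.add, initial=()))

def copied_elements(tuples):
    # Rough cost model: each + writes a new tuple of the combined length.
    return sum(len(t) for t in intermediate_tuples(tuples)[1:])

tuples = (('hello',), ('these', 'are'), ('my', 'tuples!'))
for partial in intermediate_tuples(tuples):
    print(partial)

# For n one-element tuples this is 1 + 2 + ... + n = n*(n+1)/2 elements,
# whereas building the final tuple once needs only n.
n = 1000
assert copied_elements(tuple((i,) for i in range(n))) == n * (n + 1) // 2
```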

>>> tuples = (('hello',), ('these', 'are'), ('my', 'tuples!'))

Using a nested generator expression:

>>> tuple(tuple_item for tup in tuples for tuple_item in tup)
('hello', 'these', 'are', 'my', 'tuples!')

Or using a generator function:

def flatten(it):
    for seq in it:
        for item in seq:
            yield item


>>> tuple(flatten(tuples))
('hello', 'these', 'are', 'my', 'tuples!')
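On Python 3.3+, the inner loop of flatten can be written with yield from, which delegates to each inner iterable in turn. A minimal equivalent sketch:

```python
def flatten(it):
    # Same behavior as the nested-loop generator above:
    # yield every item of every inner sequence, in order.
    for seq in it:
        yield from seq

tuples = (('hello',), ('these', 'are'), ('my', 'tuples!'))
print(tuple(flatten(tuples)))  # ('hello', 'these', 'are', 'my', 'tuples!')
```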

Or using itertools.chain.from_iterable:

>>> import itertools
>>> tuple(itertools.chain.from_iterable(tuples))
('hello', 'these', 'are', 'my', 'tuples!')

If you're interested in their performance (using my simple_benchmark package):

import itertools
import simple_benchmark

def flatten(it):
    for seq in it:
        for item in seq:
            yield item

def sum_approach(tuples):
    return sum(tuples, ())

def generator_expression_approach(tuples):
    return tuple(tuple_item for tup in tuples for tuple_item in tup)

def generator_function_approach(tuples):
    return tuple(flatten(tuples))

def itertools_approach(tuples):
    return tuple(itertools.chain.from_iterable(tuples))

funcs = [sum_approach, generator_expression_approach, generator_function_approach, itertools_approach]
# Map "number of tuples" -> a tuple containing exactly that many one-element tuples.
arguments = {2**i: tuple((1,) for _ in range(2**i)) for i in range(1, 13)}
b = simple_benchmark.benchmark(funcs, arguments, argument_name='number of tuples to concatenate')

b.plot()

(Python 3.7.2 64-bit, Windows 10 64-bit)

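So while the sum approach is very fast if you only concatenate a few tuples, it becomes very slow if you try to concatenate lots of tuples. For many tuples, the fastest of the tested approaches is itertools.chain.from_iterable.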

Answer 3

That's clever, and I had to laugh, because the help explicitly forbids strings, which are also immutable, but it works:

sum(...)
    sum(iterable[, start]) -> value
    
    Return the sum of an iterable of numbers (NOT strings) plus the value
    of parameter 'start' (which defaults to 0).  When the iterable is
    empty, return start.

You can add tuples to get a new, bigger tuple. And since you gave a tuple as the start value, the addition works.
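To see the contrast the help text hints at: CPython's sum() accepts a tuple as start, but explicitly rejects a str start with a TypeError that points to str.join, even though str also supports + and is immutable. A small sketch; the error message shown is CPython's and may differ elsewhere.

```python
tuples = (('hello',), ('these', 'are'), ('my', 'tuples!'))
words = ['these', 'are', 'strings']

# A tuple start value is accepted, so + concatenates the tuples.
print(sum(tuples, ()))  # ('hello', 'these', 'are', 'my', 'tuples!')

# A str start value is rejected explicitly by CPython's sum().
string_error = None
try:
    sum(words, '')
except TypeError as exc:
    string_error = exc
    print(exc)  # sum() can't sum strings [use ''.join(seq) instead]

# The recommended spelling for strings:
print(''.join(words))  # thesearestrings
```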

Answer 4

Just to complement the accepted answer with some more benchmarks:

import functools, operator, itertools
import numpy as np
N = 10000
M = 2

ll = tuple(tuple(x) for x in np.random.random((N, M)).tolist())

%timeit functools.reduce(operator.add, ll)
# 407 ms ± 5.63 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit functools.reduce(lambda x, y: x + y, ll)
# 425 ms ± 7.16 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit sum(ll, ())
# 426 ms ± 14.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit tuple(itertools.chain(*ll))
# 601 µs ± 5.43 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit tuple(itertools.chain.from_iterable(ll))
# 546 µs ± 25.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

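EDIT: The code has been updated to actually use tuples. And, per the comments, the last two options are now inside tuple() constructors, and all timings have been updated (for consistency). The itertools.chain* options are still the fastest, but now by a smaller margin.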
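Both itertools.chain spellings timed above produce the same tuple. The difference is that chain(*ll) unpacks the whole outer tuple into positional arguments first, while chain.from_iterable(ll) pulls inner iterables one at a time, so it can also consume a generator of tuples directly. A sketch assuming a deterministic stand-in for the random N x M data (numpy is not needed here):

```python
import itertools

N, M = 10000, 2
# Deterministic stand-in for np.random.random((N, M)) converted to tuples.
ll = tuple(tuple(float(i * M + j) for j in range(M)) for i in range(N))

eager = tuple(itertools.chain(*ll))              # unpacks all N tuples as arguments first
lazy = tuple(itertools.chain.from_iterable(ll))  # takes inner tuples one at a time
assert eager == lazy and len(lazy) == N * M

# from_iterable accepts a generator of tuples without unpacking it up front.
pairs = ((i, -i) for i in range(3))
print(tuple(itertools.chain.from_iterable(pairs)))  # (0, 0, 1, -1, 2, -2)
```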

Answer 5

The second argument, start, where you put (), is the starting object to add to; by default it is 0, for numeric addition.

Here is a sample implementation of sum (what I would expect it to look like):

def sum(iterable, /, start=0):
    for element in iterable:
        start += element
    return start

Example:

>>> sum([1, 2, 3])
6
>>> tuples = (('hello',), ('these', 'are'), ('my', 'tuples!'))
>>> sum(tuples)
TypeError: unsupported operand type(s) for +=: 'int' and 'tuple'
>>> sum(tuples, ())
('hello', 'these', 'are', 'my', 'tuples!')
>>>

It works because tuple concatenation with + is supported.

In effect, this translates to:

>>> () + ('hello',) + ('these', 'are') + ('my', 'tuples!')
('hello', 'these', 'are', 'my', 'tuples!')
>>>
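The chained + above is a left fold that begins at start. Under that reading, sum(tuples, ()) should agree with functools.reduce(operator.add, tuples, ()), one of the variants timed in Answer 4, and with the same additions written out by hand. A small sketch checking all three:

```python
import functools
import operator

tuples = (('hello',), ('these', 'are'), ('my', 'tuples!'))

# sum(iterable, start) folds left: ((start + a) + b) + c
via_sum = sum(tuples, ())
# The same left fold spelled with reduce and an explicit initial value.
via_reduce = functools.reduce(operator.add, tuples, ())
# And written out by hand.
by_hand = ((() + tuples[0]) + tuples[1]) + tuples[2]

assert via_sum == via_reduce == by_hand
print(via_sum)  # ('hello', 'these', 'are', 'my', 'tuples!')
```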

