Python: yield inside a map function

Question

Is it possible to use yield inside the function passed to map?

For POC purposes, I have created a sample snippet.

# Python 3  (Win10)
from concurrent.futures import ThreadPoolExecutor
import os
def read_sample(sample):
    with open(os.path.join('samples', sample)) as fff:
        for _ in range(10):
            yield str(fff.read())

def main():
    with ThreadPoolExecutor(10) as exc:
        files = os.listdir('samples')
        files = list(exc.map(read_sample, files))
        print(str(len(files)), end="\r")

if __name__=="__main__":
    main()

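My samples folder contains 100 files. As per the snippet, it should print 100 * 10 = 1000. However, it prints only 100. When I checked, the list just contains generator objects.

What change is needed to make it print 1000?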

Answer 1

(Edit: added another approach and a simpler example)

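You can use map() with a generator function, but it will only map each input to a generator object; it will not descend into the generator and consume it.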
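A minimal sketch of why this happens, assuming a hypothetical generator function `tagged` that records the name of the thread running its body. Calling a generator function only builds a generator object, so the pool has nothing to compute until something iterates it. The names `tagged` and `demo` and the `worker` thread prefix are illustrative, not part of the original question.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def tagged(n):
    # Generator function: calling it only builds a generator object.
    # The body runs in whichever thread iterates the generator.
    for i in range(n):
        yield (i, threading.current_thread().name)


def demo():
    with ThreadPoolExecutor(2, thread_name_prefix="worker") as exc:
        # The pool returns two generator objects; no body has run yet.
        gens = list(exc.map(tagged, [2, 3]))
        # Iterating here runs the generator bodies in the calling thread.
        in_caller = [item for g in gens for item in g]
        # Mapping list over the generators runs the bodies in pool threads.
        nested = exc.map(list, exc.map(tagged, [2, 3]))
        in_pool = [item for items in nested for item in items]
    return len(gens), in_caller, in_pool


if __name__ == "__main__":
    count, in_caller, in_pool = demo()
    print(count)
    print(in_caller)
    print(in_pool)
```

In this sketch the thread names recorded in `in_caller` belong to the calling thread, while those in `in_pool` carry the `worker` prefix, which is the difference the approaches in this answer rely on.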

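One possible approach is to let a generator do the looping the way you want, and have the mapped function operate on the objects it yields. This has the added advantage of separating the looping from the computation more cleanly. So, something like this should work: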

Method 1

# Python 3  (Win10)
from concurrent.futures import ThreadPoolExecutor
import os
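def read_samples(samples):
    # Yield paths rather than open file objects: exc.map submits every task
    # up front, so a yielded file could be closed before a worker reads it.
    for sample in samples:
        path = os.path.join('samples', sample)
        for _ in range(10):
            yield path

def read_file(path):
    # Each task opens and reads the file itself, inside a worker thread.
    with open(path) as fff:
        return str(fff.read())

def main():
    with ThreadPoolExecutor(10) as exc:
        files = os.listdir('samples')
        files = list(exc.map(read_file, read_samples(files)))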
        print(str(len(files)), end="\r")

if __name__=="__main__":
    main()

Another approach is to nest an extra map call to consume the generators:

Method 2

# Python 3  (Win10)
from concurrent.futures import ThreadPoolExecutor
import os
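def read_sample(sample):
    # Same generator function as in the question: one generator per file.
    with open(os.path.join('samples', sample)) as fff:
        for _ in range(10):
            yield str(fff.read())

def main():
    with ThreadPoolExecutor(10) as exc:
        files = os.listdir('samples')
        # The inner map only creates the generators; the outer map(list, ...)
        # consumes each of them inside a worker thread.
        files = exc.map(list, exc.map(read_sample, files))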
        files = [f for fs in files for f in fs]  # flattening the results
        print(str(len(files)), end="\r")

if __name__=="__main__":
    main()

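A simpler example

To have a more reproducible example, the traits of the code can be reduced to a simpler example (one that does not depend on files sitting on your system):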

from concurrent.futures import ThreadPoolExecutor


def foo(n):
    for i in range(n):
        yield i


with ThreadPoolExecutor(10) as exc:
    k = 8
    x = list(exc.map(foo, range(k)))
    print(x)
# [<generator object foo at 0x7f1a853d4518>, <generator object foo at 0x7f1a852e9990>, <generator object foo at 0x7f1a852e9db0>, <generator object foo at 0x7f1a852e9a40>, <generator object foo at 0x7f1a852e9830>, <generator object foo at 0x7f1a852e98e0>, <generator object foo at 0x7f1a852e9fc0>, <generator object foo at 0x7f1a852e9e60>]

Method 1:

from concurrent.futures import ThreadPoolExecutor


def foos(ns):
    for n in range(ns):
        for i in range(n):
            yield i


with ThreadPoolExecutor(10) as exc:
    k = 8
    x = list(exc.map(lambda x: x ** 2, foos(k)))
    print(x)
# [0, 0, 1, 0, 1, 4, 0, 1, 4, 9, 0, 1, 4, 9, 16, 0, 1, 4, 9, 16, 25, 0, 1, 4, 9, 16, 25, 36]

Method 2

from concurrent.futures import ThreadPoolExecutor


def foo(n):
    for i in range(n):
        yield i ** 2


with ThreadPoolExecutor(10) as exc:
    k = 8
    x = exc.map(list, exc.map(foo, range(k)))
    print([z for y in x for z in y])
# [0, 0, 1, 0, 1, 4, 0, 1, 4, 9, 0, 1, 4, 9, 16, 0, 1, 4, 9, 16, 25, 0, 1, 4, 9, 16, 25, 36]
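As a possible variation on the two methods, and not something the original answer proposes, the worker can return a finished list instead of yielding, and the per-input lists can be flattened with `itertools.chain.from_iterable`. This is only a sketch; the helper names `squares` and `flat_squares` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import chain


def squares(n):
    # Hypothetical helper: build the whole list inside the worker thread
    # instead of yielding, so the pool does the actual work.
    return [i ** 2 for i in range(n)]


def flat_squares(k, workers=10):
    with ThreadPoolExecutor(workers) as exc:
        # exc.map keeps the input order; chain.from_iterable flattens
        # the per-input lists into one sequence.
        return list(chain.from_iterable(exc.map(squares, range(k))))


if __name__ == "__main__":
    print(flat_squares(8))
```

Called with `k = 8`, this should produce the same 28-element list as Method 1 and Method 2 above. The trade-off is that each worker holds its full result in memory before returning it.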
