Creating plots with multiprocessing and time.strftime() does not work properly

Question

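I am trying to run my script in parallel with multiprocessing to create plots. I have put together two example scripts for this question, because the actual main script, including its computation part, is too long. script0.py contains the multiprocessing part, which launches the actual script1.py four times in parallel. In this example, script1.py just creates some random scatter plots.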

script0.py:

import multiprocessing as mp
import os

def execute(process):
    os.system(f"python {process}")



if __name__ == "__main__":

    proc_num = 4
    process= []

    for _ in range(proc_num):
        process.append("script1.py")

    process_pool = mp.Pool(processes= proc_num)
    process_pool.map(execute, process)

script1.py:

# just a random scatterplot, but works for my example
import time
import numpy as np
import matplotlib.pyplot as plt
import os

dir_name = "stackoverflow_question"
plot_name = time.strftime("Plot %Hh%Mm%Ss")      # note the time.strftime() function

if not os.path.exists(f"{dir_name}"):
    os.mkdir(f"{dir_name}")

N = 50
x = np.random.rand(N)
y = np.random.rand(N)
colors = np.random.rand(N)

area = (30 * np.random.rand(N))**2

plt.scatter(x, y, s=area, c=colors, alpha=0.5)
#plt.show()
plt.savefig(f"{dir_name}/{plot_name}", dpi=300)

It is important that I name the plots after their creation time:

plot_name = time.strftime("Plot %Hh%Mm%Ss")

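So this creates a string like "Plot 16h39m22s". So far so good... now to my actual problem! I realized that when the processes are started in parallel, the plot names are sometimes identical because time.strftime() produces the same timestamp, so one instance of script1.py can overwrite a plot that another instance has already created.

In my working script I have exactly this problem. I generate a lot of data, so I need to name my plots and CSVs after the date and time they were generated.

I have already thought about passing a variable to script1.py when it is called, but I do not know how to do that, since I have only just learned about the multiprocessing library. That variable would also have to differ between calls, otherwise I would run into the same problem.

Does anyone have a better idea of how I could achieve this? Thank you very much in advance.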

Answer 1

I suggest these approaches:

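Approach 1: (simple, recommended) If you can change the names, I recommend using Unix time (for example time.time() or time.time_ns()) instead of the date, or adding fractional seconds. That makes a collision very unlikely.
Approach 2: Add the process ID to the file name. That way, even if two processes write at the same time, the process ID distinguishes their files. If you want to remove the ID from the end of the names, read the file names after execution and rename them appropriately, resolving any conflicts.
Approach 3: Similar to approach 2, but without changing the names: create a folder named after the process ID and put that process's output in it. At the end of execution, merge the folders and resolve any conflicts.
Approach 4: (not recommended; hard to manage and hurts performance) Shared memory. Keep a variable holding the last timestamp in shared memory and check against it.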
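Approaches 1 and 2 can be combined in a few lines. The following is only a minimal sketch: the helper name unique_plot_name and the exact layout of the name are assumptions made for illustration, not something taken from the question.

```python
import os
import time


def unique_plot_name(prefix="Plot"):
    """Build a name from a readable time, a nanosecond timestamp and the process ID.

    Hypothetical helper for illustration; the name layout is an assumption.
    """
    readable = time.strftime("%Hh%Mm%Ss")  # same format as in the question
    nanos = time.time_ns()                 # approach 1: far finer than one second
    pid = os.getpid()                      # approach 2: differs between processes running at the same time
    return f"{prefix} {readable} {nanos} pid{pid}"


if __name__ == "__main__":
    print(unique_plot_name())
```

In script1.py, the result could replace the current plot_name. The process ID is what separates instances that start within the same clock tick; the nanosecond part separates successive plots made by the same process.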

Answer 2

Some thoughts...

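First, you are not following the guidance in the multiprocessing module on how to use Pool. You should put it in a context manager, with(...)... There are plenty of examples. See the red warning in the docs: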

https://docs.python.org/3/library/multiprocessing.html#module-multiprocessing.pool

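Also, calling os.system is a bit odd/unsafe. Why not put your plotting routine into a standard function, in the same module or a different one, and import it? That would let you pass additional information (such as a good label) to the function. I would expect something like this, where source is a data file or an external source...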

def make_plot(source, output_file_name, plot_label):
    # read the data source
    # make the plot
    # save it to the output path...
    ...  # placeholder body so the stub is valid Python

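As far as the label goes, if you start these processes within the same "second" you will of course get overlaps, so you could append the process number to the label, or some other piece of information, for example from the data source, or use the same timestamp but put the output in unique folders, as suggested in the other answer.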

I would think something like this...

Code:

from multiprocessing import Pool
import time

def f(data, output_folder, label):
    # here data is just an integer, in yours, it would be the source of the graph data...
    val = data * data
    # the below is just example...  you could just use your folder making/saving routine...
    return f'now we can save {label} in folder {output_folder} with value: {val}'

if __name__ == '__main__':
    with Pool(5) as p:
        folders = ['data1', 'data2', 'data3']
        labels = [time.strftime("Plot %Hh%Mm%Ss")]*3
        x_s = [1, 2, 3]
        output = p.starmap(f, zip(x_s, folders, labels))
        for result in output:
            print(result)

Output:

now we can save Plot 08h55m17s in folder data1 with value: 1
now we can save Plot 08h55m17s in folder data2 with value: 4
now we can save Plot 08h55m17s in folder data3 with value: 9
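The example above keeps the labels identical and relies on separate folders. The other option mentioned, appending a task number to the label, might look like the sketch below. The helper build_tasks and the "_<index>" suffix are assumptions for illustration, and f is again only a stand-in for a real plotting routine.

```python
from multiprocessing import Pool
import time


def build_tasks(sources, folder):
    """Pair each data source with a label made unique by its task index.

    Hypothetical helper; the "_<index>" suffix is an assumed naming scheme.
    """
    stamp = time.strftime("Plot %Hh%Mm%Ss")  # taken once, in the parent process
    return [(src, folder, f"{stamp}_{i}") for i, src in enumerate(sources)]


def f(data, output_folder, label):
    # stand-in for the real plotting routine; here data is just an integer
    return f"{output_folder}/{label}: {data * data}"


if __name__ == "__main__":
    with Pool(3) as p:
        for line in p.starmap(f, build_tasks([1, 2, 3], "data")):
            print(line)
```

Because the labels are built in the parent process before any worker starts, they stay unique within one run even when all tasks begin in the same second.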
