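I'm running into strange behavior with the multiprocessing module. Can anyone explain what is going on?
The following MWE hangs (it runs forever without raising an error):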
#!/usr/bin/env python3
import multiprocessing
import numpy as np
from skimage import io
from sklearn.cluster import KMeans

def create_model():
    sampled_pixels = np.random.randint(0, 255, (800,3))
    kmeans_model = KMeans(n_clusters=8, random_state=0).fit(sampled_pixels)

def process_image(test, test2):
    image = np.random.randint(0, 255, (800,3))
    kmeans_model = KMeans(n_clusters=8, random_state=0).fit(image)
    image = kmeans_model.predict(image)

def main():
    create_model()
    with multiprocessing.Pool(1) as pool:
        pool.apply_async(process_image, args=('test', 'test'))
        pool.close()
        pool.join()

if __name__ == "__main__":
    main()
However, if I remove the line
create_model()
or change
def process_image(test, test2)
# as well as
pool.apply_async(process_image, args=('test', 'test'))
to
def process_image(test)
# and
pool.apply_async(process_image, args=('test'))
the code runs successfully, as it should, since the extra argument and the call to
create_model()
are completely redundant.
Appendix
> pip list
Package       Version
------------- ---------
imageio       2.34.0
joblib        1.4.0
lazy_loader   0.4
networkx      3.3
numpy         1.26.4
packaging     24.0
pillow        10.3.0
pip           23.2.1
scikit-image  0.23.1
scikit-learn  1.4.2
scipy         1.13.0
threadpoolctl 3.4.0
tifffile      2024.2.12
> python --version
Python 3.12.2
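I think the strange behavior you are seeing with the multiprocessing module comes from a subtle issue in how Python handles object references in the child processes created by multiprocessing.Pool.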
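Whatever the underlying cause turns out to be, one way to narrow it down is to keep the AsyncResult returned by apply_async and call get() with a timeout. A worker that raised re-raises its exception in the parent, while a worker that is genuinely stuck produces multiprocessing.TimeoutError instead of blocking forever in pool.join(). The sketch below is only an illustration under assumptions: slow_task, run_with_timeout and the timeout values are hypothetical, and multiprocessing.pool.ThreadPool stands in for Pool (it exposes the same apply_async interface) so the example stays lightweight.

```python
import multiprocessing
import time
from multiprocessing.pool import ThreadPool


def slow_task(seconds):
    # Hypothetical stand-in for a worker that takes too long.
    time.sleep(seconds)
    return "done"


def run_with_timeout(seconds, timeout):
    # Return the task's result, or "timed out" if it is not ready in time.
    with ThreadPool(1) as pool:
        result = pool.apply_async(slow_task, args=(seconds,))
        try:
            return result.get(timeout=timeout)
        except multiprocessing.TimeoutError:
            return "timed out"


if __name__ == "__main__":
    print(run_with_timeout(0.01, timeout=5))
    print(run_with_timeout(0.5, timeout=0.05))
```

In the MWE, the same pattern would mean storing the return value of pool.apply_async(...) and calling .get(timeout=...) on it before pool.close().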
Modify create_model() so that it returns the kmeans_model it creates:
def create_model():
    sampled_pixels = np.random.randint(0, 255, (800,3))
    kmeans_model = KMeans(n_clusters=8, random_state=0).fit(sampled_pixels)
    return kmeans_model
Then, in main(), pass the model returned by create_model() to process_image:
kmeans_model = create_model()
with multiprocessing.Pool(1) as pool:
    # process_image must now accept the model as its only parameter
    pool.apply_async(process_image, args=(kmeans_model,))
    pool.close()
    pool.join()
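Note the trailing comma in args=(kmeans_model,). The one-argument variant in the question passes args=('test'), which is just the string 'test' and not a tuple. Since apply_async unpacks args as positional arguments, that call most likely failed inside the worker with a TypeError that never surfaced because get() was not called, which may be why that variant only appeared to run successfully. The sketch below illustrates the difference. It is a minimal example under assumptions: process_image here is a one-parameter placeholder rather than the original function, run_once is a hypothetical helper, and multiprocessing.pool.ThreadPool stands in for Pool (same apply_async interface).

```python
from multiprocessing.pool import ThreadPool


def process_image(test):
    # Placeholder for the real work; it only echoes its single argument.
    return test


def run_once(args):
    # Submit one task and call .get() so that any worker error is re-raised here.
    with ThreadPool(1) as pool:
        result = pool.apply_async(process_image, args=args)
        return result.get(timeout=10)


if __name__ == "__main__":
    print(type(('test')).__name__)   # parentheses alone do not make a tuple
    print(type(('test',)).__name__)  # the trailing comma does
    print(run_once(('test',)))
    try:
        run_once(('test'))           # unpacked as 't', 'e', 's', 't'
    except TypeError as exc:
        print("worker raised:", exc)
```

Without the call to get(), the TypeError in the last case would stay stored in the AsyncResult and the program would exit normally.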