Python multiprocessing on multiple CPUs and GPUs

Question

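I have 8 GPUs and 64 CPU cores (multiprocessing.cpu_count() = 64).

I am trying to run inference on multiple video files with a deep learning model. I want each of the 8 GPUs to process its own set of files, and for each GPU I want to use a different set of 6 CPU cores.

The Python file below is named: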

inference_{gpu_id}.py

Input1: GPU_id

Input2: Files to process for GPU_id

import cv2
import numpy as np
from tqdm import tqdm
from torch.multiprocessing import Pool, set_start_method

try:
    set_start_method('spawn', force=True)
except RuntimeError:
    pass

# gpu_id is Input1 (a string); load_model is defined elsewhere
model = load_model(device='cuda:' + gpu_id)

def pooling_func(file):
    preds = []
    cap = cv2.VideoCapture(file)
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break
        frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        pred = model(frame)[0]
        preds.append(pred)
    cap.release()
    np.save(file[:-4] + '.npy', preds)

def process_files():
    # all files to process on gpu_id
    files = np.load(gpu_id + '_files.npy')

    # I am hoping to use 6 cores for this gpu_id,
    # and a different 6 cores for a different GPU id
    pool = Pool(6)

    r = list(tqdm(pool.imap(pooling_func, files), total=len(files)))
    pool.close()
    pool.join()

if __name__ == '__main__':
    import multiprocessing
    multiprocessing.freeze_support()
    process_files()

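I want to run the inference_{gpu_id}.py files on all GPUs at the same time.

Currently I can run it successfully on one GPU with 6 cores, but when I try to run it on all GPUs together, only GPU 0 runs. All the other GPUs stop with the following error message: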

RuntimeError: CUDA error: invalid device ordinal.

The scripts I am running:

CUDA_VISIBLE_DEVICES=0 inference_0.py

CUDA_VISIBLE_DEVICES=1 inference_1.py

...

CUDA_VISIBLE_DEVICES=7 inference_7.py

Answer 1

Consider that if you do not set the CUDA_VISIBLE_DEVICES flag, all GPUs are available to your PyTorch process. This means torch.cuda.device_count() returns 8 (assuming your setup is valid). You can then access each of those 8 GPUs through a torch.device: torch.device('cuda:0'), torch.device('cuda:1'), ..., up to torch.device('cuda:7').
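As a small illustration of the indexing above, the sketch below lists the valid device strings for a given number of visible GPUs. The helper names device_names and visible_device_names are made up for this example; only torch.cuda.device_count() comes from PyTorch, and it is imported lazily so the first helper runs without it.

```python
def device_names(count):
    """Return the valid device strings for `count` visible GPUs."""
    return [f"cuda:{i}" for i in range(count)]


def visible_device_names():
    """Ask PyTorch how many GPUs are visible (requires torch to be installed)."""
    import torch  # lazy import so device_names() works without torch
    return device_names(torch.cuda.device_count())


# With 8 visible GPUs the indices run from 0 to 7; 'cuda:8' is out of range.
print(device_names(8))
```

On the machine from the question, visible_device_names() would be expected to return the same eight names when CUDA_VISIBLE_DEVICES is unset.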

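Now, if you only plan to use one GPU and want to restrict your process to it, CUDA_VISIBLE_DEVICES=i (where i is the device ordinal) does exactly that. In this case torch.cuda sees a single device, reachable only as torch.device('cuda:0'). The actual device ordinal does not matter: you always access it as torch.device('cuda:0'). This is why 'cuda:' + gpu_id fails with "invalid device ordinal" for every gpu_id other than 0.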

If you allow access to several devices, say n°0, n°4 and n°2, you would use CUDA_VISIBLE_DEVICES=0,4,2. You then refer to your CUDA devices as d0 = torch.device('cuda:0'), d1 = torch.device('cuda:1') and d2 = torch.device('cuda:2'), in the order you listed them in the flag, i.e.:

d0 -> GPU n°0
d1 -> GPU n°4
d2 -> GPU n°2
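The renumbering can be sketched in plain Python. The function logical_to_physical below is a hypothetical helper written for this answer, not part of PyTorch or CUDA, and it assumes the flag holds only integer ids (it ignores other forms such as GPU UUIDs).

```python
def logical_to_physical(cuda_visible_devices):
    """Map each logical name ('cuda:N') to the physical GPU id listed in the flag.

    Assumption: the flag is a comma-separated list of integer ids.
    """
    ids = [part.strip() for part in cuda_visible_devices.split(",") if part.strip()]
    return {f"cuda:{logical}": int(physical) for logical, physical in enumerate(ids)}


# The logical index is the position in the flag, not the physical id.
print(logical_to_physical("0,4,2"))
# With a single visible GPU, only 'cuda:0' exists, whatever its physical id.
print(logical_to_physical("5"))
```

In the second call there is no 'cuda:5' key, which matches the "invalid device ordinal" error from the question.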

This lets you run the same code on different GPUs without changing the device ordinals that the code refers to.

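In summary, what matters is the number of devices your code needs. In your case, one is enough, and you refer to it as torch.device('cuda:0'). When running the code, however, you use the flag to specify which physical device cuda:0 is: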

> CUDA_VISIBLE_DEVICES=0 inference.py
> CUDA_VISIBLE_DEVICES=1 inference.py
  ...
> CUDA_VISIBLE_DEVICES=7 inference.py
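The eight commands can also be started from one launcher, which is a convenient place to give each process its own 6 CPU cores as the question asks. This is only a sketch under several assumptions: a Linux machine with the taskset utility on PATH, core numbers 0 to 47 being a sensible split on this hardware, and a hypothetical convention where inference.py receives the physical GPU id as its first argument (for example to find its file list) while still using torch.device('cuda:0') internally.

```python
import os
import subprocess
import sys


def plan_workers(num_gpus=8, cores_per_gpu=6):
    """Pair each GPU id with a disjoint, contiguous range of CPU cores."""
    plans = []
    for gpu in range(num_gpus):
        first = gpu * cores_per_gpu
        last = first + cores_per_gpu - 1
        plans.append({"gpu": gpu, "cores": f"{first}-{last}"})
    return plans


def launch(script="inference.py"):
    """Start one pinned process per GPU and wait for all of them (Linux only)."""
    procs = []
    for plan in plan_workers():
        # Each child sees exactly one GPU, which it addresses as 'cuda:0'.
        env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(plan["gpu"]))
        # Hypothetical convention: the GPU id is passed as argv[1] to the script.
        cmd = ["taskset", "-c", plan["cores"], sys.executable, script, str(plan["gpu"])]
        procs.append(subprocess.Popen(cmd, env=env))
    for proc in procs:
        proc.wait()


# Show the planned assignment without starting anything.
for plan in plan_workers():
    print(plan)
```

Worker processes created by Pool(6) inside each script inherit the CPU affinity set by taskset, so the six workers stay on that GPU's core range. Calling launch() is what actually starts the processes.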

Note that 'cuda' without an index defaults to 'cuda:0' (the current device, unless you change it).
