Ollama, how can I use all of the GPUs I have?

Question

I'm running Ollama on a server with 4x A100 GPUs, but it looks like only one GPU is being used for the LLaMa3:7b model. How can I use all 4 GPUs at the same time? I'm not using docker, only ollama serve and ollama run.

Alternatively, is there a way to run 4 server processes at the same time (each on a different port) for large batch workloads?

Wed May 15 01:24:29 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03             Driver Version: 535.129.03   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A100 80GB PCIe          On  | 00000000:17:00.0 Off |                    0 |
| N/A   63C    P0             293W / 300W |  39269MiB / 81920MiB |     88%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA A100 80GB PCIe          On  | 00000000:65:00.0 Off |                    0 |
| N/A   28C    P0              51W / 300W |      7MiB / 81920MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   2  NVIDIA A100 80GB PCIe          On  | 00000000:CA:00.0 Off |                    0 |
| N/A   28C    P0              51W / 300W |      7MiB / 81920MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   3  NVIDIA A100 80GB PCIe          On  | 00000000:E3:00.0 Off |                    0 |
| N/A   29C    P0              52W / 300W |      7MiB / 81920MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A   3420401      C   ...unners/cuda_v11/ollama_llama_server    39256MiB |
+---------------------------------------------------------------------------------------+
parallel-processing gpu nvidia large-language-model ollama
1 Answer

Just found a way to run multiple instances on separate ports:

OLLAMA_HOST=135.197.255.43:11432 ./ollama serve
OLLAMA_HOST=135.197.255.43:11433 ./ollama serve
OLLAMA_HOST=135.197.255.43:11434 ./ollama serve
OLLAMA_HOST=135.197.255.43:11435 ./ollama serve
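
To make sure each instance actually lands on a different GPU (instead of all four defaulting to GPU 0), you can additionally pin each process with CUDA_VISIBLE_DEVICES. A minimal sketch, assuming the same server IP and ports as above and that the model has already been pulled; the client picks an instance by setting OLLAMA_HOST:

# Pin each ollama server instance to one GPU and give it its own port.
# CUDA_VISIBLE_DEVICES restricts which GPU that process can see;
# OLLAMA_HOST sets the address/port the server listens on.
CUDA_VISIBLE_DEVICES=0 OLLAMA_HOST=135.197.255.43:11432 ./ollama serve &
CUDA_VISIBLE_DEVICES=1 OLLAMA_HOST=135.197.255.43:11433 ./ollama serve &
CUDA_VISIBLE_DEVICES=2 OLLAMA_HOST=135.197.255.43:11434 ./ollama serve &
CUDA_VISIBLE_DEVICES=3 OLLAMA_HOST=135.197.255.43:11435 ./ollama serve &

# On the client side, point OLLAMA_HOST at whichever instance you want to use:
OLLAMA_HOST=135.197.255.43:11433 ./ollama run llama3

Note that a 7B model fits comfortably on a single 80 GB A100, so spreading one copy of it across all four GPUs generally isn't necessary; running one independent instance per GPU and distributing requests across the four ports is usually the simpler way to get batch throughput.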