Deeplearning4J CUDA memory issue

Question

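I am trying to train a MultiLayerNetwork model on a GPU server, and after calling model.fit() I run into a memory allocation problem several iterations in. I have read the memory management documentation, enabled workspaces in training mode, and reduced the batch size to 32, but the error still occurs.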

System details:

NVIDIA-SMI 410.72    Driver Version: 410.72    CUDA Version: 10.0

03:00.0 3D controller: NVIDIA Corporation GP100GL [Tesla P100 PCIe 16GB] (rev a1)
04:00.0 3D controller: NVIDIA Corporation GP100GL [Tesla P100 PCIe 16GB] (rev a1)
82:00.0 3D controller: NVIDIA Corporation GP100GL [Tesla P100 PCIe 16GB] (rev a1)
83:00.0 3D controller: NVIDIA Corporation GP100GL [Tesla P100 PCIe 16GB] (rev a1)

Java options: -Xms1G -Xmx2G -Dorg.bytedeco.javacpp.maxbytes=14G -Dorg.bytedeco.javacpp.maxphysicalbytes=14G

Maven CUDA dependencies:

<dependency>
    <groupId>org.deeplearning4j</groupId>
    <artifactId>deeplearning4j-cuda-10.0</artifactId>
    <version>1.0.0-beta4</version>
</dependency>
<dependency>
    <groupId>org.nd4j</groupId>
    <artifactId>nd4j-cuda-10.0-platform</artifactId>
    <version>1.0.0-beta4</version>
</dependency>

Error:

java.lang.OutOfMemoryError: Cannot allocate new PointerPointer(4): totalBytes = 227M, physicalBytes = 14558M
    at org.bytedeco.javacpp.PointerPointer.<init> (PointerPointer.java:126)
    at org.nd4j.linalg.jcublas.ops.executioner.CudaExecutioner.calculateOutputShape 
    ...
Caused by: java.lang.OutOfMemoryError: Physical memory usage is too high: physicalBytes (14558M) > maxPhysicalBytes (14336M)
    at org.bytedeco.javacpp.Pointer.deallocator (Pointer.java:589)
    at org.bytedeco.javacpp.Pointer.init (Pointer.java:125)
    at org.bytedeco.javacpp.PointerPointer.allocateArray (Native Method)
    at org.bytedeco.javacpp.PointerPointer.<init> (PointerPointer.java:118)
    at org.nd4j.linalg.jcublas.ops.executioner.CudaExecutioner.calculateOutputShape 
    ...
java cudnn deeplearning4j nd4j
1 Answer

The first thing you should do is update to the current version, 1.0.0-beta6.
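
For the update, a minimal sketch of the dependency block from the question with only the version changed. It assumes you stay on the same CUDA 10.0 artifacts and that they are published for 1.0.0-beta6; check Maven Central before relying on it.

<dependency>
    <groupId>org.deeplearning4j</groupId>
    <artifactId>deeplearning4j-cuda-10.0</artifactId>
    <!-- assumption: artifact name unchanged, only the version bumped -->
    <version>1.0.0-beta6</version>
</dependency>
<dependency>
    <groupId>org.nd4j</groupId>
    <artifactId>nd4j-cuda-10.0-platform</artifactId>
    <version>1.0.0-beta6</version>
</dependency>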

Then you should look at how much memory your model actually needs to process a larger batch.
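
As a starting point for that check, the numbers in the error message can be worked through by hand. The sketch below uses only values copied from the question (the 14G limit, physicalBytes = 14558M, totalBytes = 227M). The class and method names are hypothetical, and the reading of the two counters is my understanding of JavaCPP rather than something the stack trace confirms: totalBytes is memory tracked by JavaCPP deallocators, while physicalBytes is the resident memory of the whole process.

```java
public class JavaCppLimitCheck {
    /** Converts a size in gibibytes to mebibytes (1G = 1024M). */
    static long gibToMib(long gib) {
        return gib * 1024L;
    }

    /** Returns how far the observed usage exceeds the limit, in M (negative if under). */
    static long overshootMib(long observedMib, long limitMib) {
        return observedMib - limitMib;
    }

    public static void main(String[] args) {
        long limitMib = gibToMib(14); // from -Dorg.bytedeco.javacpp.maxphysicalbytes=14G
        long physicalMib = 14558;     // physicalBytes reported in the error
        long totalMib = 227;          // totalBytes reported in the error

        System.out.println("maxPhysicalBytes = " + limitMib + "M");
        System.out.println("overshoot        = " + overshootMib(physicalMib, limitMib) + "M");
        // Assumption: this remainder is process memory that JavaCPP's
        // deallocators do not account for.
        System.out.println("untracked        = " + (physicalMib - totalMib) + "M");
    }
}
```

The limit works out to 14336M, matching the maxPhysicalBytes in the error, so the process was only about 222M over it while JavaCPP itself was tracking just 227M. If that reading is right, most of the physical memory is being used outside JavaCPP's tracked allocations, which is worth keeping in mind when deciding how to set maxphysicalbytes relative to maxbytes and the JVM heap.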
