I have been struggling to get CUDA working on a new computer running Windows 11. After installing the driver, TensorFlow, CUDA, and cuDNN, I get the error below when I try to train a model. My versions are: CUDA 11.2, cuDNN 8.9, TensorFlow 2.10, Python 3.10.
At first I tried installing everything through Miniconda, which I had not used before, but that did not work, so I uninstalled it, downloaded Python 3.10, and tried again, reinstalling all the packages along with CUDA and cuDNN. I also added CUDA to the PATH:
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA 11.2 and C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA 11.2\libnvvp
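For reference, a quick sanity check along these lines (a minimal sketch; the build-info keys can vary between TensorFlow builds) shows whether TensorFlow sees the GPU at all and which CUDA/cuDNN versions the binary was compiled against:

import tensorflow as tf

print(tf.__version__)
print(tf.config.list_physical_devices('GPU'))  # should list the RTX 4070 Ti
build = tf.sysconfig.get_build_info()
print(build.get('cuda_version'), build.get('cudnn_version'))  # versions TF was built against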
Here is the full log:
2023-09-13 23:15:01.935221: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-09-13 23:15:02.279467: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 9392 MB memory: -> device: 0, name: NVIDIA GeForce RTX 4070 Ti, pci bus id: 0000:01:00.0, compute capability: 8.9
Model: "sequential"
_________________________________________________________________
Layer (type)                                Output Shape     Param #
=================================================================
lstm (LSTM)                                 (24, 64)         20736
batch_normalization (BatchNormalization)    (24, 64)         256
dropout (Dropout)                           (24, 64)         0
dense (Dense)                               (24, 8)          520
batch_normalization_1 (BatchNormalization)  (24, 8)          32
dropout_1 (Dropout)                         (24, 8)          0
dense_1 (Dense)                             (24, 1)          9
=================================================================
Total params: 21,553
Trainable params: 21,409
Non-trainable params: 144
_________________________________________________________________
2023-09-13 23:15:03.550772: W tensorflow/core/framework/cpu_allocator_impl.cc:82] Allocation of 4608000000 exceeds 10% of free system memory.
Epoch 1/10
2023-09-13 23:15:06.027973: I tensorflow/stream_executor/cuda/cuda_dnn.cc:384] Loaded cuDNN version 8905
Could not load symbol cublasGetSmCountTarget from cublas64_11.dll. Error code 127
2023-09-13 23:15:06.342783: I tensorflow/stream_executor/cuda/cuda_blas.cc:1614] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.
2023-09-13 23:15:06.385428: I tensorflow/compiler/xla/service/service.cc:173] XLA service 0x22167e1f550 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2023-09-13 23:15:06.385546: I tensorflow/compiler/xla/service/service.cc:181] StreamExecutor device (0): NVIDIA GeForce RTX 4070 Ti, Compute Capability 8.9
2023-09-13 23:15:06.389863: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:268] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
2023-09-13 23:15:06.470150: F tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:453] ptxas returned an error during compilation of ptx to sass: 'INTERNAL: ptxas exited with non-zero error code -1, output: ' If the error message indicates that a file could not be written, please verify that sufficient filesystem space is provided.
Process finished with exit code -1073740791 (0xC0000409)
Here is my code:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import *
from tensorflow.keras.callbacks import ModelCheckpoint
from tensorflow.keras.losses import BinaryCrossentropy
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.optimizers import AdamW
import tensorflow as tf
import data_formatting
import pandas as pd
from sklearn.preprocessing import Normalizer
data = 'data/'
X_train, y_train, X_cv, y_cv, X_test, y_test, init_bias, class_weight = data_formatting.transform(data=data)
output_bias = tf.keras.initializers.Constant(init_bias)
model1 = Sequential([
    InputLayer(batch_input_shape=(24, 300, 16)),
    LSTM(units=64, stateful=True),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dropout(0.3),
    Dense(units=8, activation='tanh', kernel_regularizer=tf.keras.regularizers.L2(0.16)),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dropout(0.3),
    Dense(units=1, activation='sigmoid', bias_initializer=output_bias)
])
model1.summary()
cp = ModelCheckpoint('ModelTest/', save_best_only=True)
model1.compile(loss=BinaryCrossentropy(), optimizer=AdamW(learning_rate=0.0001), metrics=['accuracy', tf.keras.metrics.Precision(), tf.keras.metrics.Recall()])
model1.fit(X_train, y_train, validation_data=(X_cv, y_cv), epochs=10, batch_size=24, callbacks=[cp], class_weight=class_weight)
When I search for the final ptxas error message, I have clicked through almost every result, but I cannot seem to find a solution.
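Since the fatal error comes from ptxas itself, a minimal check like the following (a sketch using only the Python standard library) would show whether a ptxas.exe is even reachable from the Python process and which CUDA toolkit version it reports:

import shutil
import subprocess

ptxas_path = shutil.which('ptxas')  # looks up ptxas.exe on the PATH seen by Python
print('ptxas found at:', ptxas_path)
if ptxas_path:
    # prints the CUDA toolkit version of the ptxas that was actually found
    print(subprocess.run([ptxas_path, '--version'], capture_output=True, text=True).stdout)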
I had this exact problem, though with a slightly different error message. However, I only got that message because I was using TensorFlow 2.15 with cuDNN 8.9.4.
According to the Windows-native tab of the step-by-step pip-based installation instructions on the TensorFlow website, up to TensorFlow 2.11 only cudatoolkit=11.2 and cudnn=8.1.0 are required. I have been running TensorFlow 2.10 on Windows and 2.11 on Linux with those CUDA and cuDNN versions for a while now without any problems. I only ran into these PTX-related issues once the newest TensorFlow releases started requiring the newest CUDA and cuDNN. Maybe you could give that a try; the relevant commands from the guide are sketched below.
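From memory, the conda-based steps in that guide look roughly like this (double-check the exact versions against the current page before running them):

conda install -c conda-forge cudatoolkit=11.2 cudnn=8.1.0
python -m pip install "tensorflow<2.11"
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"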