I have been struggling to get CUDA working on a new computer running Windows 11. After installing the driver, TensorFlow, CUDA, and cuDNN, I get the error below when I try to train a model. My versions are: CUDA 11.2, cuDNN 8.9, TensorFlow 2.10, Python 3.10.
At first I tried installing everything through Miniconda, which I had not used before, but that did not work, so I uninstalled it, downloaded Python 3.10, and tried again, reinstalling all the packages along with CUDA and cuDNN. I also added CUDA to the PATH:
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA 11.2 and C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA 11.2\libnvvp
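For reference, a quick sanity check along these lines (a minimal sketch; the build-info keys can vary between TensorFlow builds) shows whether TensorFlow sees the GPU at all and which CUDA/cuDNN versions the binary was compiled against:

import tensorflow as tf

print(tf.__version__)
print(tf.config.list_physical_devices('GPU'))  # should list the RTX 4070 Ti
build = tf.sysconfig.get_build_info()
print(build.get('cuda_version'), build.get('cudnn_version'))  # versions TF was built against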
Here is the full log:
2023-09-13 23:15:01.935221: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-09-13 23:15:02.279467: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 9392 MB memory: -> device: 0, name: NVIDIA GeForce RTX 4070 Ti, pci bus id: 0000:01:00.0, compute capability: 8.9
Model: "sequential"
_________________________________________________________________
Layer (type)                                Output Shape     Param #
=================================================================
lstm (LSTM)                                 (24, 64)         20736
batch_normalization (BatchNormalization)    (24, 64)         256
dropout (Dropout)                           (24, 64)         0
dense (Dense)                               (24, 8)          520
batch_normalization_1 (BatchNormalization)  (24, 8)          32
dropout_1 (Dropout)                         (24, 8)          0
dense_1 (Dense)                             (24, 1)          9
=================================================================
Total params: 21,553
Trainable params: 21,409
Non-trainable params: 144
_________________________________________________________________
2023-09-13 23:15:03.550772: W tensorflow/core/framework/cpu_allocator_impl.cc:82] Allocation of 4608000000 exceeds 10% of free system memory.
Epoch 1/10
2023-09-13 23:15:06.027973: I tensorflow/stream_executor/cuda/cuda_dnn.cc:384] Loaded cuDNN version 8905
Could not load symbol cublasGetSmCountTarget from cublas64_11.dll. Error code 127
2023-09-13 23:15:06.342783: I tensorflow/stream_executor/cuda/cuda_blas.cc:1614] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.
2023-09-13 23:15:06.385428: I tensorflow/compiler/xla/service/service.cc:173] XLA service 0x22167e1f550 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2023-09-13 23:15:06.385546: I tensorflow/compiler/xla/service/service.cc:181] StreamExecutor device (0): NVIDIA GeForce RTX 4070 Ti, Compute Capability 8.9
2023-09-13 23:15:06.389863: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:268] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
2023-09-13 23:15:06.470150: F tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:453] ptxas returned an error during compilation of ptx to sass: 'INTERNAL: ptxas exited with non-zero error code -1, output: ' If the error message indicates that a file could not be written, please verify that sufficient filesystem space is provided.
Process finished with exit code -1073740791 (0xC0000409)
Here is my code:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import *
from tensorflow.keras.callbacks import ModelCheckpoint
from tensorflow.keras.losses import BinaryCrossentropy
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.optimizers import AdamW
import tensorflow as tf
import data_formatting
import pandas as pd
from sklearn.preprocessing import Normalizer
data = 'data/'
X_train, y_train, X_cv, y_cv, X_test, y_test, init_bias, class_weight = data_formatting.transform(data=data)
output_bias = tf.keras.initializers.Constant(init_bias)
model1 = Sequential([
    InputLayer(batch_input_shape=(24, 300, 16)),
    LSTM(units=64, stateful=True),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dropout(0.3),
    Dense(units=8, activation='tanh', kernel_regularizer=tf.keras.regularizers.L2(0.16)),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dropout(0.3),
    Dense(units=1, activation='sigmoid', bias_initializer=output_bias)
])
model1.summary()
cp = ModelCheckpoint('ModelTest/', save_best_only=True)
model1.compile(loss=BinaryCrossentropy(), optimizer=AdamW(learning_rate=0.0001), metrics=['accuracy', tf.keras.metrics.Precision(), tf.keras.metrics.Recall()])
model1.fit(X_train, y_train, validation_data=(X_cv, y_cv), epochs=10, batch_size=24, callbacks=[cp], class_weight=class_weight)
When I search for the final ptxas error message, I have clicked through almost every result, but I cannot seem to find a solution.
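Since the fatal error comes from ptxas itself, a minimal check like the following (a sketch using only the Python standard library) would show whether a ptxas.exe is even reachable from the Python process and which CUDA toolkit version it reports:

import shutil
import subprocess

ptxas_path = shutil.which('ptxas')  # looks up ptxas.exe on the PATH seen by Python
print('ptxas found at:', ptxas_path)
if ptxas_path:
    # prints the CUDA toolkit version of the ptxas that was actually found
    print(subprocess.run([ptxas_path, '--version'], capture_output=True, text=True).stdout)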
I had this exact problem, though with a slightly different error message. However, I only got that message because I was using TensorFlow 2.15 with cuDNN 8.9.4.
According to the Windows-native tab of the step-by-step pip-based installation instructions on the TensorFlow website, up to TensorFlow 2.11 only cudatoolkit=11.2 and cudnn=8.1.0 are required. I have been running TensorFlow 2.10 on Windows and 2.11 on Linux with those CUDA and cuDNN versions for a while now without any problems. I only ran into these PTX-related issues once the newest TensorFlow releases started requiring the newest CUDA and cuDNN. Maybe you could give that a try; the relevant commands from the guide are sketched below.
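From memory, the conda-based steps in that guide look roughly like this (double-check the exact versions against the current page before running them):

conda install -c conda-forge cudatoolkit=11.2 cudnn=8.1.0
python -m pip install "tensorflow<2.11"
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"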