Failed to synchronize the stop event: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure


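I am trying to train a Keras model on a set of video frames to detect anomalies. Training works with batch_size = 1, but when I changed it to 30 I got the error below. After setting batch_size back to 1, it no longer works either.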

  • OS: Windows 10
  • Keras: 2.2.4
  • tensorflow-gpu: 1.12.0
  • CUDA: 9.0
  • cuDNN: 7.1
  • Anaconda environment: Python 3.6.8
  • GPU model and memory: GTX 1050 (the log below reports a GeForce GTX 1050 Ti with 4.00 GiB)
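To put the two batch sizes from the question in perspective, here is a rough sketch of how much memory the input clips alone occupy per batch. It assumes float32 inputs (Keras' default floatx) and the clip shape (227, 227, 10, 1) that the script's reshape and expand_dims steps produce; the helper name `batch_input_bytes` is made up for this illustration.

```python
# Assumptions: float32 inputs (Keras' default floatx) and the clip shape
# (227, 227, 10, 1) produced by the script's reshape/expand_dims steps.
BYTES_PER_FLOAT32 = 4
CLIP_SHAPE = (227, 227, 10, 1)


def batch_input_bytes(batch_size, clip_shape=CLIP_SHAPE, itemsize=BYTES_PER_FLOAT32):
    """Bytes needed to hold one batch of input clips (inputs only)."""
    n = batch_size
    for dim in clip_shape:
        n *= dim
    return n * itemsize


# Compare the two batch sizes mentioned in the question.
for bs in (1, 30):
    mib = batch_input_bytes(bs) / 2**20
    print(f"batch_size={bs}: {mib:.2f} MiB of input data")
```

This only counts the input tensors. The weights and intermediate activations depend on model.py, which is not shown, so the estimate by itself does not settle whether GPU memory is involved in the failure.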

My code:

from keras.callbacks import ModelCheckpoint, EarlyStopping
from model import load_model
import numpy as np 
import argparse

parser=argparse.ArgumentParser()
parser.add_argument('n_epochs',type=int)

args=parser.parse_args()

X_train=np.load('training.npy')
frames=X_train.shape[2]
#Need to make number of frames divisible by 10

frames=frames-frames%10

X_train=X_train[:,:,:frames]
X_train=X_train.reshape(-1,227,227,10)
X_train=np.expand_dims(X_train,axis=4)
Y_train=X_train.copy()

epochs=args.n_epochs
batch_size=1

if __name__=="__main__":

    model=load_model()

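    # Save the best model seen so far. Note: "mean_squared_error" is only
    # available if the model was compiled with that metric.
    callback_save = ModelCheckpoint("model.h5",
                                    monitor="mean_squared_error", save_best_only=True)

    # fit() below receives no validation data, so 'val_loss' is never
    # computed; monitor the training loss instead.
    callback_early_stopping = EarlyStopping(monitor='loss', patience=3)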

    print('Model has been loaded')

    model.fit(X_train,Y_train,
              batch_size=batch_size,
              epochs=epochs,
              callbacks = [callback_save,callback_early_stopping]
              )
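The preprocessing steps in the script can be traced on a small synthetic array to see which shape reaches model.fit. This is only a sketch: it assumes training.npy is laid out as (height, width, n_frames), which the use of shape[2] as the frame count suggests, and it uses 4x4 frames instead of 227x227 to stay light. The names `H`, `W`, `CLIP`, and `clips` are illustrative.

```python
import numpy as np

# Hypothetical small stand-in for training.npy, assumed to be laid out as
# (height, width, n_frames); 4x4 pixels replace the 227x227 frames.
H, W, CLIP = 4, 4, 10
n_frames = 25
data = np.arange(H * W * n_frames, dtype=np.float32).reshape(H, W, n_frames)

# Same steps as the script: trim to a multiple of CLIP, reshape, add a channel axis.
frames = data.shape[2] - data.shape[2] % CLIP  # 25 -> 20
trimmed = data[:, :, :frames]
X = np.expand_dims(trimmed.reshape(-1, H, W, CLIP), axis=4)
print(X.shape)  # (2, 4, 4, 10, 1)

# Alternative grouping that keeps each clip as CLIP consecutive frames.
clips = trimmed.reshape(H, W, -1, CLIP).transpose(2, 0, 1, 3)
same = np.array_equal(X[..., 0], clips)
print(same)  # False
```

Under that layout assumption, the plain reshape(-1, H, W, CLIP) yields the expected shape but does not put ten consecutive frames of the same pixels into each clip, whereas the reshape-plus-transpose variant does. This is separate from the CUDA error, but it may be worth checking against the actual layout of training.npy.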

I get this error:

Using TensorFlow backend.

2019-02-04 08:37:27.473383: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2

2019-02-04 08:37:28.532133: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: GeForce GTX 1050 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.62
pciBusID: 0000:01:00.0
totalMemory: 4.00GiB freeMemory: 3.30GiB

2019-02-04 08:37:28.546223: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0

2019-02-04 08:37:29.561369: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:

2019-02-04 08:37:29.566360: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0

2019-02-04 08:37:29.570356: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N

2019-02-04 08:37:29.581381: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3013 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1050 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)

WARNING:tensorflow:From C:\Users\alaaa\Anaconda31\envs\python-3.6\lib\site-packages\keras\backend\tensorflow_backend.py:1188: calling reduce_sum (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.  
Instructions for updating: keep_dims is deprecated, use keepdims instead  

WARNING:tensorflow:From C:\Users\alaaa\Anaconda31\envs\python-3.6\lib\site-packages\keras\backend\tensorflow_backend.py:1290: calling reduce_mean (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.

Model has been loaded
Epoch 1/1

2019-02-04 08:37:40.836650: E tensorflow/stream_executor/cuda/cuda_driver.cc:981] failed to synchronize the stop event: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure

2019-02-04 08:37:40.846484: E tensorflow/stream_executor/cuda/cuda_timer.cc:55] Internal: error destroying CUDA event in context 0000020EB9E24AA0: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure

2019-02-04 08:37:40.861133: E tensorflow/stream_executor/cuda/cuda_timer.cc:60] Internal: error destroying CUDA event in context 0000020EB9E24AA0: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure

2019-02-04 08:37:40.893425: F tensorflow/stream_executor/cuda/cuda_dnn.cc:231] Check failed: status == CUDNN_STATUS_SUCCESS (7 vs. 0)Failed to set cuDNN stream.
python-3.x tensorflow keras
1 Answer

Has this problem been solved? I would also like to find a solution to it. Thanks a lot!
