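I am running what I believe is a fairly small CNN on an NVIDIA Jetson Nano with JetPack 4.4. NVIDIA claims the Nano can run ResNet-50 at 36 fps, so I expected my much smaller network to run comfortably at 30+ fps.
In practice, each forward pass takes 160-180 ms, so I get 5-6 fps at best. My CNN: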
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
lambda (Lambda)              (None, 210, 848, 3)       0
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 210, 282, 3)       0
_________________________________________________________________
conv2d (Conv2D)              (None, 102, 138, 16)      2368
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 51, 69, 16)        0
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 24, 33, 32)        12832
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 12, 16, 32)        0
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 4, 6, 64)          51264
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 2, 3, 64)          0
_________________________________________________________________
flatten (Flatten)            (None, 384)               0
_________________________________________________________________
dropout (Dropout)            (None, 384)               0
_________________________________________________________________
dense (Dense)                (None, 64)                24640
_________________________________________________________________
dropout_1 (Dropout)          (None, 64)                0
_________________________________________________________________
elu (ELU)                    (None, 64)                0
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 65
=================================================================
Total params: 91,169
Trainable params: 91,169
Non-trainable params: 0
_________________________________________________________________
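For scale, the multiply-accumulate (MAC) count per frame can be estimated from the summary above. This is only a rough sketch: the kernel sizes are not stated in the post, so they are inferred from the parameter counts under the assumption that every filter has a bias term, and the helper name `kernel_area` is made up for illustration. Pooling, ELU, and the Lambda layer are ignored.

```python
# Conv layers from the summary:
# (name, out_height, out_width, in_channels, out_channels, params)
conv_layers = [
    ("conv2d",   102, 138,  3, 16,  2368),
    ("conv2d_1",  24,  33, 16, 32, 12832),
    ("conv2d_2",   4,   6, 32, 64, 51264),
]
# Dense layers from the summary: (name, inputs, outputs)
dense_layers = [("dense", 384, 64), ("dense_1", 64, 1)]


def kernel_area(params, c_in, c_out):
    """Infer k_h * k_w from params = (k_h * k_w * c_in + 1) * c_out.

    Assumption: each filter has one bias term (the Keras default).
    """
    per_filter, rem = divmod(params, c_out)
    area, rem2 = divmod(per_filter - 1, c_in)
    if rem or rem2:
        raise ValueError("parameter count does not fit the assumed layout")
    return area


total_macs = 0
for name, h, w, c_in, c_out, params in conv_layers:
    area = kernel_area(params, c_in, c_out)
    # One MAC per weight per output position
    macs = h * w * c_out * area * c_in
    total_macs += macs
    print(f"{name}: kernel area {area}, {macs:,} MACs")

for name, n_in, n_out in dense_layers:
    macs = n_in * n_out
    total_macs += macs
    print(f"{name}: {macs:,} MACs")

print(f"total: {total_macs:,} MACs per frame")
```

Under these assumptions the total comes to about 44.5 million MACs per frame, with the first convolution accounting for roughly three quarters of it. ResNet-50 is commonly quoted at around 4 billion MACs per 224x224 image, so the expectation that this network is far lighter looks reasonable on paper.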
Code:
import numpy as np
import cv2
import time
import tensorflow as tf
from tensorflow import keras

# Load the trained Keras model from disk
model_name = 'v9_small_FC_epoch_3'
loaded_model = keras.models.load_model('/home/jetson/notebooks/trained_models/' + model_name + '.h5')
loaded_model.summary()

# Read one test frame and add a batch dimension: (H, W, 3) -> (1, H, W, 3)
frame = cv2.imread('/home/jetson/notebooks/frame1.jpg')
test_data = np.expand_dims(frame, axis=0)

# Time 10 consecutive forward passes on the same frame
for i in range(10):
    start = time.time()
    predictions = loaded_model.predict(test_data)
    print(predictions[0][0])
    end = time.time()
    print("Inference took {}s".format(end - start))
Results:
4.7763316333293915
Inference took 10.111131191253662s
4.7763316333293915
Inference took 0.1822071075439453s
4.7763316333293915
Inference took 0.17330455780029297s
4.7763316333293915
Inference took 0.18085694313049316s
4.7763316333293915
Inference took 0.16646790504455566s
4.7763316333293915
Inference took 0.1703803539276123s
4.7763316333293915
Inference took 0.1788337230682373s
4.7763316333293915
Inference took 0.17131853103637695s
4.7763316333293915
Inference took 0.1660606861114502s
4.7763316333293915
Inference took 0.18377089500427246s
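The first call in these results takes about 10 s while the other nine take about 0.17 s each, so it is clearer to treat the first call as warm-up and summarize only the rest. A minimal sketch of that, where `summarize` and `benchmark` are hypothetical helper names rather than anything provided by Keras, and the `reported` list simply copies the nine steady-state timings printed above:

```python
import statistics
import time


def summarize(latencies):
    """Return (median latency in seconds, frames per second)."""
    median = statistics.median(latencies)
    fps = 1.0 / median if median > 0 else float("inf")
    return median, fps


def benchmark(fn, warmup=1, runs=9):
    """Call fn() `warmup` times untimed, then time `runs` further calls."""
    for _ in range(warmup):
        fn()
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        latencies.append(time.perf_counter() - start)
    return summarize(latencies)


# Steady-state timings copied from the results above (first call excluded)
reported = [
    0.1822071075439453, 0.17330455780029297, 0.18085694313049316,
    0.16646790504455566, 0.1703803539276123, 0.1788337230682373,
    0.17131853103637695, 0.1660606861114502, 0.18377089500427246,
]
median_s, fps = summarize(reported)
print("median {:.1f} ms, {:.2f} fps".format(median_s * 1000, fps))
```

With the model from the question, the same helper could be called as `benchmark(lambda: loaded_model.predict(test_data))`. The median is used instead of the mean so that a single slow call does not skew the figure.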
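Reduce the number of layers in the CNN. Another option may be to disable the Jetson's GUI to free up RAM for the CNN.
As it stands, you cannot use this model in real time because of its inference latency.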