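I'm using a PC with an RTX 3060 and a Python 3.10 environment. Is there a way to verify that frames I process with MediaPipe are actually processed on the GPU? And if they are not currently running on the GPU, what should I do?
I have installed the CUDA libraries in Python.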
Since you didn't specify your MediaPipe version, I'll present results for MediaPipe 0.10.18 with Python 3.10.
The CPU is an Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz, and the GPU is a GeForce GTX 1080 Ti.
The CUDA toolkit version (output of nvcc --version):
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Feb_14_21:12:58_PST_2021
Cuda compilation tools, release 11.2, V11.2.152
Build cuda_11.2.r11.2/compiler.29618528_0
I tested two configurations, CPU and GPU. The timing results are given below.
For GPU:
import cv2
import numpy as np
import mediapipe as mp
from mediapipe import solutions
from mediapipe.framework.formats import landmark_pb2
from mediapipe.tasks import python
from mediapipe.tasks.python import vision


def draw_landmarks_on_image(rgb_image, detection_result):
    pose_landmarks_list = detection_result.pose_landmarks
    annotated_image = np.copy(rgb_image)

    # Loop through the detected poses to visualize.
    for idx in range(len(pose_landmarks_list)):
        pose_landmarks = pose_landmarks_list[idx]

        # Draw the pose landmarks.
        pose_landmarks_proto = landmark_pb2.NormalizedLandmarkList()
        pose_landmarks_proto.landmark.extend(
            [
                landmark_pb2.NormalizedLandmark(
                    x=landmark.x, y=landmark.y, z=landmark.z
                )
                for landmark in pose_landmarks
            ]
        )
        solutions.drawing_utils.draw_landmarks(
            annotated_image,
            pose_landmarks_proto,
            solutions.pose.POSE_CONNECTIONS,
            # solutions.drawing_styles.get_default_pose_landmarks_style(),
            solutions.drawing_utils.DrawingSpec(
                color=(255, 0, 255), thickness=5, circle_radius=10
            ),
            solutions.drawing_utils.DrawingSpec(
                color=(0, 255, 255), thickness=10, circle_radius=10
            ),
        )
    return annotated_image

image_path = "DSC_6816.JPG"
img = cv2.imread(image_path)

# STEP 1: Import the necessary modules.
# STEP 2: Create a PoseLandmarker object.
base_options = python.BaseOptions(
    model_asset_path="pose_landmarker.task",
    delegate=python.BaseOptions.Delegate.GPU,
)
options = vision.PoseLandmarkerOptions(
    base_options=base_options, output_segmentation_masks=True
)
detector = vision.PoseLandmarker.create_from_options(options)

# STEP 3: Load the input image.
image = mp.Image.create_from_file(image_path)

# STEP 4: Detect pose landmarks from the input image.
import time

n_iters = 100
start_time = time.time()
for i in range(n_iters):
    detection_result = detector.detect(image)
end_time = time.time()
# Average over the number of iterations actually run.
print(f"average_time = {(end_time - start_time) / n_iters}")

for each_person_pose in detection_result.pose_landmarks:
    for each_key_point in each_person_pose:
        print(
            each_key_point.x,
            each_key_point.y,
            each_key_point.z,
            each_key_point.presence,
            each_key_point.visibility,
        )

# STEP 5: Process the detection result. In this case, visualize it.
annotated_image = draw_landmarks_on_image(image.numpy_view(), detection_result)
cv2.imwrite("landmarks.jpg", cv2.cvtColor(annotated_image, cv2.COLOR_RGB2BGR))

segmentation_mask = detection_result.segmentation_masks[0].numpy_view()
# The mask is float in [0, 1]; scale it to 8-bit before writing a JPEG.
visualized_mask = (
    np.repeat(segmentation_mask[:, :, np.newaxis], 3, axis=2) * 255
).astype(np.uint8)
cv2.imwrite("segmentation.jpg", visualized_mask)
The output shows that it runs on the GPU (note the line INFO: Created TensorFlow Lite delegate for GPU.):
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1732002754.289760 666 gl_context_egl.cc:85] Successfully initialized EGL. Major : 1 Minor: 5
I0000 00:00:1732002754.323350 724 gl_context.cc:357] GL version: 3.2 (OpenGL ES 3.2 NVIDIA 470.182.03), renderer: NVIDIA GeForce GTX 1080 Ti/PCIe/SSE2
INFO: Created TensorFlow Lite delegate for GPU.
E0000 00:00:1732002755.381976 724 tensor.cc:410] Tensors are designed for single writes. Multiple writes to a Tensor instance are not supported and may lead to undefined behavior due to lack of synchronization.
W0000 00:00:1732002755.430336 727 landmark_projection_calculator.cc:186] Using NORM_RECT without IMAGE_DIMENSIONS is only supported for the square ROI. Provide IMAGE_DIMENSIONS or use PROJECTION_MATRIX.
E0000 00:00:1732002755.554431 724 tensor.cc:410] Tensors are designed for single writes. Multiple writes to a Tensor instance are not supported and may lead to undefined behavior due to lack of synchronization.
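Rather than eyeballing the logs, this check can be scripted. A minimal sketch, assuming the two delegate messages shown in this answer are the markers to look for and that the GPU script is saved under the hypothetical name pose_gpu.py. The script runs in a child process because the messages are written by native code rather than Python's logging module, and both captured streams are searched for each marker.

```python
import subprocess
import sys

# Marker lines printed when TensorFlow Lite creates a delegate
# (copied from the log output in this answer).
GPU_MARKER = "Created TensorFlow Lite delegate for GPU"
CPU_MARKER = "Created TensorFlow Lite XNNPACK delegate for CPU"


def detect_delegate(log_text: str) -> str:
    """Return "GPU", "CPU", "both" or "unknown" depending on which markers appear."""
    has_gpu = GPU_MARKER in log_text
    has_cpu = CPU_MARKER in log_text
    if has_gpu and has_cpu:
        return "both"
    if has_gpu:
        return "GPU"
    if has_cpu:
        return "CPU"
    return "unknown"


def run_and_detect(script_path: str) -> str:
    # Run the script in a child process and search everything it printed,
    # since native log lines bypass Python's logging module.
    proc = subprocess.run(
        [sys.executable, script_path], capture_output=True, text=True
    )
    return detect_delegate(proc.stdout + proc.stderr)


if __name__ == "__main__":
    # "pose_gpu.py" is a hypothetical file name for the GPU script above.
    print(run_and_detect("pose_gpu.py"))
```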
I also took a screenshot of the GPU usage.
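Instead of a screenshot, GPU load can also be sampled from a second terminal while the detection loop runs. The sketch below uses nvidia-smi's query interface (utilization.gpu and memory.used with the csv,noheader,nounits format). It assumes nvidia-smi is on PATH, and the helper names are hypothetical. A rise in utilization and memory while detect() runs is consistent with GPU execution, although these numbers alone do not attribute the load to a specific process.

```python
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_nvidia_smi(output: str) -> list[tuple[int, int]]:
    """Parse lines like '37, 1024' into (utilization %, memory MiB), one tuple per GPU."""
    samples = []
    for line in output.strip().splitlines():
        util, mem = (field.strip() for field in line.split(","))
        samples.append((int(util), int(mem)))
    return samples


def sample_gpu() -> list[tuple[int, int]]:
    # One snapshot of every GPU visible to nvidia-smi.
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_nvidia_smi(out)


if __name__ == "__main__":
    import time

    # Start this while the detection loop is running in another terminal.
    for _ in range(10):
        print(sample_gpu())
        time.sleep(0.5)
```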
For CPU:
Only this part of the code needs to change:
base_options = python.BaseOptions(
    model_asset_path="pose_landmarker.task"
)
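To avoid editing the file when switching between the two runs, the delegate can be chosen on the command line. A minimal sketch with hypothetical helper names (parse_args, make_base_options) and a hypothetical --delegate flag. It assumes that passing python.BaseOptions.Delegate.CPU explicitly behaves the same as the snippet above, which simply omits delegate.

```python
import argparse


def parse_args(argv=None):
    # Hypothetical CLI: --delegate picks CPU or GPU, --model points at the .task file.
    parser = argparse.ArgumentParser(description="Run PoseLandmarker on CPU or GPU")
    parser.add_argument("--delegate", choices=["cpu", "gpu"], default="gpu")
    parser.add_argument("--model", default="pose_landmarker.task")
    return parser.parse_args(argv)


def make_base_options(model_path: str, delegate: str):
    # Imported lazily so the argument parsing works without mediapipe installed.
    from mediapipe.tasks import python

    chosen = (
        python.BaseOptions.Delegate.GPU
        if delegate == "gpu"
        else python.BaseOptions.Delegate.CPU
    )
    return python.BaseOptions(model_asset_path=model_path, delegate=chosen)


if __name__ == "__main__":
    args = parse_args()
    base_options = make_base_options(args.model, args.delegate)
    print(base_options.delegate)
```

Usage: python pose.py --delegate cpu versus python pose.py --delegate gpu (pose.py being whatever the script is named), then compare the delegate line in the log.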
The output shows that it runs on the CPU (note the line INFO: Created TensorFlow Lite XNNPACK delegate for CPU.):
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1732002694.303737 30860 gl_context_egl.cc:85] Successfully initialized EGL. Major : 1 Minor: 5
I0000 00:00:1732002694.325575 30918 gl_context.cc:357] GL version: 3.2 (OpenGL ES 3.2 NVIDIA 470.182.03), renderer: NVIDIA GeForce GTX 1080 Ti/PCIe/SSE2
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
W0000 00:00:1732002694.430793 30921 inference_feedback_manager.cc:114] Feedback manager requires a model with a single signature inference. Disabling support for feedback tensors.
W0000 00:00:1732002694.557819 30919 inference_feedback_manager.cc:114] Feedback manager requires a model with a single signature inference. Disabling support for feedback tensors.
W0000 00:00:1732002695.018868 30925 landmark_projection_calculator.cc:186] Using NORM_RECT without IMAGE_DIMENSIONS is only supported for the square ROI. Provide IMAGE_DIMENSIONS or use PROJECTION_MATRIX.
Finally, the timings:
average time CPU per image: 0.019065617084503175
average time GPU per image: 0.023649553298950195
So in this setup the CPU is about 1.24 times faster than the GPU (0.02365 / 0.01907 ≈ 1.24).
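When comparing the two numbers, it may help to exclude a few warm-up calls, since the first inferences on either delegate can include one-off setup work, and to use time.perf_counter, which is intended for timing short intervals. A sketch under those assumptions; benchmark and speedup are hypothetical helper names, and the last line recomputes the ratio from the two per-image times above.

```python
import time
from typing import Callable


def benchmark(fn: Callable[[], object], n_iters: int = 100, n_warmup: int = 5) -> float:
    """Mean wall-clock seconds per call of fn, excluding the warm-up calls."""
    for _ in range(n_warmup):
        fn()  # early calls may include one-off setup (e.g. delegate initialization)
    start = time.perf_counter()
    for _ in range(n_iters):
        fn()
    return (time.perf_counter() - start) / n_iters


def speedup(slow_seconds: float, fast_seconds: float) -> float:
    """How many times faster the second measurement is than the first."""
    return slow_seconds / fast_seconds


if __name__ == "__main__":
    # Placeholder workload; with the script above, pass lambda: detector.detect(image).
    print(benchmark(lambda: sum(range(1000))))
    # Per-image times reported above: GPU first, CPU second.
    print(round(speedup(0.023649553298950195, 0.019065617084503175), 2))
```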