MediaPipe GPU usage

Question

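I am using a PC with an RTX 3060 and a Python 3.10 environment. Is there a way to verify that MediaPipe runs on the GPU when it processes frames? And if it is not currently running on the GPU, what should I do?

I have installed the CUDA libraries in my Python environment.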

Answer 1

Since you did not specify your MediaPipe version, I will show results for MediaPipe 0.10.18 with Python 3.10.

The CPU used is an Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz,

and the GPU is a GTX 1080 Ti.

Here are the CUDA details:

nvcc: NVIDIA (R) Cuda compiler driver  
Copyright (c) 2005-2021 NVIDIA Corporation  
Built on Sun_Feb_14_21:12:58_PST_2021  
Cuda compilation tools, release 11.2, V11.2.152  
Build cuda_11.2.r11.2/compiler.29618528_0

I tested in two configurations: CPU and GPU. The timing results for each are given below.

For GPU

import cv2
import numpy as np
import mediapipe as mp
from mediapipe import solutions
from mediapipe.framework.formats import landmark_pb2
from mediapipe.tasks import python
from mediapipe.tasks.python import vision


def draw_landmarks_on_image(rgb_image, detection_result):
    pose_landmarks_list = detection_result.pose_landmarks
    annotated_image = np.copy(rgb_image)

    # Loop through the detected poses to visualize.
    for idx in range(len(pose_landmarks_list)):
        pose_landmarks = pose_landmarks_list[idx]

        # Draw the pose landmarks.
        pose_landmarks_proto = landmark_pb2.NormalizedLandmarkList()
        pose_landmarks_proto.landmark.extend(
            [
                landmark_pb2.NormalizedLandmark(
                    x=landmark.x, y=landmark.y, z=landmark.z
                )
                for landmark in pose_landmarks
            ]
        )
        solutions.drawing_utils.draw_landmarks(
            annotated_image,
            pose_landmarks_proto,
            solutions.pose.POSE_CONNECTIONS,
            # solutions.drawing_styles.get_default_pose_landmarks_style(),
            solutions.drawing_utils.DrawingSpec(
                color=(255, 0, 255), thickness=5, circle_radius=10
            ),
            solutions.drawing_utils.DrawingSpec(
                color=(0, 255, 255), thickness=10, circle_radius=10
            ),
        )
    return annotated_image


image_path = "DSC_6816.JPG"


img = cv2.imread(image_path)


# STEP 1: Import the necessary modules (done at the top of the script).

# STEP 2: Create a PoseLandmarker object.
base_options = python.BaseOptions(
    model_asset_path="pose_landmarker.task",
    delegate=python.BaseOptions.Delegate.GPU
) 
options = vision.PoseLandmarkerOptions(
    base_options=base_options, output_segmentation_masks=True
)
detector = vision.PoseLandmarker.create_from_options(options)

# STEP 3: Load the input image.
image = mp.Image.create_from_file(image_path)

# STEP 4: Detect pose landmarks from the input image.
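import time

num_iterations = 100
start_time = time.time()
for i in range(num_iterations):
    detection_result = detector.detect(image)
end_time = time.time()
# Divide by the number of iterations to get the average time per image.
print(f"average_time = {(end_time - start_time) / num_iterations}")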

for each_person_pose in detection_result.pose_landmarks:
    for each_key_point in each_person_pose:
        print(
            each_key_point.x,
            each_key_point.y,
            each_key_point.z,
            each_key_point.presence,
            each_key_point.visibility,
        )


# STEP 5: Process the detection result. In this case, visualize it.
annotated_image = draw_landmarks_on_image(image.numpy_view(), detection_result)
cv2.imwrite("landmarks.jpg", cv2.cvtColor(annotated_image, cv2.COLOR_RGB2BGR))


segmentation_mask = detection_result.segmentation_masks[0].numpy_view()
visualized_mask = np.repeat(segmentation_mask[:, :, np.newaxis], 3, axis=2) * 255
cv2.imwrite("segmentation.jpg", visualized_mask)

The output contains a message indicating that it runs on the GPU: INFO: Created TensorFlow Lite delegate for GPU.

WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1732002754.289760     666 gl_context_egl.cc:85] Successfully initialized EGL. Major : 1 Minor: 5
I0000 00:00:1732002754.323350     724 gl_context.cc:357] GL version: 3.2 (OpenGL ES 3.2 NVIDIA 470.182.03), renderer: NVIDIA GeForce GTX 1080 Ti/PCIe/SSE2
INFO: Created TensorFlow Lite delegate for GPU.
E0000 00:00:1732002755.381976     724 tensor.cc:410] Tensors are designed for single writes. Multiple writes to a Tensor instance are not supported and may lead to undefined behavior due to lack of synchronization.
W0000 00:00:1732002755.430336     727 landmark_projection_calculator.cc:186] Using NORM_RECT without IMAGE_DIMENSIONS is only supported for the square ROI. Provide IMAGE_DIMENSIONS or use PROJECTION_MATRIX.
E0000 00:00:1732002755.554431     724 tensor.cc:410] Tensors are designed for single writes. Multiple writes to a Tensor instance are not supported and may lead to undefined behavior due to lack of synchronization.

In addition, I took a screenshot of the GPU usage.
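If you would rather read the GPU usage from a script than from a screenshot, here is a minimal sketch that polls nvidia-smi. It assumes the NVIDIA driver tools are installed and on the PATH; the function names parse_gpu_stats and read_gpu_stats are my own and not part of MediaPipe. Note that the log above shows MediaPipe initializing EGL / OpenGL ES, so the GPU delegate appears to go through OpenGL rather than CUDA here; nvidia-smi should still show utilization and memory rising while inference runs.

```python
import shutil
import subprocess
from typing import List, Tuple

# Ask nvidia-smi for GPU utilization (%) and used memory (MiB), one CSV line per GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_gpu_stats(csv_text: str) -> List[Tuple[int, int]]:
    """Parse 'utilization, memory' lines into (percent, MiB) pairs."""
    stats = []
    for line in csv_text.strip().splitlines():
        try:
            util, mem = (field.strip() for field in line.split(","))
            stats.append((int(util), int(mem)))
        except ValueError:
            # Skip lines that are not two integers (for example "[N/A]").
            continue
    return stats


def read_gpu_stats() -> List[Tuple[int, int]]:
    """Return current stats for each GPU, or an empty list if nvidia-smi is missing."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_stats(result.stdout)
```

Calling read_gpu_stats() once before the detection loop and again while it is running gives a rough before/after comparison.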

For CPU

Only this part of the code needs to change:

base_options = python.BaseOptions(
    model_asset_path="pose_landmarker.task"
)

The output contains a message indicating that it runs on the CPU: INFO: Created TensorFlow Lite XNNPACK delegate for CPU.


WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1732002694.303737   30860 gl_context_egl.cc:85] Successfully initialized EGL. Major : 1 Minor: 5
I0000 00:00:1732002694.325575   30918 gl_context.cc:357] GL version: 3.2 (OpenGL ES 3.2 NVIDIA 470.182.03), renderer: NVIDIA GeForce GTX 1080 Ti/PCIe/SSE2
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
W0000 00:00:1732002694.430793   30921 inference_feedback_manager.cc:114] Feedback manager requires a model with a single signature inference. Disabling support for feedback tensors.
W0000 00:00:1732002694.557819   30919 inference_feedback_manager.cc:114] Feedback manager requires a model with a single signature inference. Disabling support for feedback tensors.
W0000 00:00:1732002695.018868   30925 landmark_projection_calculator.cc:186] Using NORM_RECT without IMAGE_DIMENSIONS is only supported for the square ROI. Provide IMAGE_DIMENSIONS or use PROJECTION_MATRIX.
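The two delegate messages in the logs above can also be checked programmatically. This is a rough sketch that assumes you have saved the process output to a text file yourself (for example with a shell redirect such as 2> run.log, since the log states it is written to STDERR); detect_delegates is a hypothetical helper name, and the marker strings are copied from the logs in this answer, so they may differ in other MediaPipe versions.

```python
from typing import Set

# Substrings copied from the log output shown in this answer (MediaPipe 0.10.18).
GPU_MARKER = "Created TensorFlow Lite delegate for GPU"
CPU_MARKER = "Created TensorFlow Lite XNNPACK delegate for CPU"


def detect_delegates(log_text: str) -> Set[str]:
    """Return the set of delegates ("GPU", "CPU") announced in the log text."""
    found = set()
    for line in log_text.splitlines():
        if GPU_MARKER in line:
            found.add("GPU")
        if CPU_MARKER in line:
            found.add("CPU")
    return found
```

For example, detect_delegates(open("run.log").read()) returning a set that contains "GPU" suggests the GPU delegate was created; an empty set means neither message was found.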

Finally, the timings:

average time CPU per image: 0.019065617084503175
average time GPU per image: 0.023649553298950195

In this test the CPU is faster than the GPU by a factor of about 1.24 (0.02365 / 0.01907).
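To repeat the comparison on your own machine, a small timing helper may be useful. This is only a sketch: average_seconds and speed_ratio are names I made up, and the warm-up runs are my own addition (the first GPU calls often include one-time setup cost), not something the measurements above used.

```python
import time
from typing import Callable


def average_seconds(fn: Callable[[], object], iterations: int = 100, warmup: int = 5) -> float:
    """Call fn repeatedly and return the mean wall-clock time per call."""
    # Warm-up calls are excluded from the measurement.
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(iterations):
        fn()
    # Divide by the same count used in the loop.
    return (time.perf_counter() - start) / iterations


def speed_ratio(slower_seconds: float, faster_seconds: float) -> float:
    """Return how many times longer the slower configuration takes."""
    return slower_seconds / faster_seconds
```

With the detector from the script above, this would be called as average_seconds(lambda: detector.detect(image)), once for a detector created with the GPU delegate and once for the default CPU one.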
