Mediapipe gives different results for image file path input and numpy array input

Question

As you know, Mediapipe reports landmark positions relative to the aligned output image rather than the input image.

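Goal: I want to run landmark detection on multiple images. Below is the code that uses PoseLandmarkerOptions to detect the 33 body landmarks. Once the landmarks are found, I plan to classify the face angle as 0, 90, 180, or 270 degrees.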
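The question does not show how the angle classification is done, so the following is only a minimal sketch of one possible approach. It assumes the nose and both shoulders were detected (indices 0, 11, and 12 in the MediaPipe pose landmark list) and uses the direction from the shoulder midpoint to the nose. The function name `classify_rotation` and the convention that 90 means "head points to the right of the image" are assumptions made for illustration.

```python
import math


def classify_rotation(nose, left_shoulder, right_shoulder):
    """Snap the head direction to 0, 90, 180, or 270 degrees.

    Each argument is an (x, y) pair in image coordinates, where y grows
    downward. For non-square images, pixel coordinates are a safer input
    than normalized ones because they preserve the aspect ratio.
    """
    mid_x = (left_shoulder[0] + right_shoulder[0]) / 2
    mid_y = (left_shoulder[1] + right_shoulder[1]) / 2
    dx = nose[0] - mid_x
    dy = nose[1] - mid_y
    # Angle measured clockwise from "up" (negative y), in [0, 360).
    angle = math.degrees(math.atan2(dx, -dy)) % 360
    # Round to the nearest quarter turn; 4 quarter turns wrap back to 0.
    return (int(round(angle / 90.0)) % 4) * 90
```

With a detection result, the inputs could be taken as `(lm[0].x, lm[0].y)`, `(lm[11].x, lm[11].y)`, and `(lm[12].x, lm[12].y)` for `lm = detection_result.pose_landmarks[0]`.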

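Data: I have included sample images from the MARS dataset because, due to some issues, I cannot use the original images here. The originals have a higher resolution and larger size than the MARS images.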

All images are in a zip file:

Code: Here is the main code for detecting landmarks in the images.

import sys
import cv2
import numpy as np
import glob
import os
import base64
import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision
from typing import Dict

base_options = python.BaseOptions(
    model_asset_path="./models/pose_landmarker.task",
    delegate=python.BaseOptions.Delegate.GPU,
)
options = vision.PoseLandmarkerOptions(
    base_options=base_options,
    output_segmentation_masks=True,
    min_pose_detection_confidence=0.5,
    min_pose_presence_confidence=0.5,
    min_tracking_confidence=0.5,
)
detector = vision.PoseLandmarker.create_from_options(options)


def check_landmarks(detection_result, img, address):
    file_name = address.split("/")[-1]
    # numpy image shape is (height, width, channels)
    h, w, _ = img.shape
    for each_person_pose in detection_result.pose_landmarks:
        for each_key_point in each_person_pose:
            if each_key_point.presence > 0.5 and each_key_point.visibility > 0.5:
                # landmark coordinates are normalized to [0, 1]
                x_px = int(each_key_point.x * w)
                y_px = int(each_key_point.y * h)
                cv2.circle(img, (x_px, y_px), 3, (255, 0, 0), 2)
    cv2.imwrite("./landmarks/" + file_name, img)


def rectifier(detector, image, address):
    try:
        srgb_image = mp.Image.create_from_file(address)
        detection_result = detector.detect(srgb_image)
        check_landmarks(detection_result, srgb_image.numpy_view(), address)
    except Exception as e:
        print(f"error {e}")


def rectify_image(rectify_image_request):
    # decode the base64 payload back into an OpenCV image
    image = cv2.imdecode(
        np.frombuffer(base64.b64decode(rectify_image_request["image"]), np.byte),
        cv2.IMREAD_COLOR,
    )
    rectifier(detector, image, rectify_image_request["address"])


def read_image_for_rectify(address: str) -> Dict:
    face_object = dict()
    img = cv2.imread(address)
    _, buffer = cv2.imencode(".jpg", img)
    img = base64.b64encode(buffer).decode()
    face_object["image"] = img
    face_object["address"] = address
    return face_object


folder_path = "./png2jpg"
file_paths = glob.glob(os.path.join(folder_path, "*.jpg"), recursive=True)
for id_file, file in enumerate(file_paths):
    print(id_file, file)
    rectify_image(read_image_for_rectify(file))
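The pixel conversion in check_landmarks depends on the numpy shape order, which is (height, width, channels). As a small illustration, the conversion can be isolated in a helper like the one below. The name `landmark_to_pixel` and the clamping to the image bounds are assumptions added here, not part of the original code.

```python
def landmark_to_pixel(x_norm, y_norm, image_shape):
    """Convert normalized landmark coordinates to integer pixel coordinates.

    image_shape follows the numpy convention (height, width[, channels]).
    Results are clamped so that slightly out-of-frame landmarks still map
    to a valid pixel.
    """
    height, width = image_shape[:2]
    x_px = min(max(int(x_norm * width), 0), width - 1)
    y_px = min(max(int(y_norm * height), 0), height - 1)
    return x_px, y_px
```

For example, a landmark at (0.5, 0.25) in an image of shape (200, 100, 3) maps to the pixel (50, 50), because x is scaled by the width and y by the height.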

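Problem: Initially, I passed images to Mediapipe directly by file path, and the results were acceptable.

However, I now need to receive images as a dictionary in which the image is base64-encoded. I changed the data input accordingly, but in this scenario Mediapipe fails to detect landmarks in many of the images. To feed the image to Mediapipe as a numpy array, I changed this line from

srgb_image = mp.Image.create_from_file(address)

to

srgb_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=image)

Output in the second case:

How can I get consistent output in both scenarios?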

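Answer 1

Thanks for the hint. After adding this line to the read_image_for_rectify function

img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

mediapipe gives the same results as in the first case. OpenCV loads images in BGR channel order, while mp.ImageFormat.SRGB expects RGB.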

def read_image_for_rectify(address: str) -> Dict:
    face_object = dict()
    img = cv2.imread(address)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    _, buffer = cv2.imencode(".jpg", img)
    img = base64.b64encode(buffer).decode()
    face_object["image"] = img
    face_object["address"] = address
    return face_object
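As an alternative sketch, the conversion could be done on the receiving side, right after decoding, so that the JPEG in the request keeps its normal channel order. The helper names `decode_payload` and `request_to_mp_image` are hypothetical. The request layout (a dict with a base64 "image" field) is taken from the question, and the heavy dependencies are imported inside the function only so that the decoding helper stays usable on its own.

```python
import base64


def decode_payload(request):
    """Return the encoded image bytes stored in the request's base64 "image" field."""
    return base64.b64decode(request["image"], validate=True)


def request_to_mp_image(request):
    """Decode a request into an mp.Image in RGB order (sketch, not the author's code)."""
    # Imported here so decode_payload works without these packages installed.
    import cv2
    import mediapipe as mp
    import numpy as np

    buffer = np.frombuffer(decode_payload(request), dtype=np.uint8)
    bgr = cv2.imdecode(buffer, cv2.IMREAD_COLOR)  # OpenCV decodes to BGR order
    if bgr is None:
        raise ValueError("could not decode the image payload")
    rgb = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)  # SRGB format expects RGB order
    return mp.Image(image_format=mp.ImageFormat.SRGB, data=rgb)
```

With this variant, read_image_for_rectify would encode the unmodified result of cv2.imread, and the detector would receive `request_to_mp_image(request)`. The ".jpg" re-encoding is lossy, so the pixels may still differ slightly from what mp.Image.create_from_file reads from disk.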

However, landmarks still cannot be detected for some images, for example:
