As you know, Mediapipe returns landmark positions relative to the aligned output image rather than the input image.
Goal:
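I want to run landmark detection on multiple images. Below is my code, which uses PoseLandmarkerOptions to detect the 33 body landmarks. Once these landmarks are found, I plan to classify the face angle as 0, 90, 180, or 270 degrees.
Data: I have included sample images from the MARS dataset because certain issues prevent me from using the original images. The originals have a higher resolution and larger size than the MARS images.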
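The face-angle classification in the goal could, for example, be derived from a few pose landmarks. The sketch below is hypothetical and not part of my pipeline: the function name `classify_face_angle` is made up, it assumes MediaPipe's pose indices 0 (nose), 11 (left shoulder) and 12 (right shoulder), normalized (x, y) coordinates with y growing downward, and a convention where 90 means the head points toward the right edge of the image. It takes the direction from the shoulder midpoint to the nose and snaps it to the nearest multiple of 90 degrees.

```python
import math
from typing import Sequence, Tuple

# Assumed indices from MediaPipe's 33-point pose topology
NOSE, LEFT_SHOULDER, RIGHT_SHOULDER = 0, 11, 12

Point = Tuple[float, float]


def classify_face_angle(landmarks: Sequence[Point]) -> int:
    """Hypothetical helper: return 0, 90, 180 or 270 for normalized (x, y) landmarks."""
    nose_x, nose_y = landmarks[NOSE]
    lx, ly = landmarks[LEFT_SHOULDER]
    rx, ry = landmarks[RIGHT_SHOULDER]
    mid_x, mid_y = (lx + rx) / 2, (ly + ry) / 2
    dx, dy = nose_x - mid_x, nose_y - mid_y
    # atan2(dx, -dy) is 0 deg when the nose is straight above the shoulders,
    # 90 deg when it points right, 180 deg down and 270 deg left (y grows downward)
    angle = math.degrees(math.atan2(dx, -dy)) % 360
    # Snap to the nearest multiple of 90 degrees; 360 wraps back to 0
    return (round(angle / 90) % 4) * 90
```

With a pose landmarker result, the input could be built as `[(lm.x, lm.y) for lm in detection_result.pose_landmarks[0]]`. This only helps when the detector actually returns a pose for the image.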
Code: Here is the main code for detecting landmarks in the images.
import sys
import cv2
import numpy as np
import glob
import os
import base64
import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision
from typing import Dict

base_options = python.BaseOptions(
    model_asset_path="./models/pose_landmarker.task",
    delegate=python.BaseOptions.Delegate.GPU,
)
options = vision.PoseLandmarkerOptions(
    base_options=base_options,
    output_segmentation_masks=True,
    min_pose_detection_confidence=0.5,
    min_pose_presence_confidence=0.5,
    min_tracking_confidence=0.5,
)
detector = vision.PoseLandmarker.create_from_options(options)
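
def check_landmarks(detection_result, img, address):
    file_name = os.path.basename(address)
    # img.shape is (height, width, channels)
    h, w, _ = img.shape
    for each_person_pose in detection_result.pose_landmarks:
        for each_key_point in each_person_pose:
            if each_key_point.presence > 0.5 and each_key_point.visibility > 0.5:
                # x and y are normalized by the image width and height respectively
                x_px = int(each_key_point.x * w)
                y_px = int(each_key_point.y * h)
                cv2.circle(img, (x_px, y_px), 3, (255, 0, 0), 2)
    # cv2.imwrite fails silently if the output folder does not exist
    os.makedirs("./landmarks", exist_ok=True)
    cv2.imwrite(os.path.join("./landmarks", file_name), img)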

def rectifier(detector, image, address):
    try:
        srgb_image = mp.Image.create_from_file(address)
        detection_result = detector.detect(srgb_image)
        # numpy_view() returns a read-only array, so draw on a copy
        check_landmarks(detection_result, np.copy(srgb_image.numpy_view()), address)
    except Exception as e:
        print(f"error {e}")
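
def rectify_image(rectify_image_request):
    # base64 text -> JPEG bytes -> image array; cv2.imdecode returns the
    # channels in the same order as the array that was passed to cv2.imencode
    image = cv2.imdecode(
        np.frombuffer(base64.b64decode(rectify_image_request["image"]), np.uint8),
        cv2.IMREAD_COLOR,
    )
    rectifier(detector, image, rectify_image_request["address"])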

def read_image_for_rectify(address: str) -> Dict:
    face_object = dict()
    img = cv2.imread(address)
    _, buffer = cv2.imencode(".jpg", img)
    img = base64.b64encode(buffer).decode()
    face_object["image"] = img
    face_object["address"] = address
    return face_object


folder_path = "./png2jpg"
file_paths = glob.glob(os.path.join(folder_path, "*.jpg"), recursive=True)
for id_file, file in enumerate(file_paths):
    print(id_file, file)
    rectify_image(read_image_for_rectify(file))
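Pose landmark x and y values are normalized to [0, 1] by the image width and height respectively, while a NumPy image's shape is (height, width, channels). A minimal sketch of the conversion done in check_landmarks, under those assumptions, is below; the name `to_pixel` is hypothetical, and the clamping step is an addition for landmarks that fall slightly outside the frame.

```python
from typing import Tuple


def to_pixel(x_norm: float, y_norm: float, shape: Tuple[int, ...]) -> Tuple[int, int]:
    """Hypothetical helper: normalized landmark -> (column, row) pixel position.

    `shape` follows the NumPy image convention (height, width[, channels]).
    """
    height, width = shape[0], shape[1]
    # x scales with the width and y with the height
    x_px = int(x_norm * width)
    y_px = int(y_norm * height)
    # Landmarks can lie slightly outside [0, 1]; clamp to valid pixel indices
    x_px = min(max(x_px, 0), width - 1)
    y_px = min(max(y_px, 0), height - 1)
    return x_px, y_px


# A 480x640 image (height 480, width 640): the centre is column 320, row 240
print(to_pixel(0.5, 0.5, (480, 640, 3)))  # (320, 240)
```

The returned tuple is already in the (x, y) order that cv2.circle expects for its center.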
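Initially, I passed the images to Mediapipe directly by their file paths, and the results were acceptable.
However, I now need to receive each image in a dictionary, with the image encoded as base64. I modified the data input accordingly, but when I reviewed the output in this scenario, Mediapipe failed to detect landmarks in many of the images. To feed the decoded image to Mediapipe as a numpy array, I changed this line:
srgb_image = mp.Image.create_from_file(address)
to:
srgb_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=image)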
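Base64 itself is lossless, so the encode/decode step between read_image_for_rectify and rectify_image hands back exactly the JPEG bytes that cv2.imencode produced. What changes compared with the file-path version is how the decoded pixels are interpreted: cv2.imdecode returns the channels in BGR order, whereas mp.ImageFormat.SRGB describes RGB data. The JPEG re-encode is also lossy, which may possibly play a smaller role. The stdlib-only sketch below checks the base64 round trip on arbitrary bytes; the helper names `make_request` and `decode_request` and the example path are hypothetical, and only the "image" and "address" keys come from the code above.

```python
import base64
from typing import Dict


def make_request(image_bytes: bytes, address: str) -> Dict[str, str]:
    # Hypothetical helper: same dictionary layout as read_image_for_rectify
    return {"image": base64.b64encode(image_bytes).decode(), "address": address}


def decode_request(request: Dict[str, str]) -> bytes:
    # Hypothetical helper: the base64 step rectify_image performs before cv2.imdecode
    return base64.b64decode(request["image"])


payload = bytes(range(256))  # stand-in for the bytes returned by cv2.imencode
request = make_request(payload, "./png2jpg/example.jpg")  # hypothetical file name
print(decode_request(request) == payload)  # True
```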
Output in the second scenario:
After adding
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
to the read_image_for_rectify function, Mediapipe gives the same results as in the first scenario:
def read_image_for_rectify(address: str) -> Dict:
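    face_object = dict()
    img = cv2.imread(address)  # cv2 loads images in BGR order
    # Swap to RGB so that, after the JPEG/base64 round trip and cv2.imdecode,
    # the array passed to mp.Image(image_format=mp.ImageFormat.SRGB) is RGB
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    _, buffer = cv2.imencode(".jpg", img)
    img = base64.b64encode(buffer).decode()
    face_object["image"] = img
    face_object["address"] = address
    return face_object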
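A plausible reading of why the extra cvtColor helps: cv2.imencode and cv2.imdecode both assume BGR but do not reorder the channels overall, so an array converted to RGB before encoding comes back from cv2.imdecode still in RGB order (up to JPEG loss), which is what mp.ImageFormat.SRGB expects. An alternative would be to leave read_image_for_rectify unchanged and call cv2.cvtColor(image, cv2.COLOR_BGR2RGB) on the result of cv2.imdecode in rectify_image. For a 3-channel image the conversion simply reverses the channel order of every pixel; the stdlib sketch below shows only that step, using a hypothetical `bgr_to_rgb` that works on nested lists of (b, g, r) tuples instead of NumPy arrays.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]  # rows of pixels, like a NumPy array of shape (h, w, 3)


def bgr_to_rgb(image: Image) -> Image:
    """Hypothetical helper: reverse the channel order of every pixel,
    which is what COLOR_BGR2RGB does for a 3-channel 8-bit image."""
    return [[(p[2], p[1], p[0]) for p in row] for row in image]


# A 1x2 "BGR" image: a pure blue pixel followed by a pure red pixel
bgr = [[(255, 0, 0), (0, 0, 255)]]
rgb = bgr_to_rgb(bgr)
print(rgb)  # [[(0, 0, 255), (255, 0, 0)]]
```

Because the swap is its own inverse, applying it twice restores the original order, which is also why converting once too often silently feeds BGR data to Mediapipe again.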
However, it still fails to detect landmarks in some images, for example: