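I need to convert a video frame (a NumPy array) to a PyTorch tensor, run some operations on it, and convert it back, but I'm struggling.
So, I have a frame returned from video_capture.read() which, as far as I understand, is a NumPy array. First I convert it to a tensor and check that it looks right (sorry, for some reason I can't attach pictures). Then I analyze it (no errors there) and try to rotate it, and that is where the problem is.
Can someone help me with this? I'm so tired, ChatGPT confused me even more, and I don't understand anything... I think the color problem has to do with how I convert the tensor to a PIL image, but I tried a few changes (the commented-out lines) and nothing helped. Also, is there a way to avoid converting the tensor to a PIL image before rotating? Can't I rotate the tensor directly?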

def tensor_to_image(tensor):
    tensor = (tensor * 255).byte()
    tensor = tensor.squeeze(0)
    tensor = tensor.permute(1, 2, 0)
    image = Image.fromarray(np.array(tensor).astype(np.uint8))
    image = cv2.cvtColor(np.asarray(image), cv2.COLOR_BGR2RGB)
    image = Image.fromarray(np.asarray(image))
    return image

def rotate_tensor(frame_tensor, landmarks):
    roll = calc_face_angle(landmarks)
    frame = tf.to_pil_image(frame_tensor.squeeze(0))
    #frame = tensor_to_image(frame_tensor)
    frame.show()
    if not np.isnan(roll):
        rotated_frame = frame.rotate(roll, resample=Image.BICUBIC, expand=True)
    else:
        print("Failed to calculate face angle for rotation")
        return frame_tensor
    #rotated_tensor = tf.to_tensor(rotated_frame).unsqueeze(0)
    transform = transforms.ToTensor()  # use torchvision to convert to a tensor
    rotated_tensor = transform(rotated_frame).unsqueeze(0)
    return rotated_tensor

def check_tensor(self, frame_tensor):
    frame_numpy = frame_tensor.squeeze(0).permute(1, 2, 0).byte().numpy()
    #frame_numpy = cv2.cvtColor(frame_numpy, cv2.COLOR_RGB2BGR)
    cv2.imshow("Frame", frame_numpy)
    cv2.waitKey(0)
    cv2.destroyAllWindows()

def analyze_video(self, video_path):
    video_capture = cv2.VideoCapture(video_path)
    for i in range(1):
        ret, frame = video_capture.read()
        if not ret:
            break
        # convert the frame to a tensor
        frame_tensor = torch.from_numpy(frame).float()
        frame_tensor = frame_tensor.permute(2, 0, 1).unsqueeze(0)
        #frame_tensor = frame_tensor[:, [2, 1, 0], :, :]
        self.check_tensor(frame_tensor)
        orig_prediction = self.analyze_frame(frame_tensor)
        rotated_tensor = im.rotate_tensor(frame_tensor, orig_prediction.head())
        self.check_tensor(rotated_tensor)
        frame.show()
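
The colors are wrong because OpenCV uses the BGR channel order, while torchvision and PIL expect RGB. Your frame comes from OpenCV (BGR), and you then display it with PIL (RGB) without any conversion.
self.check_tensor(rotated_tensor) shows a black screen because the image-to-tensor conversion in rotate_tensor() normalizes the values to the range [0, 1]. When check_tensor() casts them to integers, they are truncated to 0 (at most 1 for pure white), since they are all fractions between 0 and 1.
tensor_to_image() also multiplies by 255 again, although the tensor built from the OpenCV frame already holds values in the range 0 to 255. The scaled values no longer fit in 8 bits, and the conversions through OpenCV and PIL then alter them, which results in unexpected colors.
Modified code:

import cv2
import numpy as np
from PIL import Image
from torchvision import transforms

def tensor_to_image(tensor):
    # The tensor already holds 0..255 values, so cast without rescaling.
    tensor = tensor.byte()
    tensor = tensor.squeeze(0)        # 1xCxHxW -> CxHxW
    tensor = tensor.permute(1, 2, 0)  # CxHxW -> HxWxC
    image = Image.fromarray(np.array(tensor).astype(np.uint8))
    # The data came from OpenCV, so swap BGR -> RGB before handing it to PIL.
    image = cv2.cvtColor(np.asarray(image), cv2.COLOR_BGR2RGB)
    image = Image.fromarray(np.asarray(image))
    return image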

def rotate_tensor(frame_tensor, landmarks):
    roll = calc_face_angle(landmarks)
    frame = tensor_to_image(frame_tensor)  # RGB PIL image
    frame.show()  # debug preview
    if not np.isnan(roll):
        rotated_frame = frame.rotate(roll, resample=Image.BICUBIC, expand=True)
    else:
        print("Failed to calculate face angle for rotation")
        return frame_tensor
    # Back to BGR so the result matches the channel order of the input tensor.
    rotated_frame = cv2.cvtColor(np.asarray(rotated_frame), cv2.COLOR_RGB2BGR)
    transform = transforms.ToTensor()
    rotated_tensor = transform(rotated_frame).unsqueeze(0)
    # ToTensor() scales to [0, 1]; restore the 0..255 range check_tensor() expects.
    return rotated_tensor * 255