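I ran into a persistent problem while running an inpainting task with StableDiffusionInpaintPipeline. Even though I pass the inputs in the expected format (the image and the mask are both PIL.Image.Image objects of the correct size), I still get the following error: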
ValueError: Input is in incorrect format. Currently, we only support <class 'PIL.Image.Image'>, <class 'numpy.ndarray'>, <class 'torch.Tensor'>
Here is the code:
import base64
import io

import cv2
import numpy as np
import torch
from PIL import Image

# model_pool and IMG_INPAINTING_NEG_PROMPT are defined elsewhere in the project.


@torch.inference_mode()
def generate_image_bytes(base64_img, request, prompt) -> dict:
    # Get a model from the pool
    model = model_pool.get_model()
    try:
        if base64_img is not None:
            # Decode the base64 payload into an RGB image
            img_data = base64.b64decode(base64_img)
            img = Image.open(io.BytesIO(img_data))
            print("Received image of size: ", img.size)
            img = img.convert("RGB")
            image_np = np.array(img)
            # Bounding box of the region to segment
            roi_points = [
                request["box"]["x1"],
                request["box"]["y1"],
                request["box"]["x2"],
                request["box"]["y2"],
            ]
            results = model["seg_model"](
                image_np,
                device="cuda",
                conf=0.6,
                iou=0.9,
                bboxes=roi_points,
                labels=[1],
                verbose=False,
            )
            # Merge all predicted masks into one binary mask (255 = inpaint)
            black_mask = np.zeros_like(image_np[:, :, 0], dtype=np.uint8)
            for m in results[0].masks.data:
                mask = m.cpu().numpy().astype(np.uint8)
                orig_shape = results[0].masks.orig_shape
                resized_mask = cv2.resize(mask, (orig_shape[1], orig_shape[0]))
                black_mask[resized_mask == 1] = 255
            image_pil = Image.fromarray(image_np)
            mask_pil = Image.fromarray(black_mask).convert("L")
            print(f"Image size: {image_pil.size}, mode: {image_pil.mode}")
            print(f"Mask size: {mask_pil.size}, mode: {mask_pil.mode}")
            generator = torch.Generator(device="cuda").manual_seed(0)
            image = model["pipeline"](
                prompt=prompt,
                negative_prompt=IMG_INPAINTING_NEG_PROMPT,
                image=image_pil,
                mask=mask_pil,
                guidance_scale=8.0,
                num_inference_steps=50,
                strength=0.05,
                generator=generator,
            ).images[0]
            # Convert image to bytes
            bytes = io.BytesIO()
            image.save(bytes, format="PNG")
            return {"bytes": base64.b64encode(bytes.getvalue()).decode("utf-8")}
I inspected both images, image_pil and mask_pil, and they look fine. They have the same format and size, yet I still get this error.
OK, it's fixed. I had to pass mask_image=mask_pil, but I was passing mask=mask_pil.
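For anyone hitting the same thing: my understanding (an assumption based on reading the diffusers source, not something verified here) is that the pipeline's `__call__` accepts `**kwargs`, so a misspelled keyword such as `mask` is silently swallowed, `mask_image` stays `None`, and the format check then fails on the missing mask rather than on the typo. Below is a minimal sketch of a guard that flags keyword names matching no named parameter. The helper `unmatched_kwargs` and the stand-in `fake_inpaint_call` are hypothetical names made up for illustration and are not part of diffusers.

```python
import inspect


def unmatched_kwargs(func, kwargs):
    """Return keyword names that match no named parameter of func.

    If func declares **kwargs, such names are accepted silently, so they
    are likely typos worth flagging before the call.
    """
    params = inspect.signature(func).parameters
    named = {
        name
        for name, p in params.items()
        if p.kind in (p.POSITIONAL_OR_KEYWORD, p.KEYWORD_ONLY)
    }
    return sorted(k for k in kwargs if k not in named)


# Hypothetical stand-in for the pipeline's __call__ (NOT the real diffusers
# signature): it has a mask_image parameter and swallows anything else.
def fake_inpaint_call(prompt=None, image=None, mask_image=None, **kwargs):
    return mask_image


# Placeholder strings stand in for the real PIL images.
call_args = {"prompt": "a red sofa", "image": "IMG", "mask": "MASK"}

# The typo never reaches mask_image, so the stand-in sees None.
print(fake_inpaint_call(**call_args))

# The guard reports the offending keyword name.
print(unmatched_kwargs(fake_inpaint_call, call_args))
```

With the real pipeline, the same check could probably be run as `unmatched_kwargs(model["pipeline"].__call__, call_args)` before the call, assuming `inspect.signature` can see through the decorator on `__call__`. Renaming the argument to `mask_image` is still the actual fix.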