ValueError: You have to specify pixel_values


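Is it possible to generate an image with a CLIP model without giving it a reference image? I tried to follow the documentation and came up with this: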

import torch
from transformers import CLIPProcessor, CLIPModel
from PIL import Image

# Load CLIP model and processor
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

prompts = [
    "a cat",
]

for i, prompt in enumerate(prompts):
    with torch.no_grad():
        outputs = model(prompt)
        image_features = outputs.pixel_values

    # Convert image features to image
    image = Image.fromarray(image_features[0].numpy())

    image.save(f"generated_image_{i}.png")

But I get this error:

Traceback (most recent call last):
  File "clip.py", line 20, in <module>
    outputs = model(**inputs)
  File "/Users/x/.pyenv/versions/clip/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/x/.pyenv/versions/clip/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/x/.pyenv/versions/clip/lib/python3.8/site-packages/transformers/models/clip/modeling_clip.py", line 1110, in forward
    vision_outputs = self.vision_model(
  File "/Users/x/.pyenv/versions/clip/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/x/.pyenv/versions/clip/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/x/.pyenv/versions/clip/lib/python3.8/site-packages/transformers/models/clip/modeling_clip.py", line 847, in forward
    raise ValueError("You have to specify pixel_values")

Documentation: https://huggingface.co/docs/transformers/en/model_doc/clip

python huggingface-transformers torch clip generative-programming
1 Answer

No, this is not possible.

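According to the documentation, CLIP is designed for image-text similarity and zero-shot image classification, tasks that are very different from image generation. Given an image, the model classifies it against a list of prompts that you supply, and those prompts define the classes. When you call the full model, an image must always be passed as input (as pixel_values), which is why the call above fails. Its output is a set of similarity scores, not pixels.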
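To make the input contract concrete, here is a minimal sketch of a CLIPModel forward pass. It rests on some assumptions that are not in the question: to avoid downloading weights it builds a tiny, randomly initialised model from hypothetical config values, and it uses random tensors as stand-ins for the input_ids and pixel_values that CLIPProcessor(text=..., images=..., return_tensors="pt", padding=True) would normally produce. The scores are therefore meaningless; only the call shape and the output shapes are the point.

```python
import torch
from transformers import CLIPConfig, CLIPModel, CLIPTextConfig, CLIPVisionConfig

# Assumption: tiny illustrative config values, chosen only so this runs
# quickly without pretrained weights. They are not the real checkpoint's.
text_cfg = CLIPTextConfig(
    vocab_size=100, hidden_size=32, intermediate_size=64,
    num_hidden_layers=2, num_attention_heads=4, max_position_embeddings=16,
)
vision_cfg = CLIPVisionConfig(
    hidden_size=32, intermediate_size=64, num_hidden_layers=2,
    num_attention_heads=4, image_size=32, patch_size=8,
)
config = CLIPConfig(
    text_config=text_cfg.to_dict(),
    vision_config=vision_cfg.to_dict(),
    projection_dim=16,
)
model = CLIPModel(config).eval()

# Stand-ins for processor output: two tokenised prompts (the candidate
# classes) and one preprocessed RGB image of shape (batch, 3, H, W).
input_ids = torch.randint(0, 100, (2, 8))
pixel_values = torch.rand(1, 3, 32, 32)

with torch.no_grad():
    # The full model needs BOTH text and image inputs.
    outputs = model(input_ids=input_ids, pixel_values=pixel_values)

# One row per image, one column per prompt: similarity scores, not an image.
probs = outputs.logits_per_image.softmax(dim=-1)
print(probs.shape)  # torch.Size([1, 2])

# The output also carries the projected embeddings for each modality.
print(outputs.text_embeds.shape, outputs.image_embeds.shape)
```

With the real checkpoint you would load the model and processor via from_pretrained("openai/clip-vit-base-patch32"), as in the question, and pass the processor's output with model(**inputs). The result has the same structure: logits_per_image ranks the prompts for each image, and nothing in it can be decoded back into a picture.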

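It looks like you need a different kind of model architecture. Several text-to-image architectures are available, most notably diffusion models such as Stable Diffusion.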
