Is it possible to use the CLIP model to generate images without giving it a reference image? I tried to follow the documentation and came up with this:
import torch
from transformers import CLIPProcessor, CLIPModel
from PIL import Image
# Load CLIP model and processor
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
prompts = [
    "a cat",
]

for i, prompt in enumerate(prompts):
    inputs = processor(text=prompt, return_tensors="pt", padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
    image_features = outputs.pixel_values
    # Convert image features to image
    image = Image.fromarray(image_features[0].numpy())
    image.save(f"generated_image_{i}.png")
But I get this error:
Traceback (most recent call last):
  File "clip.py", line 20, in <module>
    outputs = model(**inputs)
  File "/Users/x/.pyenv/versions/clip/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/x/.pyenv/versions/clip/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/x/.pyenv/versions/clip/lib/python3.8/site-packages/transformers/models/clip/modeling_clip.py", line 1110, in forward
    vision_outputs = self.vision_model(
  File "/Users/x/.pyenv/versions/clip/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/x/.pyenv/versions/clip/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/x/.pyenv/versions/clip/lib/python3.8/site-packages/transformers/models/clip/modeling_clip.py", line 847, in forward
    raise ValueError("You have to specify pixel_values")
ValueError: You have to specify pixel_values
Documentation: https://huggingface.co/docs/transformers/en/model_doc/clip
No, this is not possible.
According to the documentation, CLIP is designed for image-text similarity and zero-shot image classification, which are entirely different tasks from image generation. For any given image, the model classifies it against a provided list of prompts, which define the candidate classes; an image must therefore always be part of the input. That is exactly what your traceback says: CLIPModel's forward pass requires pixel_values (a preprocessed image) alongside the text, so calling it with prompts alone raises the ValueError.
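For reference, here is a minimal sketch of the intended zero-shot classification usage; the image path cat.jpg and the label prompts are placeholders:

import torch
from PIL import Image
from transformers import CLIPProcessor, CLIPModel

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("cat.jpg")  # placeholder path: an image is a mandatory input
labels = ["a photo of a cat", "a photo of a dog"]

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds the image-text similarity scores
probs = outputs.logits_per_image.softmax(dim=1)
print(dict(zip(labels, probs[0].tolist())))

Note how this differs from your script only in that an image is passed to the processor: CLIP scores images against text, it never produces pixels.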
It sounds like you are looking for a different kind of model architecture. Several text-to-image architectures are available, most notably diffusion models such as Stable Diffusion.
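As a rough sketch of that route using the diffusers library (the checkpoint ID is just one example, and the float16/CUDA settings are assumptions you can adjust to your hardware):

# pip install diffusers transformers accelerate
import torch
from diffusers import StableDiffusionPipeline

# Example checkpoint; any Stable Diffusion checkpoint on the Hub works here
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")  # assumes a CUDA GPU; use float32 on CPU instead

image = pipe("a cat").images[0]  # actual text-to-image generation
image.save("generated_image_0.png")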