I want to pass the hidden states of Llama-2 as the embedding model to
FAISS.from_documents(<filepath>, <embedding_model>).
Currently, I have the Llama-2 model and can obtain embeddings for a string:
import transformers
from transformers import AutoTokenizer, AutoModelForCausalLM

# auth_token (Hugging Face access token) and bnb_config
# (a transformers.BitsAndBytesConfig) are defined earlier, not shown here.
model_config = transformers.AutoConfig.from_pretrained(
    model_id,
    output_hidden_states=True,  # needed so outputs.hidden_states is populated
    use_auth_token=auth_token,
)

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")

# Input data to test the code
input_text = "Hello World!"
encoded_input = tokenizer(input_text, return_tensors='pt')

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",
    trust_remote_code=True,
    config=model_config,
    quantization_config=bnb_config,
    device_map='auto',
    use_auth_token=auth_token,
)
outputs = model(**encoded_input)
hidden_states = outputs.hidden_states
print(len(hidden_states)) # 33 for Llama-2: 1 (embeddings) + 32 (layers)
print(hidden_states[0].shape) # Shape of the embeddings
print(hidden_states[2])
The printed output:
33
torch.Size([1, 4, 4096])
tensor([[[ 0.0373, -0.5762, -0.0180, ..., 0.0962, -0.1099, 0.3767],
[ 0.0676, 0.0400, -0.0033, ..., 0.0655, 0.0278, -0.0079],
[-0.0160, 0.0157, 0.0478, ..., -0.0224, -0.0341, 0.0093],
[ 0.0229, -0.0104, 0.0217, ..., -0.0080, -0.0012, -0.0342]]],
dtype=torch.float16, grad_fn=<ToCopyBackward0>)
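The hidden states above are per-token (shape `[batch, seq_len, 4096]`); to get a single fixed-size vector per string, one common choice is to mean-pool over the sequence dimension (taking the last token's state is another). A minimal sketch, using a random tensor as a stand-in for `outputs.hidden_states[-1]` so it runs without the model:

```python
import torch

# Stand-in for the (1, seq_len, 4096) tensor printed above;
# in the real code this would be outputs.hidden_states[-1].
last_hidden = torch.randn(1, 4, 4096)

# Mean-pool over the sequence dimension -> one 4096-dim vector
# for the whole input string.
sentence_embedding = last_hidden.mean(dim=1).squeeze(0)
print(sentence_embedding.shape)  # torch.Size([4096])
```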
Now, I want to build document embeddings with Llama-2:
from langchain.vectorstores import FAISS
# <clean> is the file-path
FAISS.from_documents(clean, model)
AttributeError: 'LlamaForCausalLM' object has no attribute 'embed_documents'
How can I fix this, and how can I use the Llama-2 hidden states for embedding?
I had a similar problem. You can find LangChain's FakeEmbeddings structure here, which shows the minimal interface an embedding object must expose (embed_documents and embed_query): https://api.python.langchain.com/en/latest/_modules/langchain/embeddings/fake.html#FakeEmbeddings.embed_documents
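Following that pattern, one way to make the hidden states usable with FAISS.from_documents is a small wrapper class that duck-types LangChain's Embeddings interface. A sketch under these assumptions: the class name and toy_encode are illustrative (not LangChain API), and in real use toy_encode would be replaced by a function that runs the tokenizer + model from the question and mean-pools outputs.hidden_states[-1] into a Python list:

```python
from typing import Callable, List

class LlamaHiddenStateEmbeddings:
    """Duck-types LangChain's Embeddings interface (embed_documents /
    embed_query) around any text -> vector function, e.g. one that
    mean-pools Llama-2 hidden states."""

    def __init__(self, encode: Callable[[str], List[float]]):
        self._encode = encode

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        # One vector per document, as FAISS.from_documents expects.
        return [self._encode(t) for t in texts]

    def embed_query(self, text: str) -> List[float]:
        return self._encode(text)

# Toy encoder standing in for the real tokenizer + model call.
def toy_encode(text: str) -> List[float]:
    return [float(len(text)), 0.0, 1.0]

emb = LlamaHiddenStateEmbeddings(toy_encode)
print(emb.embed_documents(["a", "bb"]))  # two 3-dim vectors
```

FAISS.from_documents expects a list of Document objects plus an object exposing these two methods, which is why passing the raw LlamaForCausalLM raised the AttributeError.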