LangChain FastEmbed with ChromaDB


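I'm trying to follow a simple example I found that uses LangChain with FastEmbed and ChromaDB. Eventually I will also connect this to an offline model. I believe my Python environment is set up correctly and has the right dependencies. Is there something wrong with the way I create the embeddings and then call Chroma.from_documents?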

from langchain_community.embeddings.fastembed import FastEmbedEmbeddings
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.vectorstores.utils import filter_complex_metadata
from langchain_community.vectorstores import Chroma
from langchain.text_splitter import RecursiveCharacterTextSplitter
import glob

class ChatPDF:
    vector_store = None
    retriever = None
    chain = None

    def __init__(self):
        self.embeddings = FastEmbedEmbeddings()
        self.text_splitter = RecursiveCharacterTextSplitter(chunk_size=1024, chunk_overlap=100)

    def ingest(self, file_path: str):

        doc = PyPDFLoader(file_path=file_path).load()
        chunks = self.text_splitter.split_documents(doc)
        chunks = filter_complex_metadata(chunks)

        # Build the vector store; Chroma calls self.embeddings.embed_documents() here
        vector_store = Chroma.from_documents(documents=chunks, embedding=self.embeddings)


new_chat = ChatPDF()
docs_to_process = glob.glob("tmp/*.pdf")

for pdf in docs_to_process:
    new_chat.ingest(file_path=pdf)

However, when I run it, I get the following:

Traceback (most recent call last):
  File "/Users/c/Documents/Code/embeddings-test.py", line 35, in <module>
    new_chat.ingest(file_path=pdf)
  File "/Users/c/Documents/Code/embeddings-test.py", line 26, in ingest
    vector_store = Chroma.from_documents(documents=chunks, embedding=self.embeddings)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/c/Documents/Code/.venv/lib/python3.12/site-packages/langchain_community/vectorstores/chroma.py", line 878, in from_documents
    return cls.from_texts(
           ^^^^^^^^^^^^^^^
  File "/Users/c/Documents/Code/.venv/lib/python3.12/site-packages/langchain_community/vectorstores/chroma.py", line 842, in from_texts
    chroma_collection.add_texts(texts=texts, metadatas=metadatas, ids=ids)
  File "/Users/c/Documents/Code/.venv/lib/python3.12/site-packages/langchain_community/vectorstores/chroma.py", line 277, in add_texts
    embeddings = self._embedding_function.embed_documents(texts)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/c/Documents/Code/.venv/lib/python3.12/site-packages/langchain_community/embeddings/fastembed.py", line 117, in embed_documents
    embeddings = self._model.embed(
                 ^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'embed'
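
Reading the traceback from the bottom up, the failure happens inside FastEmbedEmbeddings.embed_documents, where self._model is None; Chroma.from_documents only triggers that call. One way to confirm this is to test the embedder on its own, without PyPDFLoader or Chroma. The following is a minimal sketch, not part of the original script: probe_embedder is a hypothetical helper name, and the only assumption about the embedder is that it exposes embed_documents and (as the traceback suggests) a _model attribute.

```python
from typing import Any, List


def probe_embedder(embedder: Any, sample: str = "hello world") -> List[float]:
    """Embed a single string directly, bypassing the loader and the vector store."""
    # The traceback shows `self._model` being None inside embed_documents,
    # so report that case with a clearer message than the raw AttributeError.
    # The "absent" default covers embedders that have no `_model` attribute at all.
    if getattr(embedder, "_model", "absent") is None:
        raise RuntimeError(
            "embedder._model is None: the underlying model was never loaded"
        )

    vectors = embedder.embed_documents([sample])
    if len(vectors) == 0 or len(vectors[0]) == 0:
        raise RuntimeError("embedder returned no vector for the sample text")

    # Return the single embedding as a plain list of floats.
    return list(vectors[0])
```

With the setup from the question, this would be called as probe_embedder(FastEmbedEmbeddings()). If that call raises, the problem lies in how the embedding model is initialized rather than in the Chroma.from_documents call.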
python py-langchain chromadb
1 Answer

I actually ran into the same problem: it was working, and then this error started appearing.
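
Code that worked and then stopped without being changed often points to a change in the installed packages, although nothing in this thread confirms that this is the cause here. A minimal sketch for recording what is installed, so a working and a failing environment can be compared, is shown below. It uses only the standard library; installed_versions is a hypothetical helper name, and the package list is an assumption based on the imports in the question.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if missing."""
    report: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    # Assumed package list, taken from the imports used in the question.
    names = ["langchain", "langchain-community", "fastembed", "chromadb"]
    for name, ver in installed_versions(names).items():
        print(f"{name}: {ver or 'not installed'}")
```

Running this in the virtual environment from the traceback (.venv) and comparing the output with an environment where the script still works would show whether any of these packages differ.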
