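I am developing a RAG application that provides a custom GPT bot service, and I store the URL of the file that GPT uses to answer user queries.
I store the embeddings for each bot_id separately, and at query time they are retrieved according to the bot_id in use.
When a user changes the file URL, I delete that bot's existing ChromaDB folder and recreate the embeddings from the new file URL. While the embeddings are being recreated, the following error appears: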
Traceback (most recent call last):
File "/home/ubuntu/.local/lib/python3.10/site-packages/chromadb/api/client.py", line 438, in _validate_tenant_database
self._admin_client.get_tenant(name=tenant)
File "/home/ubuntu/.local/lib/python3.10/site-packages/chromadb/api/client.py", line 486, in get_tenant
return self._server.get_tenant(name=name)
File "/home/ubuntu/.local/lib/python3.10/site-packages/chromadb/api/segment.py", line 140, in get_tenant
return self._sysdb.get_tenant(name=name)
File "/home/ubuntu/.local/lib/python3.10/site-packages/chromadb/db/mixins/sysdb.py", line 125, in get_tenant
with self.tx() as cur:
File "/home/ubuntu/.local/lib/python3.10/site-packages/chromadb/db/impl/sqlite.py", line 131, in tx
return TxWrapper(self._conn_pool, stack=self._tx_stack)
File "/home/ubuntu/.local/lib/python3.10/site-packages/chromadb/db/impl/sqlite.py", line 31, in __init__
self._conn = conn_pool.connect()
File "/home/ubuntu/.local/lib/python3.10/site-packages/chromadb/db/impl/sqlite_pool.py", line 141, in connect
new_connection = Connection(
File "/home/ubuntu/.local/lib/python3.10/site-packages/chromadb/db/impl/sqlite_pool.py", line 20, in __init__
self._conn = sqlite3.connect(
sqlite3.OperationalError: unable to open database file
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/ubuntu/.local/lib/python3.10/site-packages/flask/app.py", line 1463, in wsgi_app
response = self.full_dispatch_request()
File "/home/ubuntu/.local/lib/python3.10/site-packages/flask/app.py", line 872, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/home/ubuntu/.local/lib/python3.10/site-packages/flask/app.py", line 870, in full_dispatch_request
rv = self.dispatch_request()
File "/home/ubuntu/.local/lib/python3.10/site-packages/flask/app.py", line 855, in dispatch_request
return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args) # type: ignore[no-any-return]
File "/home/ubuntu/chatbot/main.py", line 460, in qa
message = storeEmbeddings(embedding_model, raw_text, bot_id)
File "/home/ubuntu/chatbot/embeddings.py", line 12, in storeEmbeddings
db = Chroma.from_documents(
File "/home/ubuntu/.local/lib/python3.10/site-packages/langchain_community/vectorstores/chroma.py", line 778, in from_documents
return cls.from_texts(
File "/home/ubuntu/.local/lib/python3.10/site-packages/langchain_community/vectorstores/chroma.py", line 714, in from_texts
chroma_collection = cls(
File "/home/ubuntu/.local/lib/python3.10/site-packages/langchain_community/vectorstores/chroma.py", line 120, in __init__
self._client = chromadb.Client(_client_settings)
File "/home/ubuntu/.local/lib/python3.10/site-packages/chromadb/__init__.py", line 274, in Client
return ClientCreator(tenant=tenant, database=database, settings=settings)
File "/home/ubuntu/.local/lib/python3.10/site-packages/chromadb/api/client.py", line 144, in __init__
self._validate_tenant_database(tenant=tenant, database=database)
File "/home/ubuntu/.local/lib/python3.10/site-packages/chromadb/api/client.py", line 447, in _validate_tenant_database
raise ValueError(
ValueError: Could not connect to tenant default_tenant. Are you sure it exists?
Even though the folder was deleted successfully, it still seems to be trying to access that bot's old ChromaDB. I deleted the folder as follows:
import shutil
shutil.rmtree("Embeddings/1001")
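The innermost error in the traceback, sqlite3.OperationalError: unable to open database file, can be reproduced without Chroma. The sketch below is only an illustration under an assumption: that something in the running process still points at a database path inside the deleted folder, and nothing recreates the folder first. The file name chroma.sqlite3 and the helper try_open are placeholders chosen for this example.

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path


def try_open(db_file: Path) -> str:
    """Open (or create) a SQLite file and report 'ok' or the error text."""
    try:
        conn = sqlite3.connect(str(db_file))
        conn.execute("CREATE TABLE IF NOT EXISTS t (x INTEGER)")
        conn.close()
        return "ok"
    except sqlite3.OperationalError as exc:
        return str(exc)


# Hypothetical layout mirroring Embeddings/1001 from the question.
root = Path(tempfile.mkdtemp())
bot_dir = root / "1001"
bot_dir.mkdir()
db_file = bot_dir / "chroma.sqlite3"

first = try_open(db_file)    # folder exists, so the file can be created
shutil.rmtree(bot_dir)       # same deletion as in the question
second = try_open(db_file)   # parent folder is gone, so opening fails
bot_dir.mkdir()
third = try_open(db_file)    # works again once the folder exists

print(first, "|", second, "|", third)
```

If this is what happens inside the application, the deletion itself is fine; the failure would come from reusing a path whose folder no longer exists.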
The function that creates and stores the embeddings:
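import os

from chromadb.config import Settings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma

def storeEmbeddings(embedding, text, bot_id, embedding_folder):
    try:
        # Split the raw text into overlapping chunks before embedding.
        text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
        texts = text_splitter.create_documents([text])
        # Persist this bot's embeddings in its own folder, e.g. Embeddings/1001.
        db = Chroma.from_documents(
            texts,
            embedding,
            persist_directory=os.path.join(embedding_folder, bot_id),
            client_settings=Settings(anonymized_telemetry=False, is_persistent=True),
        )
        return sucessMessage  # assumed to be defined elsewhere in the application
    except Exception as e:
        return str(e)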
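The strangest part is that if I stop and restart the Python application at this point, it recreates the embeddings for that bot.
What is the best way to delete the existing ChromaDB embeddings and create new ones from the new document?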
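Since restarting the process clears the problem, one workaround worth trying is to never reuse a persist path within a running process. The sketch below is an assumption-laden illustration, not something confirmed in this thread: new_persist_dir is a hypothetical helper, and the application would have to record which folder is current for each bot (for example, next to the stored file URL) and clean up old folders later.

```python
import tempfile
import uuid
from pathlib import Path


def new_persist_dir(root: Path, bot_id: str) -> Path:
    """Create and return a fresh, unique folder for this bot's next rebuild."""
    path = root / bot_id / uuid.uuid4().hex
    path.mkdir(parents=True)
    return path


# Hypothetical usage: each rebuild of bot 1001 gets its own folder.
embeddings_root = Path(tempfile.mkdtemp())
build_a = new_persist_dir(embeddings_root, "1001")
build_b = new_persist_dir(embeddings_root, "1001")
print(build_a != build_b)
```

The returned path would then be passed as persist_directory when building the new store, and the same recorded path used when loading it for queries.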
I faced the same error:
ValueError: Could not connect to tenant default_tenant. Are you sure it exists?
To resolve it, I installed an older version of Chroma, specifically chromadb==0.4.9, which fixed the problem for me.
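As a possible alternative to downgrading, the deletion step could drop any in-process client state before removing the folder. This is a minimal sketch under the assumption that the client library keeps per-path state alive in memory; delete_bot_folder and its clear_client_cache parameter are hypothetical names introduced here, and the hook is injected so the helper itself depends only on the standard library.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Optional


def delete_bot_folder(
    root: Path,
    bot_id: str,
    clear_client_cache: Optional[Callable[[], None]] = None,
) -> Path:
    """Run the optional cache-clearing hook, then remove the bot's folder."""
    target = root / bot_id
    if clear_client_cache is not None:
        clear_client_cache()  # drop stale in-process state first
    if target.exists():
        shutil.rmtree(target)
    return target


# Hypothetical usage with a stand-in hook that only records the call.
calls = []
demo_root = Path(tempfile.mkdtemp())
(demo_root / "1001").mkdir()
(demo_root / "1001" / "chroma.sqlite3").write_text("stale")
removed = delete_bot_folder(demo_root, "1001", clear_client_cache=lambda: calls.append("cleared"))
print(removed.exists(), calls)
```

In chromadb releases laid out like the one in the traceback (with chromadb/api/client.py), one candidate for the hook is chromadb.api.client.SharedSystemClient.clear_system_cache. Check that your installed version exposes it before relying on it.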