BGE-M3 index not found when running a PyMilvus application


I am adapting this existing example application for my own use:

https://zilliz.com/learn/Exploring-BGE-M3-the-future-of-information-retrieval-with-milvus

from FlagEmbedding import BGEM3FlagModel
import pymilvus
import numpy as np
from pymilvus import (
    connections,
    FieldSchema,
    CollectionSchema,
    DataType,
    Collection,
    utility,
)

connections.connect("default", host="localhost", port="19530")

fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=1024),
]

schema = CollectionSchema(fields, description="Document Embeddings Collection")
collection = Collection(name="docs_embeddings", schema=schema)
collection.load()

if utility.has_collection(collection_name):
    collection.release()

if collection.has_index():
    collection.drop_index()

index_params = {"index_type": "IVF_FLAT", "metric_type": "L2", "params": {"nlist": 128}}
collection.create_index("embedding", index_params)
print("New index created successfully.")
collection.load()


from pymilvus.model.hybrid import BGEM3EmbeddingFunction
bge_m3 = BGEM3EmbeddingFunction(model_name="BAAI/bge-m3", device="cpu", use_fp16=False)

documents = [
    "Climate change is a significant global issue.",
    "El cambio climático es un problema global significativo.",
    "气候变化是一个重大的全球问题。",
]


docs_embeddings = bge_m3.encode_documents(documents)
entities = [{"embedding": doc.tolist()} for doc in docs_embeddings["dense"]]
insert_result = collection.insert(entities)
collection.load()

search_params = {"metric_type": "L2", "params": {"nprobe": 10}}
query_embeddings = docs_embeddings["dense"][2:3]
result = collection.search(
    query_embeddings, "embedding", search_params, limit=6, output_fields=["id"]
)

for hits in result:
    for hit in hits:
        print(f"hit: {hit}, id: {hit.id}")

When I run this Python code, I get the following error.

RPC error: [load_collection], <MilvusException: (code=700, message=index not found[collection=docs_embeddings])>, <Time:{'RPC start': '2024-09-11 16:15:35.363838', 'RPC error': '2024-09-11 16:15:35.368702'}>
Traceback (most recent call last):
  File "/Users/tom/milvus-lite/testing/app1.py", line 32, in <module>
    collection.load()
  File "/Users/tom/milvus-lite/testing/qa_venv/lib/python3.10/site-packages/pymilvus/orm/collection.py", line 425, in load
    conn.load_collection(

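I can't get past this. Everything looks fine, and there are no Python typos.

I tried the code above, which looks the same as the example. It should print the collection data and IDs.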
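
One reading of the traceback: the failing call is a `collection.load()` that runs before any index exists on the collection, and Milvus rejects the load with `index not found`. The sketch below shows the ordering that avoids this (create the index first, then load). The helper name `load_with_index` is hypothetical. It only relies on the `has_index()`, `create_index()` and `load()` methods that the script above already calls on the `Collection` object.

```python
def load_with_index(collection, field_name, index_params):
    """Make sure an index exists on the collection, then load it.

    `collection` is expected to behave like pymilvus.Collection, i.e. to
    provide has_index(), create_index() and load().
    """
    # Loading a collection that has no index is what triggers
    # "index not found", so build the index first if it is missing.
    if not collection.has_index():
        collection.create_index(field_name, index_params)
    # Only now is it safe to load the collection into memory.
    collection.load()
    return collection


# Usage with the objects from the script above (needs a running Milvus):
# index_params = {"index_type": "IVF_FLAT", "metric_type": "L2", "params": {"nlist": 128}}
# load_with_index(collection, "embedding", index_params)
```

With this ordering, the early `collection.load()` right after `Collection(...)` is no longer needed.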

1 Answer

You need to modify milvus.yaml to work with GCP cloud storage.

minio:
  # IP address of MinIO or S3 service.
  # Environment variable: MINIO_ADDRESS
  # minio.address and minio.port together generate the valid access to MinIO or S3 service.
  # MinIO preferentially acquires the valid IP address from the environment variable MINIO_ADDRESS when Milvus is started.
  # Default value applies when MinIO or S3 is running on the same network with Milvus.
  address: localhost
  port: 9000 # Port of MinIO or S3 service.
  # Access key ID that MinIO or S3 issues to user for authorized access.
  # Environment variable: MINIO_ACCESS_KEY_ID or minio.accessKeyID
  # minio.accessKeyID and minio.secretAccessKey together are used for identity authentication to access the MinIO or S3 service.
  # This configuration must be set identical to the environment variable MINIO_ACCESS_KEY_ID, which is necessary for starting MinIO or S3.
  # The default value applies to MinIO or S3 service that started with the default docker-compose.yml file.
  accessKeyID: minioadmin
  # Secret key used to encrypt the signature string and verify the signature string on server. It must be kept strictly confidential and accessible only to the MinIO or S3 server and users.
  # Environment variable: MINIO_SECRET_ACCESS_KEY or minio.secretAccessKey
  # minio.accessKeyID and minio.secretAccessKey together are used for identity authentication to access the MinIO or S3 service.
  # This configuration must be set identical to the environment variable MINIO_SECRET_ACCESS_KEY, which is necessary for starting MinIO or S3.
  # The default value applies to MinIO or S3 service that started with the default docker-compose.yml file.
  secretAccessKey: minioadmin
  useSSL: false # Switch value to control if to access the MinIO or S3 service through SSL.
  ssl:
    tlsCACert: /path/to/public.crt # path to your CACert file
  # Name of the bucket where Milvus stores data in MinIO or S3.
  # Milvus 2.0.0 does not support storing data in multiple buckets.
  # Bucket with this name will be created if it does not exist. If the bucket already exists and is accessible, it will be used directly. Otherwise, there will be an error.
  # To share an MinIO instance among multiple Milvus instances, consider changing this to a different value for each Milvus instance before you start them. For details, see Operation FAQs.
  # The data will be stored in the local Docker if Docker is used to start the MinIO service locally. Ensure that there is sufficient storage space.
  # A bucket name is globally unique in one MinIO or S3 instance.
  bucketName: a-bucket
  # Root prefix of the key to where Milvus stores data in MinIO or S3.
  # It is recommended to change this parameter before starting Milvus for the first time.
  # To share an MinIO instance among multiple Milvus instances, consider changing this to a different value for each Milvus instance before you start them. For details, see Operation FAQs.
  # Set an easy-to-identify root key prefix for Milvus if etcd service already exists.
  # Changing this for an already running Milvus instance may result in failures to read legacy data.
  rootPath: files
  # Whether to use an IAM role to access S3/GCS instead of access/secret keys
  # For more information, refer to
  # aws: https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use.html
  # gcp: https://cloud.google.com/storage/docs/access-control/iam
  # aliyun (ack): https://www.alibabacloud.com/help/en/container-service-for-kubernetes/latest/use-rrsa-to-enforce-access-control
  # aliyun (ecs): https://www.alibabacloud.com/help/en/elastic-compute-service/latest/attach-an-instance-ram-role
  useIAM: false
  # Cloud Provider of S3. Supports: "aws", "gcp", "aliyun".
  # You can use "aws" for other cloud provider supports S3 API with signature v4, e.g.: minio
  # You can use "gcp" for other cloud provider supports S3 API with signature v2
  # You can use "aliyun" for other cloud provider uses virtual host style bucket
  # When useIAM enabled, only "aws", "gcp", "aliyun" is supported for now
  cloudProvider: aws
  # Custom endpoint for fetch IAM role credentials. when useIAM is true & cloudProvider is "aws".
  # Leave it empty if you want to use AWS default endpoint
  iamEndpoint: 
  logLevel: fatal # Log level for aws sdk log. Supported level:  off, fatal, error, warn, info, debug, trace
  region:  # Specify minio storage system location region
  useVirtualHost: false # Whether use virtual host mode for bucket
  requestTimeoutMs: 10000 # minio timeout for request time in milliseconds
  # The maximum number of objects requested per batch in minio ListObjects rpc,
  # 0 means using oss client by default, decrease this configuration if ListObjects times out
  listObjectsMaxKeys: 0

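It looks like you have compiled the Milvus source code and want to run Milvus locally. I assume you compiled it successfully and produced ./bin/milvus. Follow these steps to start a local Milvus:

From the Milvus project root, go to deployments/docker/dev, which contains a docker-compose.yaml. Start the dependencies, including etcd and Pulsar. You can comment out MinIO in docker-compose.yaml, since you want to use GCP for storage.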

cd deployments/docker/dev
docker-compose up -d

Go back to the Milvus project root. Edit configs/milvus.yaml and set the GCP configuration:

minio:
  address: xxxxxxxxxx   # address of GCP service.
  port: 9000 # Port of GCP service.
  accessKeyID: xxxxx
  secretAccessKey: xxxxxx
  bucketName: xxxxx
  rootPath: xxxxx
  cloudProvider: gcp
  region: xxxxx

Use this script to start a Milvus standalone instance locally:

./scripts/start_standalone.sh

Logs are written to /tmp/standalone.log; check that log if startup fails. If startup succeeds, you will see a "proxy successfully started" message in the log.
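
The log check can also be scripted. This is a minimal sketch: the helper name `log_contains` is hypothetical, and the marker text passed to it is an assumption, since the exact wording of the proxy startup line may differ between Milvus versions. The match is case-insensitive for that reason.

```python
from pathlib import Path


def log_contains(log_path, marker):
    """Return True if any line of the log file contains `marker`.

    The comparison is case-insensitive. A missing file counts as
    "not found" rather than raising an error.
    """
    path = Path(log_path)
    if not path.is_file():
        return False
    needle = marker.lower()
    # errors="replace" keeps the scan going if the log has odd bytes in it.
    with path.open("r", encoding="utf-8", errors="replace") as fh:
        return any(needle in line.lower() for line in fh)


# Example (the marker string is an assumption, adjust it to your log):
# if log_contains("/tmp/standalone.log", "successfully started"):
#     print("Milvus standalone looks up")
```

If the function returns False after the start script has finished, open /tmp/standalone.log and read the last error lines directly.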
