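I am following the steps in the "Import data" guide to import data into Milvus.
Below is the script I use to create the schema and prepare the data.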
milvus_db/import_test.py
import random
import string

from pymilvus import MilvusClient, DataType
from pymilvus import RemoteBulkWriter, BulkFileType

# Third-party constants
ACCESS_KEY="minioadmin"
SECRET_KEY="minioadmin"
BUCKET_NAME="milvus-bucket"


def generate_random_str(length=5):
    letters = string.ascii_uppercase
    digits = string.digits
    return ''.join(random.choices(letters + digits, k=length))


def prepare_import_data():
    # You need to work out a collection schema out of your dataset.
    schema = MilvusClient.create_schema(
        auto_id=False,
        enable_dynamic_field=True
    )
    schema.add_field(field_name="id", datatype=DataType.INT64, is_primary=True)
    schema.add_field(field_name="vector", datatype=DataType.FLOAT_VECTOR, dim=768)
    schema.add_field(field_name="scalar_1", datatype=DataType.VARCHAR, max_length=512)
    schema.add_field(field_name="scalar_2", datatype=DataType.INT64)
    schema.verify()

    client = MilvusClient("http://localhost:19530")
    client.create_collection(
        collection_name="quick_setup",
        schema=schema
    )

    # Connections parameters to access the remote bucket
    conn = RemoteBulkWriter.S3ConnectParam(
        endpoint="localhost:9000",  # the default MinIO service started along with Milvus
        access_key=ACCESS_KEY,
        secret_key=SECRET_KEY,
        bucket_name=BUCKET_NAME,
        secure=False
    )
    writer = RemoteBulkWriter(
        schema=schema,
        remote_path="/",
        connect_param=conn,
        file_type=BulkFileType.PARQUET
    )

    for i in range(100):
        writer.append_row({
            "id": i,
            "vector": [random.uniform(-1, 1) for _ in range(768)],
            "scalar_1": generate_random_str(random.randint(1, 20)),
            "scalar_2": random.randint(0, 100),
        })

    writer.commit()
    print(writer.batch_files)


if __name__ == "__main__":
    prepare_import_data()
This code creates 100 records and uploads them to the MinIO bucket.
user % python milvus_db/prepare_test.py
[['ca494906-371a-43d2-9b1a-555f80db90dd/1.parquet']]
Then I called the REST API to import the data.
export MILVUS_URI="localhost:19530"
curl --request POST "http://${MILVUS_URI}/v2/vectordb/jobs/import/create" \
    --header "Content-Type: application/json" \
    --data-raw '{
        "files": [
            [
                "/ca494906-371a-43d2-9b1a-555f80db90dd/1.parquet"
            ]
        ],
        "collectionName": "quick_setup"
    }'
It returned:
{"code":0,"data":{"jobId":"453789492604835435"}}
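For comparison, here is a minimal sketch of building the same request body directly from the value printed by `writer.batch_files`, rather than typing the path by hand. The helper name `build_import_payload` is hypothetical. Note that the writer printed the path without a leading slash, while the curl call above sends it with one. Whether that difference matters is not established here, so the sketch simply passes the paths through unchanged.

```python
import json


def build_import_payload(collection_name, batch_files):
    """Build the JSON body for the import/create call.

    batch_files is the nested list printed by writer.batch_files.
    Paths are passed through exactly as the writer reported them.
    """
    return {
        "files": [list(group) for group in batch_files],
        "collectionName": collection_name,
    }


# Value copied from the script output shown earlier.
batch_files = [["ca494906-371a-43d2-9b1a-555f80db90dd/1.parquet"]]
payload = build_import_payload("quick_setup", batch_files)
print(json.dumps(payload, indent=2))
```

The printed JSON can be passed to `curl --data-raw` or sent with any HTTP client.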
I checked the job status with the API below, and it reported an error.
curl --request POST "http://${MILVUS_URI}/v2/vectordb/jobs/import/get_progress" \
    --header "Content-Type: application/json" \
    --data-raw '{
        "jobId": "453789492604835435"
    }'
This returned:
{
    "code": 0,
    "data": {
        "collectionName": "quick_setup",
        "completeTime": "",
        "details": [],
        "fileSize": 0,
        "importedRows": 0,
        "jobId": "453789492604835435",
        "progress": 0,
        "reason": "new parquet reader failed, err=parquet: could not retrieve footer offset: The specified key does not exist.: importing data failed",
        "state": "Failed",
        "totalRows": 0
    }
}
According to the documentation, the returned code should be 200, but I got code=0. The job also failed, and the logs contain many errors.
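To separate the two concerns (the top-level `code` versus the job's own `state`), here is a small sketch that summarizes a `get_progress` response. The function name `summarize_import_progress` is hypothetical. Treating `code == 0` as "the request itself was accepted" is an assumption based only on the two responses shown in this post (both returned 0), not on the documentation.

```python
import json


def summarize_import_progress(response_text):
    """Pull the fields that matter for debugging out of a get_progress response."""
    body = json.loads(response_text)
    data = body.get("data") or {}
    return {
        # Assumption: 0 means the HTTP request was handled, regardless of job state.
        "request_ok": body.get("code") == 0,
        "state": data.get("state"),
        "failed": data.get("state") == "Failed",
        "reason": data.get("reason", ""),
    }


# Response copied from the get_progress call above (unrelated fields omitted).
response_text = """
{
  "code": 0,
  "data": {
    "jobId": "453789492604835435",
    "progress": 0,
    "reason": "new parquet reader failed, err=parquet: could not retrieve footer offset: The specified key does not exist.: importing data failed",
    "state": "Failed"
  }
}
"""

summary = summarize_import_progress(response_text)
print(summary["state"], "-", summary["reason"])
```

Read this way, the response says the request went through but the job itself failed, and `reason` carries the actual error.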
Docker logs:
milvus-standalone | [2024/11/08 19:11:30.674 +00:00] [INFO] [writebuffer/write_buffer.go:252] ["checkpoint from latest consumed msg"] [collectionID=451658620880510403] [channel=by-dev-rootcoord-dml_3_451658620880510403v0]
milvus-standalone | [2024/11/08 19:11:30.856 +00:00] [INFO] [observers/collection_observer.go:322] ["partition load progress"] [collectionID=451834069462288051] [partitionID=451834069462288052] [subChannelCount=1] [loadSegmentCount=2]
milvus-standalone | [2024/11/08 19:11:30.862 +00:00] [INFO] [observers/collection_observer.go:345] ["load status updated"] [collectionID=451834069462288051] [partitionID=451834069462288052] [partitionLoadPercentage=100] [collectionLoadPercentage=100]
milvus-standalone | [2024/11/08 19:11:30.863 +00:00] [INFO] [observers/collection_observer.go:267] ["Load task finish"] [traceID=LoadCollection_451834069462288051] [collectionID=451834069462288051] [partitionIDs="[]"] [loadType=LoadCollection]
milvus-standalone | [2024/11/08 19:11:30.872 +00:00] [INFO] [writebuffer/write_buffer.go:252] ["checkpoint from latest consumed msg"] [collectionID=451834069462288051] [channel=by-dev-rootcoord-dml_0_451834069462288051v0]
milvus-standalone | [2024/11/08 19:11:31.075 +00:00] [INFO] [writebuffer/write_buffer.go:252] ["checkpoint from latest consumed msg"] [collectionID=453789492604825756] [channel=by-dev-rootcoord-dml_2_453789492604825756v0]
milvus-standalone | [2024/11/08 19:11:31.075 +00:00] [INFO] [writebuffer/write_buffer.go:252] ["checkpoint from latest consumed msg"] [collectionID=452543465012995768] [channel=by-dev-rootcoord-dml_7_452543465012995768v0]
milvus-standalone | [2024/11/08 19:11:31.275 +00:00] [INFO] [writebuffer/write_buffer.go:252] ["checkpoint from latest consumed msg"] [collectionID=452344796020212245] [channel=by-dev-rootcoord-dml_1_452344796020212245v0]
milvus-standalone | [2024/11/08 19:11:31.852 +00:00] [INFO] [datacoord/index_service.go:682] ["receive DescribeIndex request"] [traceID=82585a4701b02cefe62eef49c4c0a38b] [collectionID=452344796020212245] [indexName=] [timestamp=0]
milvus-standalone | [2024/11/08 19:11:31.852 +00:00] [INFO] [datacoord/index_service.go:603] ["completeIndexInfo success"] [collectionID=452344796020212245] [indexID=453789492604822096] [totalRows=20] [indexRows=20] [pendingIndexRows=0] [state=Finished] [failReason=]
milvus-standalone | [2024/11/08 19:11:31.852 +00:00] [INFO] [datacoord/index_service.go:730] ["DescribeIndex success"] [traceID=82585a4701b02cefe62eef49c4c0a38b] [collectionID=452344796020212245] [indexName=]
milvus-standalone | [2024/11/08 19:11:31.852 +00:00] [INFO] [datacoord/index_service.go:682] ["receive DescribeIndex request"] [traceID=82585a4701b02cefe62eef49c4c0a38b] [collectionID=452543465012995768] [indexName=] [timestamp=0]
milvus-standalone | [2024/11/08 19:11:31.852 +00:00] [WARN] [datacoord/index_service.go:696] ["DescribeIndex fail"] [traceID=82585a4701b02cefe62eef49c4c0a38b] [collectionID=452543465012995768] [indexName=] [error="index not found[indexName=]"]
milvus-standalone | [2024/11/08 19:11:31.852 +00:00] [WARN] [datacoord/metrics_info.go:69] ["failed to describe index, ignore to report index metrics"] [traceID=82585a4701b02cefe62eef49c4c0a38b] [collection=452543465012995768] [error="index not found[indexName=]"]
milvus-standalone | [2024/11/08 19:11:31.852 +00:00] [INFO] [datacoord/index_service.go:682] ["receive DescribeIndex request"] [traceID=82585a4701b02cefe62eef49c4c0a38b] [collectionID=451658620880510403] [indexName=] [timestamp=0]
milvus-standalone | [2024/11/08 19:11:31.852 +00:00] [INFO] [datacoord/index_service.go:603] ["completeIndexInfo success"] [collectionID=451658620880510403] [indexID=451834069463728439] [totalRows=20] [indexRows=20] [pendingIndexRows=0] [state=Finished] [failReason=]
milvus-standalone | [2024/11/08 19:11:31.852 +00:00] [INFO] [datacoord/index_service.go:603] ["completeIndexInfo success"] [collectionID=451658620880510403] [indexID=451834069463728988] [totalRows=20] [indexRows=20] [pendingIndexRows=0] [state=Finished] [failReason=]
milvus-standalone | [2024/11/08 19:11:31.852 +00:00] [INFO] [datacoord/index_service.go:730] ["DescribeIndex success"] [traceID=82585a4701b02cefe62eef49c4c0a38b] [collectionID=451658620880510403] [indexName=]
milvus-standalone | [2024/11/08 19:11:31.852 +00:00] [INFO] [datacoord/index_service.go:682] ["receive DescribeIndex request"] [traceID=82585a4701b02cefe62eef49c4c0a38b] [collectionID=451834069462288051] [indexName=] [timestamp=0]
milvus-standalone | [2024/11/08 19:11:31.852 +00:00] [INFO] [datacoord/index_service.go:603] ["completeIndexInfo success"] [collectionID=451834069462288051] [indexID=451834069462288378] [totalRows=19016] [indexRows=19016] [pendingIndexRows=0] [state=Finished] [failReason=]
milvus-standalone | [2024/11/08 19:11:31.852 +00:00] [INFO] [datacoord/index_service.go:730] ["DescribeIndex success"] [traceID=82585a4701b02cefe62eef49c4c0a38b] [collectionID=451834069462288051] [indexName=]
milvus-standalone | [2024/11/08 19:11:31.859 +00:00] [INFO] [datacoord/import_checker.go:196] ["add new preimport task"] [taskID=453795658005217491] [jobID=453795658005217479] [collectionID=453789492604825756] [type=PreImportTask]
milvus-standalone | [2024/11/08 19:11:31.860 +00:00] [INFO] [datacoord/import_scheduler.go:175] ["processing pending preimport task..."] [taskID=453795658005217491] [jobID=453795658005217479] [collectionID=453789492604825756] [type=PreImportTask]
milvus-standalone | [2024/11/08 19:11:31.861 +00:00] [INFO] [datanode/services.go:415] ["datanode receive preimport request"] [traceID=049177b63fc9a9e0ca45bae6feea028b] [taskID=453795658005217491] [jobID=453795658005217479] [collectionID=453789492604825756] [partitionIDs="[453789492604825757]"] [vchannels="[by-dev-rootcoord-dml_2_453789492604825756v0]"] [files="[{\"id\":453795658005217480,\"paths\":[\"/ca494906-371a-43d2-9b1a-555f80db90dd/1.parquet\"]}]"]
milvus-standalone | [2024/11/08 19:11:31.861 +00:00] [INFO] [datanode/services.go:424] ["datanode added preimport task"] [traceID=049177b63fc9a9e0ca45bae6feea028b] [taskID=453795658005217491] [jobID=453795658005217479] [collectionID=453789492604825756] [partitionIDs="[453789492604825757]"] [vchannels="[by-dev-rootcoord-dml_2_453789492604825756v0]"] [files="[{\"id\":453795658005217480,\"paths\":[\"/ca494906-371a-43d2-9b1a-555f80db90dd/1.parquet\"]}]"]
milvus-standalone | [2024/11/08 19:11:31.862 +00:00] [INFO] [datacoord/import_scheduler.go:190] ["process pending preimport task done"] [taskID=453795658005217491] [jobID=453795658005217479] [collectionID=453789492604825756] [type=PreImportTask]
milvus-standalone | [2024/11/08 19:11:32.772 +00:00] [INFO] [importv2/scheduler.go:154] ["start to preimport"] [taskID=453795658005217491] [jobID=453795658005217479] [collectionID=453789492604825756] [type=PreImportTask] [bufferSize=16777216] [schema="name:\"quick_setup\" fields:<fieldID:100 name:\"id\" is_primary_key:true data_type:Int64 > fields:<fieldID:101 name:\"vector\" data_type:FloatVector type_params:<key:\"dim\" value:\"768\" > > fields:<fieldID:102 name:\"scalar_1\" data_type:VarChar type_params:<key:\"max_length\" value:\"512\" > > fields:<fieldID:103 name:\"scalar_2\" data_type:Int64 > fields:<fieldID:104 name:\"$meta\" description:\"dynamic schema\" data_type:JSON is_dynamic:true > enable_dynamic_field:true "]
milvus-standalone | [2024/11/08 19:11:32.779 +00:00] [WARN] [importv2/scheduler.go:148] ["new reader failed"] [taskID=453795658005217491] [jobID=453795658005217479] [collectionID=453789492604825756] [type=PreImportTask] [error="new parquet reader failed, err=parquet: could not retrieve footer offset: The specified key does not exist.: importing data failed"] [errorVerbose="new parquet reader failed, err=parquet: could not retrieve footer offset: The specified key does not exist.: importing data failed\n(1) attached stack trace\n -- stack trace:\n | github.com/milvus-io/milvus/pkg/util/merr.WrapErrImportFailed\n | \t/go/src/github.com/milvus-io/milvus/pkg/util/merr/utils.go:1030\n | github.com/milvus-io/milvus/internal/util/importutilv2/parquet.NewReader\n | \t/go/src/github.com/milvus-io/milvus/internal/util/importutilv2/parquet/reader.go:63\n | github.com/milvus-io/milvus/internal/util/importutilv2.NewReader\n | \t/go/src/github.com/milvus-io/milvus/internal/util/importutilv2/reader.go:72\n | github.com/milvus-io/milvus/internal/datanode/importv2.(*scheduler).PreImport.func2\n | \t/go/src/github.com/milvus-io/milvus/internal/datanode/importv2/scheduler.go:164\n | github.com/milvus-io/milvus/internal/datanode/importv2.(*scheduler).PreImport.func3\n | \t/go/src/github.com/milvus-io/milvus/internal/datanode/importv2/scheduler.go:186\n | github.com/milvus-io/milvus/pkg/util/conc.(*Pool[...]).Submit.func1\n | \t/go/src/github.com/milvus-io/milvus/pkg/util/conc/pool.go:81\n | github.com/panjf2000/ants/v2.(*goWorker).run.func1\n | \t/go/pkg/mod/github.com/panjf2000/ants/[email protected]/worker.go:67\n | runtime.goexit\n | \t/usr/local/go/src/runtime/asm_arm64.s:1172\nWraps: (2) new parquet reader failed, err=parquet: could not retrieve footer offset: The specified key does not exist.\nWraps: (3) importing data failed\nError types: (1) *withstack.withStack (2) *errutil.withPrefix (3) merr.milvusError"]
milvus-standalone | [2024/11/08 19:11:33.856 +00:00] [WARN] [datacoord/import_scheduler.go:239] ["preimport failed"] [taskID=453795658005217491] [jobID=453795658005217479] [collectionID=453789492604825756] [type=PreImportTask] [reason="new parquet reader failed, err=parquet: could not retrieve footer offset: The specified key does not exist.: importing data failed"]
milvus-standalone | [2024/11/08 19:11:34.848 +00:00] [INFO] [datacoord/index_service.go:682] ["receive DescribeIndex request"] [traceID=c07191a7acf51db14620601bfb6f9f03] [collectionID=451658620880510403] [indexName=] [timestamp=0]
milvus-standalone | [2024/11/08 19:11:34.848 +00:00] [INFO] [datacoord/index_service.go:603] ["completeIndexInfo success"] [collectionID=451658620880510403] [indexID=451834069463728439] [totalRows=20] [indexRows=20] [pendingIndexRows=0] [state=Finished] [failReason=]
milvus-standalone | [2024/11/08 19:11:34.848 +00:00] [INFO] [datacoord/index_service.go:603] ["completeIndexInfo success"] [collectionID=451658620880510403] [indexID=451834069463728988] [totalRows=20] [indexRows=20] [pendingIndexRows=0] [state=Finished] [failReason=]
milvus-standalone | [2024/11/08 19:11:34.848 +00:00] [INFO] [datacoord/index_service.go:730] ["DescribeIndex success"] [traceID=c07191a7acf51db14620601bfb6f9f03] [collectionID=451658620880510403] [indexName=]
milvus-standalone | [2024/11/08 19:11:34.848 +00:00] [INFO] [datacoord/index_service.go:682] ["receive DescribeIndex request"] [traceID=c07191a7acf51db14620601bfb6f9f03] [collectionID=451834069462288051] [indexName=] [timestamp=0]
milvus-standalone | [2024/11/08 19:11:34.848 +00:00] [INFO] [datacoord/index_service.go:603] ["completeIndexInfo success"] [collectionID=451834069462288051] [indexID=451834069462288378] [totalRows=19016] [indexRows=19016] [pendingIndexRows=0] [state=Finished] [failReason=]
milvus-standalone | [2024/11/08 19:11:34.848 +00:00] [INFO] [datacoord/index_service.go:730] ["DescribeIndex success"] [traceID=c07191a7acf51db14620601bfb6f9f03] [collectionID=451834069462288051] [indexName=]
milvus-standalone | [2024/11/08 19:11:34.848 +00:00] [INFO] [datacoord/index_service.go:682] ["receive DescribeIndex request"] [traceID=c07191a7acf51db14620601bfb6f9f03] [collectionID=452344796020212245] [indexName=] [timestamp=0]
milvus-standalone | [2024/11/08 19:11:34.848 +00:00] [INFO] [datacoord/index_service.go:603] ["completeIndexInfo success"] [collectionID=452344796020212245] [indexID=453789492604822096] [totalRows=20] [indexRows=20] [pendingIndexRows=0] [state=Finished] [failReason=]
milvus-standalone | [2024/11/08 19:11:34.848 +00:00] [INFO] [datacoord/index_service.go:730] ["DescribeIndex success"] [traceID=c07191a7acf51db14620601bfb6f9f03] [collectionID=452344796020212245] [indexName=]
milvus-standalone | [2024/11/08 19:11:34.849 +00:00] [INFO] [datacoord/index_service.go:682] ["receive DescribeIndex request"] [traceID=c07191a7acf51db14620601bfb6f9f03] [collectionID=452543465012995768] [indexName=] [timestamp=0]
milvus-standalone | [2024/11/08 19:11:34.849 +00:00] [WARN] [datacoord/index_service.go:696] ["DescribeIndex fail"] [traceID=c07191a7acf51db14620601bfb6f9f03] [collectionID=452543465012995768] [indexName=] [error="index not found[indexName=]"]
milvus-standalone | [2024/11/08 19:11:34.849 +00:00] [WARN] [datacoord/metrics_info.go:69] ["failed to describe index, ignore to report index metrics"] [traceID=c07191a7acf51db14620601bfb6f9f03] [collection=452543465012995768] [error="index not found[indexName=]"]
milvus-standalone | [2024/11/08 19:11:35.854 +00:00] [WARN] [datacoord/import_checker.go:306] ["Import job has failed, all tasks with the same jobID will be marked as failed"] [jobID=453795658005217479]
milvus-standalone | [2024/11/08 19:11:35.857 +00:00] [INFO] [datanode/services.go:514] ["datanode drop import done"] [traceID=5152e399f041486a872ad7822797e260] [taskID=453795658005217491] [jobID=453795658005217479]
milvus-standalone | [2024/11/08 19:11:35.857 +00:00] [INFO] [datacoord/import_util.go:427] ["drop import in datanode done"] [taskID=453795658005217491] [jobID=453795658005217479] [collectionID=453789492604825756] [type=PreImportTask]
milvus-standalone | [2024/11/08 19:11:36.777 +00:00] [INFO] [datacoord/meta.go:1446] ["UpdateChannelCheckpoint done"] [channel=by-dev-rootcoord-dml_0_451834069462288051v0] [ts=453795665949229058] [time=2024/11/08 19:11:26.049 +00:00]
milvus-standalone | [2024/11/08 19:11:36.777 +00:00] [INFO] [datacoord/meta.go:1446] ["UpdateChannelCheckpoint done"] [channel=by-dev-rootcoord-dml_2_453789492604825756v0] [ts=453795665949229058] [time=2024/11/08 19:11:26.049 +00:00]
milvus-standalone | [2024/11/08 19:11:36.777 +00:00] [INFO] [datacoord/meta.go:1446] ["UpdateChannelCheckpoint done"] [channel=by-dev-rootcoord-dml_7_452543465012995768v0] [ts=453795665949229058] [time=2024/11/08 19:11:26.049 +00:00]
milvus-standalone | [2024/11/08 19:11:36.777 +00:00] [INFO] [datacoord/meta.go:1446] ["UpdateChannelCheckpoint done"] [channel=by-dev-rootcoord-dml_1_452344796020212245v0] [ts=453795665949229058] [time=2024/11/08 19:11:26.049 +00:00]
milvus-standalone | [2024/11/08 19:11:36.777 +00:00] [INFO] [datacoord/meta.go:1446] ["UpdateChannelCheckpoint done"] [channel=by-dev-rootcoord-dml_3_451658620880510403v0] [ts=453795665949229058] [time=2024/11/08 19:11:26.049 +00:00]
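Most of the lines above are unrelated index and checkpoint messages, so a rough filter helps isolate the import failure. The sketch below keeps only non-INFO lines whose source file path contains "import". The regular expression is an assumption fitted to the log format shown here, and the sample lines are copied from the output above with their trailing fields trimmed. Also note that the job ID in these log lines (453795658005217479) differs from the one used in the curl calls, so the logs may come from a different run of the same request.

```python
import re

# Matches: [LEVEL] [source/file.go:NN] ["message"]
LOG_LINE = re.compile(
    r'\[(?P<level>INFO|WARN|ERROR)\] \[(?P<source>[^\]]+)\] \["(?P<message>[^"]*)"\]'
)


def import_warnings(lines):
    """Return (source, message) pairs for non-INFO lines from import-related files."""
    hits = []
    for line in lines:
        match = LOG_LINE.search(line)
        if match is None or match["level"] == "INFO":
            continue
        if "import" not in match["source"]:
            continue
        hits.append((match["source"], match["message"]))
    return hits


# Lines copied from the Docker logs above, trailing fields trimmed for brevity.
sample = [
    'milvus-standalone | [2024/11/08 19:11:31.852 +00:00] [WARN] [datacoord/index_service.go:696] ["DescribeIndex fail"]',
    'milvus-standalone | [2024/11/08 19:11:31.859 +00:00] [INFO] [datacoord/import_checker.go:196] ["add new preimport task"]',
    'milvus-standalone | [2024/11/08 19:11:32.779 +00:00] [WARN] [importv2/scheduler.go:148] ["new reader failed"]',
    'milvus-standalone | [2024/11/08 19:11:33.856 +00:00] [WARN] [datacoord/import_scheduler.go:239] ["preimport failed"]',
]

hits = import_warnings(sample)
for source, message in hits:
    print(source, "->", message)
```

On the sample lines this leaves just the "new reader failed" and "preimport failed" entries, which both point at the same parquet reader error reported by `get_progress`.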
What could be causing the failure?
I'm running into the same problem. Did you manage to solve it?