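I am using Elasticsearch to index e-commerce retail data so I can find similar or matching products. I generate dense vectors from text and images with OpenAI CLIP, but my queries are very slow. I am on Elasticsearch 8.
This is how I define the dense vector fields in the index mapping: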
'image_vector' => [
'type' => 'dense_vector',
'dims' => 512,
'index' => true,
'similarity' => 'cosine',
"index_options"=> [
"type"=> "hnsw",
"m"=> 16,
"ef_construction"=> 100
]
],
'text_vector' => [
'type' => 'dense_vector',
'dims' => 512,
'index' => true,
'similarity' => 'cosine',
"index_options"=> [
"type"=> "hnsw",
"m"=> 16,
"ef_construction"=> 100
]
],
This is how I build the `should` clauses:
$should[] = [
'script_score' => [
'query' => [
'bool' => [
'filter' => [
// Only consider documents with image_vector
['exists' => ['field' => 'image_vector']]
]
]
],
'script' => [
'source' => "
double similarity = doc['image_vector'].size() > 0 ? cosineSimilarity(params.query_vector, 'image_vector') : 0;
return similarity > 0.5 ? (similarity + 1.0) * 1000.0 : 0;
",
'params' => ['query_vector' => $vector['vector'][0]]
]
]
];
// Add script_score for text_vector similarity
$should[] = [
'script_score' => [
'query' => [
'bool' => [
'filter' => [
// Only consider documents with text_vector
['exists' => ['field' => 'text_vector']]
]
]
],
'script' => [
'source' => "
double similarity = doc['text_vector'].size() > 0 ? cosineSimilarity(params.query_vector, 'text_vector') : 0;
return similarity > 0.5 ? (similarity + 1.0) * 500.0 : 0;
",
'params' => ['query_vector' => $vector['vector_text'][0]]
]
]
];
}
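Elasticsearch supports two methods for kNN search: approximate kNN, using the `knn` search option, and exact, brute-force kNN, using a `script_score` query with a vector function.
Your query uses the exact method. It runs the script against every document matched by the inner query, which is too expensive on a large index. Use approximate kNN search instead; it uses the HNSW index you already enabled with `'index' => true`.
Here is an example of a kNN search: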
POST image-index/_search
{
"knn": {
"field": "image-vector",
"query_vector": [-5, 9, -12],
"k": 10,
"num_candidates": 100
},
"fields": [ "title", "file-type" ]
}
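Applied to the mapping in the question, the request body could look roughly like the sketch below. It is only an illustration under a few assumptions: `buildKnnBody()` is a hypothetical helper, the placeholder vectors stand in for real CLIP embeddings, and the `boost` values of 2.0 and 1.0 are chosen only to mirror the 1000 vs. 500 weighting in the original scripts. As far as I know, passing an array of clauses to `knn` requires Elasticsearch 8.7 or later; on older 8.x versions you would send one `knn` clause per request. The `similarity > 0.5` cutoff from the scripts is not reproduced here.

```php
<?php
// Hypothetical helper: builds an approximate kNN search body for the
// image_vector and text_vector fields defined in the question's mapping.
function buildKnnBody(array $imageVector, array $textVector, int $k = 10, int $numCandidates = 100): array
{
    return [
        'knn' => [
            [
                'field' => 'image_vector',
                'query_vector' => $imageVector,
                'k' => $k,
                'num_candidates' => $numCandidates,
                // Assumed weight: image matches count twice as much as text matches.
                'boost' => 2.0,
            ],
            [
                'field' => 'text_vector',
                'query_vector' => $textVector,
                'k' => $k,
                'num_candidates' => $numCandidates,
                'boost' => 1.0,
            ],
        ],
        'size' => $k,
    ];
}

// Placeholder 512-dimensional vectors; in the question these would be
// $vector['vector'][0] and $vector['vector_text'][0].
$imageVector = array_fill(0, 512, 0.1);
$textVector = array_fill(0, 512, 0.1);

$body = buildKnnBody($imageVector, $textVector);
echo json_encode($body), PHP_EOL;
```

No `exists` filter is needed here, because a kNN clause only considers documents that actually have a value in the vector field. If queries are still slow, lowering `num_candidates` trades some recall for speed.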
For image search, see the following article: https://www.elastic.co/search-labs/blog/implement-image-similarity-search-elastic