I have a RAG setup and am trying to implement filtering by keyword/phrase, as shown below:
public SearchOptions? CreateSearchOptions(int searchTypeInt,
    int k,
    ReadOnlyMemory<float> embeddings,
    ReadOnlyMemory<float> namedEntitiesEmbeddings,
    string filter,
    FilterAction filterAction)
{
    _logger.LogInformation("CreateSearchOptions entered");
    SearchOptions? searchOptions = null;
    try
    {
        SearchType searchType = (SearchType)searchTypeInt;
        System.FormattableString formattableStr = $"SegmentText ct '{filter}'";
        if (!String.IsNullOrWhiteSpace(filter))
        {
            if (filterAction == FilterAction.Include)
            {
                formattableStr = $"search.ismatch({filter}, 'SegmentText')";
            }
            else if (filterAction == FilterAction.Exclude)
            {
                formattableStr = $"NOT(search.ismatch({filter}, 'SegmentText'))";
            }
        }
        searchOptions = new SearchOptions
        {
            //Filter = filter, will be set later
            Size = k,
            // fields to retrieve, if not specified then all are retrieved if retrievable
            Select = { "SegmentText", "NamedEntities", "docId", "segmentId", "Source", "TimeSrcModified", "TimeSrcCreated", "TimeIngested" },
            //SearchMode = SearchMode.Any, TBD!!!
            Filter = SearchFilter.Create(formattableStr)
        };
        if ((searchType & SearchType.Vector) == SearchType.Vector)
        {
            searchOptions.VectorSearch = new VectorSearchOptions();
            VectorizedQuery vq = new VectorizedQuery(embeddings) { KNearestNeighborsCount = k, Fields = { "SegmentTextVector" } };
            searchOptions.VectorSearch.Queries.Add(vq);
            if (namedEntitiesEmbeddings.Length > 0)
            {
                vq = new VectorizedQuery(namedEntitiesEmbeddings) { KNearestNeighborsCount = k, Fields = { "SegmentNamedEntitiesVector" } };
                searchOptions.VectorSearch.Queries.Add(vq);
            }
        }
    }
    catch (Exception ex)
    {
        _logger.LogError(ex, ex.Message);
        return null;
    }
    return searchOptions;
}
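The problem is that my "documents" are actually chunks of documents, each 500-700 tokens long. The vector search returns 5 relevant chunks out of the 11 that make up the whole file. In my test case the file is my resume. This works fine, but adding an "include" filter doesn't do much. If the user prompt is "What projects has the developer worked on during his career?" and I set the filter to "Outlook" to indicate that I want a list of projects related to MS Outlook, it still gives me all kinds of projects, not just the Outlook-related ones. That is because I pass the 5 vector search results to the OpenAI Completion API, and those chunks also include some projects other than Outlook. So what is the solution? (Other than specifically asking "list only this developer's Outlook projects"; I am talking about making the filter "work" here.)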
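I'm not sure what the "SearchFilter.Create()" function does, but assuming it doesn't rewrite the input string, the "ct" operator used in $"SegmentText ct '{filter}'" does not exist. The filter language is documented here: https://learn.microsoft.com/en-us/azure/search/search-query-odata-filter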
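
To make the point concrete, here is a minimal sketch of building the filter string by hand so that no invalid operator ever reaches the service. It is not the asker's code: the SegmentFilterBuilder class, the QuoteODataString helper, and this two-member FilterAction enum are hypothetical names made up for illustration, and only the SegmentText field name comes from the question. It relies on search.ismatch(search, searchFields) and the lowercase "not" operator from the filter documentation linked above, and it returns null (no filter at all) when the keyword is empty instead of falling back to "ct".

```csharp
using System;

// Hypothetical stand-in for the asker's FilterAction enum (its real members are not shown).
public enum FilterAction { Include, Exclude }

public static class SegmentFilterBuilder
{
    // Quote a value as an OData string literal: wrap it in single quotes
    // and double any embedded single quote.
    public static string QuoteODataString(string value) =>
        "'" + value.Replace("'", "''") + "'";

    // Returns an OData filter expression, or null when no keyword was supplied.
    public static string? Build(string? filter, FilterAction action)
    {
        if (string.IsNullOrWhiteSpace(filter))
        {
            return null; // no keyword: do not send a filter at all
        }

        // Full-text match of the keyword against the SegmentText field.
        string match = $"search.ismatch({QuoteODataString(filter)}, 'SegmentText')";

        return action switch
        {
            FilterAction.Include => match,
            FilterAction.Exclude => $"not {match}",
            _ => null
        };
    }
}
```

Assuming the rest of the method stays as in the question, the result can be assigned directly with searchOptions.Filter = SegmentFilterBuilder.Build(filter, filterAction); since Filter is a plain string. As far as I can tell from the Azure.Search.Documents documentation, SearchFilter.Create quotes and escapes interpolated string values itself, so if you keep using it, pass {filter} without surrounding quotes. Note also that the first argument of search.ismatch is a search query rather than a literal substring, so a multi-word phrase probably needs to be wrapped in double quotes inside the literal.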