How can I use Solr to find parent documents through their child documents while also showing the scores of the matching children?

Question

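I'm using Solr for nested-document search. I retrieve parent documents based on the vector distance of their child documents, and that part works fine. However, I want the vector-distance score of each matching child to appear in the parent document's [child] field, and every child's score shows up as 0.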

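I have spent several days searching for resources, including the Solr Reference Guide, but still haven't found the answer I need. Here is an example of the nested documents in my Solr index: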

[
  {
    "id": 1,
    "doc_type": "parent",
    "text": "sentence A. sentence B. sentence C.",
    "children": [
      {"id": 2, "doc_type": "child", "text_piece": "sentence A.", "embedding": [0.1, 0.2, 0.3, ...]},
      {"id": 3, "doc_type": "child", "text_piece": "sentence B.", "embedding": [...]},
      {"id": 4, "doc_type": "child", "text_piece": "sentence C.", "embedding": [...]}
    ]
  },
  {
    "id": 5,
    "doc_type": "parent",
    "text": "sentence D. sentence E. ",
    "children": [
      {"id": 6, "doc_type": "child", "text_piece": "sentence D.", "embedding": [...]},
      {"id": 7, "doc_type": "child", "text_piece": "sentence E.", "embedding": [...]}
    ]
  }
]

My query looks roughly like this:

q={!parent which='doc_type:parent' score=max}{!knn f=embedding topK=2}[0.2,0.1,-0.5,...]&fl=*,score,[child],[explain style=nl]

(Translated from my pysolr code, so please ignore any syntax errors.)
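The same request can be expressed as Python request parameters. pysolr's Solr.search takes the query string plus any extra parameters as keyword arguments. This is a minimal sketch. The helper name is hypothetical, and the field names doc_type and embedding come from the example documents above. Note that fl is a single comma-separated string.

```python
def build_parent_knn_params(vector, top_k=2):
    """Hypothetical helper: the question's block-join kNN request as Solr params."""
    # The knn parser takes the query vector as a bracketed, comma-separated list.
    vec = "[" + ",".join(str(x) for x in vector) + "]"
    # The child query (knn) follows the parent parser's local params;
    # score=max gives each parent the score of its best-matching child.
    q = ("{!parent which='doc_type:parent' score=max}"
         + "{!knn f=embedding topK=" + str(top_k) + "}" + vec)
    return {
        "q": q,
        # One comma-separated fl string: all fields, score, and two doc transformers.
        "fl": "*,score,[child],[explain style=nl]",
    }


params = build_parent_knn_params([0.2, 0.1, -0.5])
print(params["q"])
# → {!parent which='doc_type:parent' score=max}{!knn f=embedding topK=2}[0.2,0.1,-0.5]
# With pysolr (not imported here), roughly: solr.search(params["q"], fl=params["fl"])
```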

The result looks like this:

docs: [
  {
    id: 1,
    doc_type: "parent",
    text: "sentence A. sentence B. sentence C. ",
    split_text: [
      {id: 2, text_piece: "sentence A.", score: 0.0, embedding: [...], [explain]: {'match': False, 'value': 0.0, 'description': 'Not a match'}},
      {id: 3, text_piece: "sentence B.", score: 0.0, embedding: [...], [explain]: {'match': False, 'value': 0.0, 'description': 'Not a match'}},
      {id: 4, text_piece: "sentence C.", score: 0.0, embedding: [...], [explain]: {'match': False, 'value': 0.0, 'description': 'Not a match'}}
    ],
    [explain]: {'match': True, 'value': 0.849, 'description': 'Score based on 2 child docs in range from  129 to 130, using score mode Max', 'details': [{'match': True, 'value': 0.849, 'description': 'within top 2'}]}
  }
]

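The result above is just an example. Suppose the child with id=2 scores 0.849 and the child with id=3 scores 0.5. I would like those two scores to appear in the children's own score fields.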

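At the moment I run a two-step search, one over the child documents and one over the parent documents, and then merge the results. This workaround gives me the result I need, but it costs performance and code readability. If you know a better approach, please let me know. Thanks.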
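The merge step of that two-step workaround can be sketched as follows. This is an illustration, not the asker's code. It assumes the child-level kNN query returns each child's id, its score, and its parent's id in a hypothetical parent_id field. It also assumes the parent query returns the children under split_text, as in the result above.

```python
from collections import defaultdict


def merge_child_scores(child_hits, parent_docs,
                       children_field="split_text", parent_key="parent_id"):
    """Copy kNN scores from a child-level search onto the parents' child lists."""
    # Index the child-level scores by parent id, then by child id.
    scores = defaultdict(dict)
    for hit in child_hits:
        scores[hit[parent_key]][hit["id"]] = hit["score"]

    merged = []
    for parent in parent_docs:
        child_scores = scores.get(parent["id"], {})
        # Copy each child (inputs are not mutated); children absent from the
        # kNN hits get a score of 0.0.
        children = [
            {**child, "score": child_scores.get(child["id"], 0.0)}
            for child in parent.get(children_field, [])
        ]
        merged.append({**parent, children_field: children})
    return merged
```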

Answer 1

Use a subquery.

Add a new field named parentId to every child document and fill it with the parent document's ID. In the example above, the children of document 1 (ids 2, 3 and 4) would all have parentId 1.
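One way to add parentId is on the client, before indexing. This is a minimal sketch that assumes the nested layout of the question's example, with children listed under children, and a parentId field that exists in your schema. The helper name is hypothetical. You would then index the documents with your usual pysolr Solr.add call.

```python
def add_parent_ids(parents, children_field="children", parent_field="parentId"):
    """Hypothetical helper: copy each parent's id onto its child documents, in place."""
    for parent in parents:
        for child in parent.get(children_field, []):
            child[parent_field] = parent["id"]
    return parents


docs = [
    {"id": 1, "doc_type": "parent", "children": [{"id": 2}, {"id": 3}, {"id": 4}]},
    {"id": 5, "doc_type": "parent", "children": [{"id": 6}, {"id": 7}]},
]
add_parent_ids(docs)
# Children 2, 3, 4 now have parentId 1; children 6, 7 have parentId 5.
```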

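Then, instead of [child], use [subquery].

Here is an example, assuming you have already added parentId to the children.

Your fl will be "*,someName:[subquery]", and then you add these as raw request parameters:

&someName.q={!term f=parentId v=$row.id} {!knn f=embedding topK=2}[0.2,0.1,-0.5,...]&someName.fl=*,score

The first part of someName.q,

{!term f=parentId v=$row.id}

works like adding [child] to fl: it matches the children of the current parent row. The second part of someName.q,

{!knn f=embedding topK=2}[0.2,0.1,-0.5,...]

is a SHOULD clause. In Solr this means a child does not have to match it, but children that do match get a higher score.

Then, because someName.fl includes score, each child is returned with its own score.

You can rename someName to anything you like.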
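Below is a minimal sketch of the answer's parameters as a Python dict. The helper name is hypothetical. It writes v=$row.id unquoted, which is the form the Solr subquery documentation uses to dereference the parent row's id. The dict only assembles the strings; it does not check how Solr parses the combined someName.q. With pysolr, you would pass it alongside the question's parent query, roughly as solr.search(parent_q, **params).

```python
def build_subquery_params(vector, name="someName", top_k=2):
    """Hypothetical helper: the [subquery] request parameters described above."""
    vec = "[" + ",".join(str(x) for x in vector) + "]"
    return {
        # Attach each parent's subquery result under the pseudo-field `name`.
        "fl": "*," + name + ":[subquery]",
        # v=$row.id dereferences the id of the parent row being decorated;
        # the knn clause is intended to score the matching children.
        name + ".q": ("{!term f=parentId v=$row.id} "
                      + "{!knn f=embedding topK=" + str(top_k) + "}" + vec),
        # Return the children's stored fields plus their score.
        name + ".fl": "*,score",
    }
```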

This does change how Solr returns the child documents. You will also see a numFound for each parent's children.
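On the client side, each parent document then carries a someName field that holds the subquery's result set. This sketch assumes that field has the JSON shape {"numFound": ..., "start": ..., "docs": [...]}, so check it against your own output. It pulls out each child's score. The helper name is hypothetical.

```python
def child_scores(parent_doc, name="someName"):
    """Hypothetical helper: map child id -> score from one parent's [subquery] result."""
    # Parents without a subquery result yield an empty mapping.
    result = parent_doc.get(name, {})
    return {child["id"]: child.get("score", 0.0) for child in result.get("docs", [])}
```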
