The following returns all documents that contain both the phrase "Comp and Comp" and the phrase "Low Threat":
{ $text: { $search: '"Comp and Comp" "Low Threat"' } }
Similarly, how can I do an OR text search on multiple phrases?
The closest I have been able to get is the following:
{ $text: { $search: 'Comp and Comp Low Threat' } }
But it has no way of grouping the words into phrases, so it only performs an OR over the individual words.
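Unfortunately, because of the restrictions on $text, you have to make two separate calls. The relevant restrictions:
A query can specify, at most, one $text expression.
If the $search string includes a phrase and individual terms, text search will only match the documents that include the phrase.
Essentially, two phrases in the same $search string are combined with $and logic, and because a query can contain only a single $text expression, you need to split the search into 2 calls, one per phrase.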
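As a rough illustration of the two-call approach, here is a minimal sketch in Python. The helper names `phrase_filters` and `merge_unique` are hypothetical and not part of any library. The sketch only builds one `$text` filter per phrase and merges result sets by `_id`; it assumes each document carries a hashable `_id`, and running the queries themselves is left to the caller.

```python
from typing import Any, Dict, Iterable, List


def phrase_filters(phrases: List[str]) -> List[Dict[str, Any]]:
    """Build one $text filter per phrase, wrapping each phrase in double quotes."""
    return [{"$text": {"$search": f'"{phrase}"'}} for phrase in phrases]


def merge_unique(*result_sets: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Concatenate result sets, keeping only the first document seen for each _id."""
    merged: List[Dict[str, Any]] = []
    seen = set()
    for docs in result_sets:
        for doc in docs:
            if doc["_id"] not in seen:
                seen.add(doc["_id"])
                merged.append(doc)
    return merged


# Hypothetical usage against a collection that has a text index
# (`collection` is an assumed name, e.g. a PyMongo Collection):
# filters = phrase_filters(["Comp and Comp", "Low Threat"])
# results = merge_unique(*(collection.find(f) for f in filters))
```

Merging on the client side is what turns the per-phrase results into an OR: a document appears in the output if any one of the calls returned it, and the `_id` check keeps documents that match several phrases from being listed twice.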
Here is a solution to this problem (an OR search over a list of terms and phrases), using PyMongo.
def search_terms_or(collection, terms, projection=None):
    """
    Perform an OR search over a list of terms, which can be unigrams (single words)
    or phrases (multiple words), combining the results while avoiding duplicates.

    Parameters:
    - collection: PyMongo collection with a text index.
    - terms (list of str): A list of search terms, which can be unigrams or phrases (with spaces).
    - projection (dict, optional): MongoDB projection specification to control which fields to return.
      Default is None, which returns all fields. It must not exclude _id,
      because _id is used for deduplication.

    Returns:
    - list: Combined list of unique documents matching any of the search terms.
    """
    # Split terms into unigrams and phrases
    unigrams = [term for term in terms if " " not in term]
    phrases = [term for term in terms if " " in term]

    results = []
    seen_ids = set()

    # Search for all unigrams in one query; unquoted words are ORed by $text
    if unigrams:
        unigram_search_string = " ".join(unigrams)
        unigram_results = collection.find({"$text": {"$search": unigram_search_string}}, projection)
        results = list(unigram_results)
        seen_ids = {doc['_id'] for doc in results}

    # Search for each phrase separately and add only documents not seen yet
    for phrase in phrases:
        phrase_results = collection.find({"$text": {"$search": f'"{phrase}"'}}, projection)
        for doc in phrase_results:
            if doc['_id'] not in seen_ids:
                results.append(doc)
                seen_ids.add(doc['_id'])

    return results
Then calling
search_terms_or(myCollection, ["Comp and Comp", "Low Threat"])
will give you the results you want.