PyLucene fuzzy search returns nothing even when using the same search term


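I am trying to build a fuzzy search query in PyLucene, but it returns nothing even when I search for exactly the text that was indexed. I have tried indexing the field as a StringField, as a TextField, and with a custom field type, and I have also tried changing the maxEdits option. It does work with a single short word, for example when the code below sets

fuzzy_query = 'fox'

but neither of these returns anything:

fuzzy_query = 'brown fox'
fuzzy_query = 'The brown fox'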

import lucene
from java.nio.file import Paths
from org.apache.lucene.analysis.standard import StandardAnalyzer
from org.apache.lucene.document import Document, Field, FieldType
from org.apache.lucene.index import DirectoryReader, IndexOptions, IndexWriter, IndexWriterConfig, Term
from org.apache.lucene.search import FuzzyQuery, IndexSearcher, TermQuery
from org.apache.lucene.store import NIOFSDirectory

lucene.initVM(vmargs=['-Djava.awt.headless=true'])

my_path = "../index"

# create index writer
analyzer = StandardAnalyzer()
config = IndexWriterConfig(analyzer)
index_dir = NIOFSDirectory(Paths.get(my_path))
writer = IndexWriter(index_dir, config)

# define fuzzy field
field_type = FieldType()
field_type.setStored(True)
field_type.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS)
field_type.setTokenized(True)
field_type.setStoreTermVectors(True)
field_type.setStoreTermVectorPositions(True)
field_type.setStoreTermVectorOffsets(True)
field_type.setStoreTermVectorPayloads(True)

# add documents to index with fuzzy field
doc = Document()
doc.add(Field("title_fuzzy", "The brown fox", field_type))
writer.addDocument(doc)

doc = Document()
doc.add(Field("title_fuzzy", "jumps over the lazy dog", field_type))
writer.addDocument(doc)

# commit changes
writer.commit()
writer.close()

directory = NIOFSDirectory(Paths.get(my_path))

# create an IndexReader and IndexSearcher
reader = DirectoryReader.open(directory)
searcher = IndexSearcher(reader)
# search for documents with fuzzy field

fuzzy_term = "The brown fox"

fuzzy_query = FuzzyQuery(Term("title_fuzzy", fuzzy_term), maxEdits=2)

hits = searcher.search(fuzzy_query, 1).scoreDocs
for hit in hits:
    doc = searcher.doc(hit.doc)
    print("Document: ", doc)

Thanks in advance!

python java lucene fuzzy-search pylucene
1 Answer

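FuzzyQuery takes a single Term in its constructor, and a Term represents one token in the index, that is, a single word of the text rather than a whole phrase. A tokenized field analyzed with StandardAnalyzer is split into separate lowercased terms such as "brown" and "fox" at index time, so a Term holding the whole string "The brown fox" is nowhere near any indexed term within two edits, while "fox" matches directly.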
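
To make the point above concrete, here is a small pure-Python model of the behaviour. It is only a sketch and does not use Lucene: `tokenize` is a rough stand-in for an analyzer, `edit_distance` is plain Levenshtein distance (Lucene's FuzzyQuery also counts transpositions by default), and all function names are made up for illustration. It shows why a multi-word string fails as a single term, and how splitting the query into tokens and fuzzy-matching each one behaves instead.

```python
import re


def tokenize(text):
    """Rough stand-in for an analyzer: lowercase, split on non-alphanumerics."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def edit_distance(a, b):
    """Plain Levenshtein distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def fuzzy_term_matches(query_term, doc_text, max_edits=2):
    """Model of a single-term fuzzy query: the term is compared, unsplit,
    against each individual token of the document."""
    return any(edit_distance(query_term, tok) <= max_edits
               for tok in tokenize(doc_text))


def fuzzy_phrase_matches(query_text, doc_text, max_edits=2):
    """Split the query into tokens and require each one to fuzzily match."""
    return all(fuzzy_term_matches(tok, doc_text, max_edits)
               for tok in tokenize(query_text))


doc = "The brown fox"
print(fuzzy_term_matches("fox", doc))              # True
print(fuzzy_term_matches("The brown fox", doc))    # False
print(fuzzy_phrase_matches("The brown fox", doc))  # True
```

In Lucene itself, the analogous approach would be to run the query text through the same analyzer and build one FuzzyQuery per resulting token, combined in a BooleanQuery; whether every token must match or only some of them is a design choice that depends on how strict the search should be.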
