How do I search Lucene with a spaceless query?

Document document = new Document();
document.add(new Field("ID", "100", Field.Store.YES, Field.Index.NOT_ANALYZED));
document.add(new Field("TEMPLATE_CONTENT", "dummy Just {#var#} testing a spaceless {#var#} setup dummy",
                Field.Store.YES, Field.Index.ANALYZED));
writer.addDocument(document);

I am indexing the text "dummy Just {#var#} testing a spaceless {#var#} setup dummy" with Lucene and then querying it with one of the spaceless sentences below:

dummyJustatestingaspacelessfreakingsetupdummy

                      or 

假人只是刺痛无空间的怪物设置假人

I cannot get a single match against the TEMPLATE_CONTENT above.

I am searching with the following code:

        query = new QueryParser(Version.LUCENE_36, "TEMPLATE_CONTENT", new StandardAnalyzer(Version.LUCENE_36))
                .parse(serchQuery);
        searcher = new IndexSearcher(index, true);
        System.out.println("......query : " + query + "\n");
        long startTime = System.currentTimeMillis();
        results = searcher.search(query, 2);
        long endTime = System.currentTimeMillis();
        System.out.println("results time taken" + (endTime - startTime) + " ms");
        for (ScoreDoc scoreDoc : results.scoreDocs) {
            System.out.println("scoreDoc : " + scoreDoc);
            Document document = searcher.doc(scoreDoc.doc);
            System.out.println("Found match: " + document.get("TEMPLATE_CONTENT") + "\n");
        }

Please help me get at least one match.

lucene solar
1 answer

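You could try the following approach and see whether it helps.

For spaceless sentences to match at search time, the text has to be analyzed and indexed in a way that preserves a spaceless form. One way to do this is a custom analyzer that does not split the text on whitespace.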

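import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Note: createComponents(String) and the no-argument Tokenizer constructor are
// the Lucene 5+ API; the Lucene 3.6 code in the question uses different signatures.
public class SpacelessAnalyzer extends Analyzer {
    @Override
    protected TokenStreamComponents createComponents(String fieldName) {
        Tokenizer tokenizer = new SpacelessTokenizer();
        TokenStream filter = new LowerCaseFilter(tokenizer);
        return new TokenStreamComponents(tokenizer, filter);
    }

    // Emits the whole input as one token with all whitespace removed.
    private static final class SpacelessTokenizer extends Tokenizer {
        private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
        private boolean done = false;

        @Override
        public boolean incrementToken() throws IOException {
            if (done) {
                return false; // the single token has already been emitted
            }
            clearAttributes();
            done = true;
            // Read the entire input (not just the first 256 chars) as a single token
            char[] buffer = new char[256];
            int length;
            while ((length = input.read(buffer)) != -1) {
                for (int i = 0; i < length; i++) {
                    if (!Character.isWhitespace(buffer[i])) {
                        termAtt.append(buffer[i]);
                    }
                }
            }
            return termAtt.length() > 0;
        }

        @Override
        public void reset() throws IOException {
            super.reset();
            done = false; // allow the tokenizer to be reused for the next input
        }
    }
}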
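
To make the effect of such an analyzer concrete, here is a minimal plain-Java sketch (no Lucene dependency) of the normalization it is meant to apply: drop whitespace, lowercase, and treat what remains as one term. The names `SpacelessTermDemo` and `toTerm` are hypothetical and only illustrate the idea; they are not part of Lucene.

```java
public class SpacelessTermDemo {
    /** Collapses text into a single lowercase term with all whitespace removed. */
    public static String toTerm(String text) {
        StringBuilder term = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (!Character.isWhitespace(c)) {
                term.append(Character.toLowerCase(c));
            }
        }
        return term.toString();
    }

    public static void main(String[] args) {
        // The indexed template and the query from the question, reduced to single terms.
        String indexed = toTerm("dummy Just {#var#} testing a spaceless {#var#} setup dummy");
        String query = toTerm("dummyJustatestingaspacelessfreakingsetupdummy");
        System.out.println(indexed);
        System.out.println(query);
        // A single-term index only matches when both terms are identical.
        System.out.println(indexed.equals(query));
    }
}
```

For the strings in the question the two terms are not identical, because the literal `{#var#}` placeholders remain in the indexed term while the query contains real words in their place.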

Now you can use the analyzer when indexing documents:

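Analyzer analyzer = new SpacelessAnalyzer();
// The analyzer is only applied if the IndexWriter is configured with it (Lucene 5+ API;
// `index` is assumed to be the Directory used in the question).
IndexWriter writer = new IndexWriter(index, new IndexWriterConfig(analyzer));
document.add(new TextField("TEMPLATE_CONTENT", "dummy Just {#var#} testing a spaceless {#var#} setup dummy",
                Field.Store.YES));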

When searching:

// Lucene 5+ constructor (no Version argument), matching the SpacelessAnalyzer API used above.
QueryParser queryParser = new QueryParser("TEMPLATE_CONTENT", new SpacelessAnalyzer());
Query query = queryParser.parse(searchQuery);

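With this in place, you should be able to index and search spaceless sentences. Because the whole field value is indexed as a single term, a query matches only when it produces exactly the same term.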
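
One caveat for the template in the question: with a single-term analyzer the `{#var#}` placeholders are indexed literally, so a query that fills them with real words (`a`, `freaking`) would not produce the same term. The following plain-Java sketch, independent of Lucene, shows one possible way to compare such a query against a template by treating every `{#var#}` as a wildcard. The names `TemplateMatcher`, `toPattern`, and `matches` are hypothetical, and reading `{#var#}` as "one or more arbitrary characters" is an assumption, not something the question specifies.

```java
import java.util.regex.Pattern;

public class TemplateMatcher {
    private static final String PLACEHOLDER = "{#var#}";

    /** Builds a case-insensitive regex: whitespace is dropped and each {#var#} becomes ".+". */
    public static Pattern toPattern(String template) {
        String spaceless = template.replaceAll("\\s+", "");
        StringBuilder regex = new StringBuilder();
        int from = 0;
        int at;
        while ((at = spaceless.indexOf(PLACEHOLDER, from)) >= 0) {
            // Quote the literal text so regex metacharacters in the template stay literal.
            regex.append(Pattern.quote(spaceless.substring(from, at)));
            regex.append(".+");
            from = at + PLACEHOLDER.length();
        }
        regex.append(Pattern.quote(spaceless.substring(from)));
        return Pattern.compile(regex.toString(), Pattern.CASE_INSENSITIVE);
    }

    /** Returns true when the query, with whitespace removed, fits the template. */
    public static boolean matches(String template, String query) {
        String spacelessQuery = query.replaceAll("\\s+", "");
        return toPattern(template).matcher(spacelessQuery).matches();
    }

    public static void main(String[] args) {
        String template = "dummy Just {#var#} testing a spaceless {#var#} setup dummy";
        System.out.println(matches(template, "dummyJustatestingaspacelessfreakingsetupdummy")); // prints true
        System.out.println(matches(template, "somethingelseentirely"));                         // prints false
    }
}
```

This compares one template with one query in memory, so it does not scale like an index lookup; it is meant only to show the kind of matching the placeholders would require.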
