Document document = new Document();
document.add(new Field("ID", "100", Field.Store.YES, Field.Index.NOT_ANALYZED));
document.add(new Field("TEMPLATE_CONTENT", "dummy Just {#var#} testing a spaceless {#var#} setup dummy",
Field.Store.YES, Field.Index.ANALYZED));
writer.addDocument(document);
I am indexing "dummy Just {#var#} testing a spaceless {#var#} setup dummy" with Lucene and querying it with the spaceless sentence below:
dummyJustatestingaspacelessfreakingsetupdummy
or
假人只是刺痛无空间的怪物设置假人
I cannot get a single match against the TEMPLATE_CONTENT above.
I am searching with the following code:
query = new QueryParser(Version.LUCENE_36, "TEMPLATE_CONTENT", new StandardAnalyzer(Version.LUCENE_36))
        .parse(searchQuery);
searcher = new IndexSearcher(index, true);
System.out.println("......query : " + query + "\n");
long startTime = System.currentTimeMillis();
results = searcher.search(query, 2);
long endTime = System.currentTimeMillis();
System.out.println("results time taken" + (endTime - startTime) + " ms");
for (ScoreDoc scoreDoc : results.scoreDocs) {
    System.out.println("scoreDoc : " + scoreDoc);
    Document document = searcher.doc(scoreDoc.doc);
    System.out.println("Found match: " + document.get("TEMPLATE_CONTENT") + "\n");
}
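Please help me get at least one match.
Could you try this approach and see whether it helps?
To make sure spaceless sentences can match during a search, the text has to be analyzed and indexed in a way that preserves the spaceless form. One way to do this is a custom analyzer that does not split tokens on whitespace.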
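import java.io.IOException;
import java.io.Reader;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.ReusableAnalyzerBase;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

// Written against the Lucene 3.6 API to match Version.LUCENE_36 used in the rest of this thread.
public final class SpacelessAnalyzer extends ReusableAnalyzerBase {
    @Override
    protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
        Tokenizer tokenizer = new SpacelessTokenizer(reader);
        TokenStream filter = new LowerCaseFilter(Version.LUCENE_36, tokenizer);
        return new TokenStreamComponents(tokenizer, filter);
    }

    private static final class SpacelessTokenizer extends Tokenizer {
        private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
        private boolean done = false;

        SpacelessTokenizer(Reader input) {
            super(input);
        }

        @Override
        public boolean incrementToken() throws IOException {
            if (done) {
                return false;
            }
            clearAttributes();
            done = true;
            // Read the entire input, however long, and emit it as one token.
            // Nothing is split, so any whitespace stays inside the token.
            char[] buffer = new char[256];
            int length;
            while ((length = input.read(buffer)) != -1) {
                termAtt.append(new String(buffer, 0, length));
            }
            return termAtt.length() > 0;
        }

        @Override
        public void reset(Reader input) throws IOException {
            // Allow the tokenizer to be reused for the next document or query.
            super.reset(input);
            done = false;
        }
    }
}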
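As a quick illustration of what a single-token analyzer implies, here is a plain-Java sketch with no Lucene dependency. It mimics the term such an analyzer would produce: the whole input, lowercased, as one string. The class name `SingleTokenDemo` and the whitespace-stripping helper `spaceless` are assumptions made for this illustration and are not part of the analyzer above; the helper shows one possible way to reduce spaced text and a spaceless query to the same form.

```java
import java.util.Locale;

final class SingleTokenDemo {
    // Mimics a keyword-style analyzer: whole input, lowercased, one term.
    static String singleTerm(String text) {
        return text.toLowerCase(Locale.ROOT);
    }

    // Hypothetical extra step: drop all whitespace so that spaced and
    // spaceless text reduce to the same string.
    static String spaceless(String text) {
        return singleTerm(text).replaceAll("\\s+", "");
    }

    public static void main(String[] args) {
        String indexed = "dummy Just a testing";
        String query = "dummyJustatesting";
        // Lowercasing alone keeps the spaces, so the terms differ.
        System.out.println(singleTerm(indexed).equals(singleTerm(query))); // false
        // After stripping whitespace both sides are "dummyjustatesting".
        System.out.println(spaceless(indexed).equals(spaceless(query)));   // true
    }
}
```

If a step like this is used, it would need to be applied in the same way to the indexed text and to the query.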
Now you can use the analyzer when indexing documents:
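Analyzer analyzer = new SpacelessAnalyzer();
// The analyzer is applied by the IndexWriter, not by the Field; `index` is assumed to be the Directory from the question.
IndexWriter writer = new IndexWriter(index, new IndexWriterConfig(Version.LUCENE_36, analyzer));
document.add(new Field("TEMPLATE_CONTENT", "dummy Just {#var#} testing a spaceless {#var#} setup dummy",
        Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.YES));
writer.addDocument(document);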
When searching:
QueryParser queryParser = new QueryParser(Version.LUCENE_36, "TEMPLATE_CONTENT", new SpacelessAnalyzer());
Query query = queryParser.parse(searchQuery);
With this in place, you should be able to index and search spaceless sentences.
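The `{#var#}` placeholders in the template stand for arbitrary text, and the snippets above do not treat them specially. As a sanity check outside Lucene, the following is a minimal sketch, assuming each placeholder should match one or more characters and that whitespace and case are ignored. The class `TemplateMatchDemo` and its methods are hypothetical names introduced for this illustration; this is not a Lucene API and not necessarily how the final search should be implemented.

```java
import java.util.Locale;
import java.util.regex.Pattern;

final class TemplateMatchDemo {
    private static final String PLACEHOLDER = "{#var#}";

    // Builds a regex from a template: literal parts are lowercased, stripped of
    // whitespace and quoted; each placeholder matches one or more characters.
    static Pattern toSpacelessPattern(String template) {
        String[] parts = template.toLowerCase(Locale.ROOT)
                .split(Pattern.quote(PLACEHOLDER), -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append(".+?"); // stands in for one {#var#}
            }
            String literal = parts[i].replaceAll("\\s+", "");
            if (!literal.isEmpty()) {
                regex.append(Pattern.quote(literal));
            }
        }
        return Pattern.compile(regex.toString());
    }

    // True when the whole spaceless query fits the template.
    static boolean matches(String template, String spacelessQuery) {
        return toSpacelessPattern(template)
                .matcher(spacelessQuery.toLowerCase(Locale.ROOT))
                .matches();
    }

    public static void main(String[] args) {
        String template = "dummy Just {#var#} testing a spaceless {#var#} setup dummy";
        String query = "dummyJustatestingaspacelessfreakingsetupdummy";
        System.out.println(matches(template, query)); // true
    }
}
```

Here the first placeholder absorbs "a" and the second absorbs "freaking". A check like this could be run over stored TEMPLATE_CONTENT values of candidate hits to confirm that a spaceless query really fits a template.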