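I want to use Lucene's suggestion mechanism to help end users notice when they have made a spelling mistake.
Lucene's SpellChecker has a method, suggestSimilar, that takes a SuggestMode flag. With SuggestMode.SUGGEST_MORE_POPULAR, I expect to get suggestions only for words that occur more often in the current directory than the word I looked up.
The code below does not seem to behave that way: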
import org.apache.lucene.analysis.core.SimpleAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.spell.LuceneDictionary;
import org.apache.lucene.search.spell.SpellChecker;
import org.apache.lucene.search.spell.SuggestMode;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

import java.io.IOException;
import java.util.LinkedList;
import java.util.List;

public class SuggestTest {

    public static void main(String[] args) throws IOException {
        final String NAME_FIELD = "NAME";

        // Start from an empty in-memory index.
        Directory directory = new RAMDirectory();
        IndexWriter writer = new IndexWriter(directory,
                new IndexWriterConfig(new SimpleAnalyzer()));
        writer.deleteAll();
        writer.commit();

        // "wafa" is indexed 1000 times, "waffa" only once.
        List<String> list = new LinkedList<>();
        for (int i = 0; i < 1000; i++) {
            list.add("wafa");
        }
        list.add("waffa");

        for (String name : list) {
            Document doc = new Document();
            doc.add(new TextField(NAME_FIELD, name, Field.Store.YES));
            writer.addDocument(doc);
        }
        writer.close();

        // Build the spell-check dictionary from the terms of NAME_FIELD.
        DirectoryReader directoryReader = DirectoryReader.open(directory);
        LuceneDictionary nameDictionary = new LuceneDictionary(directoryReader, NAME_FIELD);
        IndexWriterConfig config = new IndexWriterConfig(new SimpleAnalyzer());
        SpellChecker spellChecker = new SpellChecker(directory);
        spellChecker.indexDictionary(nameDictionary, config, true);

        for (String s : new String[]{"wafa", "waffa", "wala"}) {
            // Note: the reader and field arguments are both null here.
            String[] suggestions = spellChecker.suggestSimilar(s, 10, null, null, SuggestMode.SUGGEST_MORE_POPULAR);
            System.out.println("Suggestions for " + s);
            for (String suggestion : suggestions) {
                System.out.println(" -" + suggestion);
            }
        }
    }
}
When I look up wafa (which appears 1000 times in the directory!), I would not expect this code to suggest waffa.
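For SUGGEST_MORE_POPULAR to take effect, pass the index reader and the field name to suggestSimilar instead of null. When the reader or the field is null, SpellChecker has no term frequencies to compare against and the mode is overridden with SUGGEST_ALWAYS:
    String[] suggestions = spellChecker.suggestSimilar(s, 10, directoryReader, NAME_FIELD, SuggestMode.SUGGEST_MORE_POPULAR);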
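To illustrate why the reader and field matter, here is a small standard-library-only sketch of a "more popular" filter. It is a simplified model for intuition, not Lucene's implementation: the class MorePopularSketch, the method filterMorePopular, and the docFreq map are all hypothetical names, a null map stands in for calling suggestSimilar without a reader and field, and the exact comparison Lucene applies may differ from the strict "greater than" assumed here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class MorePopularSketch {

    /**
     * Simplified model of a "more popular" filter (hypothetical, not Lucene code).
     * docFreq maps each term to the number of documents containing it;
     * null means no frequency source is available.
     */
    public static List<String> filterMorePopular(String word,
                                                 List<String> candidates,
                                                 Map<String, Integer> docFreq) {
        if (docFreq == null) {
            // Without frequencies there is nothing to compare, so keep everything.
            return new ArrayList<>(candidates);
        }
        int wordFreq = docFreq.getOrDefault(word, 0);
        List<String> kept = new ArrayList<>();
        for (String candidate : candidates) {
            // Assumption: keep only candidates strictly more frequent than the query.
            if (docFreq.getOrDefault(candidate, 0) > wordFreq) {
                kept.add(candidate);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Same frequencies as in the question: wafa x1000, waffa x1.
        Map<String, Integer> docFreq = Map.of("wafa", 1000, "waffa", 1);

        // No frequency source: the rarer word is still suggested.
        System.out.println(filterMorePopular("wafa", List.of("waffa"), null));    // [waffa]
        // With frequencies: nothing is more popular than wafa.
        System.out.println(filterMorePopular("wafa", List.of("waffa"), docFreq)); // []
        // The rare word still gets the popular one as a suggestion.
        System.out.println(filterMorePopular("waffa", List.of("wafa"), docFreq)); // [wafa]
    }
}
```

In this model the first call mirrors the behaviour seen in the question, and the second mirrors what one would expect once the reader and field are supplied.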