I'm new to Lucene and Java, and I'm trying to open an index that I believe is 8-10 years old. The directory contains four files:
| Name | Size |
|---|---|
| _0.cfx | 47,942 KB |
| _s.cfs | 178,687 KB |
| segments.gen | 1 KB |
| segments_2 | 1 KB |
```java
import java.util.List;
import java.nio.file.Paths;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.StoredFields;
import org.apache.lucene.index.IndexableField;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.search.MatchAllDocsQuery;

public class FieldReader {
    private static final String INDEX_PATH = "indexedfiles";

    public static void main(String[] args) throws Exception {
        IndexReader reader = DirectoryReader.open(FSDirectory.open(Paths.get(INDEX_PATH)));
        // Index searcher
        IndexSearcher searcher = new IndexSearcher(reader);
        // This query will match all documents in the index
        Query query = new MatchAllDocsQuery();
        // Search the index
        TopDocs foundDocs = searcher.search(query, 10);
        // Returns a StoredFields reader for the stored fields of this index
        StoredFields storedFields = searcher.storedFields();
        // Print the path of each document and its fields
        for (ScoreDoc sd : foundDocs.scoreDocs) {
            Document doc = storedFields.document(sd.doc);
            System.out.println("Path : " + doc.get("path"));
            List<IndexableField> fields = doc.getFields();
            for (IndexableField field : fields) {
                System.out.println("Name : " + field.name() + ", Type : " + field.fieldType().toString());
            }
        }
    }
}
```
Error:

```
Exception in thread "main" org.apache.lucene.index.IndexFormatTooOldException:
Format version is not supported
(resource BufferedChecksumIndexInput(MemorySegmentIndexInput(path="D:\IndiaLawLibrary\AHC\s_index\a\segments_2"))): -9 (needs to be between 1071082519 and 1071082519).
This version of Lucene only supports indexes created with release 9.0 and later.
```
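I also tried several older versions, going back as far as 3.0.3, but none of them could read the index. I only need the data in it; I'll build a new index with the latest version afterwards.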
If you only need to read it, this should be possible, provided `lucene-backwards-codecs` is on your classpath:
```java
Directory directory = FSDirectory.open(Paths.get(INDEX_PATH));
IndexCommit commit = DirectoryReader.listCommits(directory).getLast();
// minSupportedMajorVersion = 0, leafSorter = null
DirectoryReader reader = DirectoryReader.open(commit, 0, null);
```
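`0` is the minimum supported major version to check against. It's better to know which version of Lucene created the index and pass that, but `0` is a catch-all.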
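To get a rough idea of which generation of Lucene wrote the index before choosing a version, you can peek at the first four bytes of `segments_2` with plain Java. This is a sketch under assumptions: it relies on Lucene 4.0+ `segments_N` files starting with the big-endian codec header magic `0x3fd76c17` (which is the `1071082519` in the error message), and on pre-4.0 files starting with a negative format code such as the `-9` reported here. The class name `SegmentsHeaderProbe` and the default path are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: classifies a segments_N file by its leading int.
final class SegmentsHeaderProbe {
    // Assumed to match CodecUtil.CODEC_MAGIC in Lucene 4.0+ (0x3fd76c17 == 1071082519).
    static final int CODEC_MAGIC = 0x3fd76c17;

    // Interprets the first 4 bytes as a big-endian int (ByteBuffer's default order).
    static int firstInt(byte[] header) {
        if (header.length < 4) {
            throw new IllegalArgumentException("need at least 4 bytes, got " + header.length);
        }
        return ByteBuffer.wrap(header, 0, 4).getInt();
    }

    // Maps the leading int to a coarse description; assumes the layouts described above.
    static String describe(int first) {
        if (first == CODEC_MAGIC) {
            return "codec header: written by Lucene 4.0 or later";
        } else if (first < 0) {
            return "legacy format code " + first + ": written before Lucene 4.0";
        } else {
            return "unrecognized leading int " + first;
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default path; pass the real segments_N file as the first argument.
        Path segments = Paths.get(args.length > 0 ? args[0] : "indexedfiles/segments_2");
        try (InputStream in = Files.newInputStream(segments)) {
            byte[] header = in.readNBytes(4);
            System.out.println(describe(firstInt(header)));
        }
    }
}
```

Run it as `java SegmentsHeaderProbe path/to/segments_2`. It only tells you whether the file predates Lucene 4.0, not the exact release, so treat the result as a starting point for picking a reader version.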
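If the format has changed too much, there's no guarantee you'll be able to read the index. Lucene officially supports only N-1 codecs, but this method exists specifically for cases like this one. Annoyingly, there is no `IndexWriter` equivalent that lazily reads from a very old index and lazily writes into a new one.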