Reading a fairly old Apache Lucene index with the latest version, 10.0.0, fails

  • Windows 10
  • Lucene 10.0.0
  • JDK 23.0

I'm new to Lucene and Java, and I'm trying to open an index that I believe is 8-10 years old. The directory contains four files:

Name            Size
_0.cfx          47,942 KB
_s.cfs          178,687 KB
segments.gen    1 KB
segments_2      1 KB


import java.nio.file.Paths;
import java.util.List;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexableField;
import org.apache.lucene.index.StoredFields;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class FieldReader {

    private static final String INDEX_PATH = "indexedfiles";

    public static void main(String[] args) throws Exception {
        // try-with-resources closes the reader and the directory when done
        try (Directory directory = FSDirectory.open(Paths.get(INDEX_PATH));
             IndexReader reader = DirectoryReader.open(directory)) {
            // Index searcher
            IndexSearcher searcher = new IndexSearcher(reader);
            // This query matches all documents in the index
            Query query = new MatchAllDocsQuery();
            // Search the index, keeping the top 10 hits
            TopDocs foundDocs = searcher.search(query, 10);
            // Returns a StoredFields reader for the stored fields of this index
            StoredFields storedFields = searcher.storedFields();

            // Print the path of each matching document and its fields
            for (ScoreDoc sd : foundDocs.scoreDocs) {
                Document doc = storedFields.document(sd.doc);
                System.out.println("Path : " + doc.get("path"));
                List<IndexableField> fields = doc.getFields();
                for (IndexableField field : fields) {
                    System.out.println("Name : " + field.name() + ", Type : " + field.fieldType());
                }
            }
        }
    }
}

Error

Exception in thread "main" org.apache.lucene.index.IndexFormatTooOldException:
Format version is not supported
(resource BufferedChecksumIndexInput(MemorySegmentIndexInput(path="D:\IndiaLawLibrary\AHC\s_index\a\segments_2"))): -9 (needs to be between 1071082519 and 1071082519). 
This version of Lucene only supports indexes created with release 9.0 and later.
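The two numbers in the message can be checked by hand. Lucene 4.0 and later start segments_N with a codec header whose magic number is 0x3FD76C17 (1071082519, the value the error says it expected), whereas the -9 it actually found is the kind of negative format version that pre-4.0 segments files begin with. The sketch below is a hypothetical helper (SegmentsHeaderPeek is not a Lucene class, and the default directory name is only an assumption) that uses nothing but the JDK to read that leading big-endian int from each segments_N file, so it runs even when no Lucene version on hand can open the index.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class SegmentsHeaderPeek {

    // Magic number that Lucene 4.0+ writes at the start of codec headers (CodecUtil.CODEC_MAGIC)
    static final int CODEC_MAGIC = 0x3FD76C17; // == 1071082519

    /** Reads the first 4 bytes of a segments_N file as a big-endian int. */
    static int readFirstInt(Path segmentsFile) throws IOException {
        try (InputStream in = Files.newInputStream(segmentsFile);
             DataInputStream data = new DataInputStream(in)) {
            return data.readInt(); // DataInputStream.readInt() is big-endian
        }
    }

    /** Describes what the leading int suggests about the segments file format. */
    static String describe(int firstInt) {
        if (firstInt == CODEC_MAGIC) {
            return "codec header (Lucene 4.0 or later)";
        } else if (firstInt < 0) {
            return "legacy pre-4.0 segments format version " + firstInt;
        } else {
            return "unrecognized header " + firstInt;
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default; pass the real index directory as the first argument
        Path indexDir = Paths.get(args.length > 0 ? args[0] : "indexedfiles");
        // The glob matches segments_2 but not segments.gen
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir, "segments_*")) {
            for (Path file : files) {
                System.out.println(file.getFileName() + ": " + describe(readFirstInt(file)));
            }
        }
    }
}
```

Run against the directory from the question, segments_2 should report the legacy format version -9 that appears in the exception.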

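I also tried several older versions, including the oldest, 3.0.3, but still couldn't read the index. I only need the data in the index; later I'll build a new index with the latest version.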

1 Answer

If you only need to read it, this should be possible, assuming you have lucene-backward-codecs on your classpath.

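// Needs lucene-backward-codecs on the classpath in addition to lucene-core
Directory directory = FSDirectory.open(Paths.get(INDEX_PATH));
// listCommits() returns the commits sorted oldest first, so take the last (newest) one
IndexCommit commit = DirectoryReader.listCommits(directory).getLast();
DirectoryReader reader = DirectoryReader.open(commit, 0, null);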

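The 0 is the minimum supported major version to check (minSupportedMajorVersion), and null means no custom leaf sorter. Ideally you would know which Lucene version created the index and pass that, but 0 is a catch-all.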
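Expanding the answer's snippet into a complete program: the sketch below opens the newest commit with minSupportedMajorVersion 0 and collects every stored field of every non-deleted document. It walks the segments (leaves) directly instead of running a query, so it is not limited to the top 10 hits as the question's code is. It assumes Lucene 10.0.0 with lucene-backward-codecs on the classpath; the class name OldIndexDump and the default path are hypothetical, and whether open() succeeds on a given old index still depends on the conditions in the linked DirectoryReader Javadoc.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexCommit;
import org.apache.lucene.index.IndexableField;
import org.apache.lucene.index.LeafReader;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.StoredFields;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Bits;

public class OldIndexDump {

    /** Returns one "docId name = value" line per stored field of every live document. */
    static List<String> dump(Path indexPath) throws IOException {
        List<String> lines = new ArrayList<>();
        try (Directory directory = FSDirectory.open(indexPath)) {
            // Commits come back sorted oldest first; the last one is the newest
            IndexCommit latest = DirectoryReader.listCommits(directory).getLast();
            // 0 = accept any major version, null = keep the default leaf order
            try (DirectoryReader reader = DirectoryReader.open(latest, 0, null)) {
                for (LeafReaderContext leaf : reader.leaves()) {
                    LeafReader leafReader = leaf.reader();
                    StoredFields storedFields = leafReader.storedFields();
                    Bits liveDocs = leafReader.getLiveDocs(); // null when the segment has no deletions
                    for (int docId = 0; docId < leafReader.maxDoc(); docId++) {
                        if (liveDocs != null && !liveDocs.get(docId)) {
                            continue; // skip deleted documents
                        }
                        Document doc = storedFields.document(docId);
                        for (IndexableField field : doc.getFields()) {
                            // stringValue() may be null, e.g. for binary stored values
                            lines.add((leaf.docBase + docId) + " " + field.name() + " = " + field.stringValue());
                        }
                    }
                }
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default path; pass the real index directory as an argument
        Path indexPath = Paths.get(args.length > 0 ? args[0] : "indexedfiles");
        for (String line : dump(indexPath)) {
            System.out.println(line);
        }
    }
}
```

Only values that were stored at indexing time come back from StoredFields; fields that were indexed but not stored cannot be recovered this way.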

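If too much has changed, there is no guarantee that you can read the index. Lucene officially supports only N-1 codecs, but this method exists specifically for this situation. Annoyingly, there is no IndexWriter equivalent that lazily reads a very old index and lazily writes out a new one.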
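Without such a lazy converter, one workaround is an eager copy: read each live document's stored fields from the old index and add them to a fresh index written by the current IndexWriter. This is a minimal sketch under the same assumptions as above (Lucene 10.0.0 plus lucene-backward-codecs). ReindexStoredFields and the default paths are hypothetical, and re-adding every string value as a stored TextField analyzed with StandardAnalyzer is an arbitrary choice, because documents loaded from stored fields do not carry the original indexing options.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexCommit;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexableField;
import org.apache.lucene.index.LeafReader;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.StoredFields;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Bits;

public class ReindexStoredFields {

    /** Copies the stored string fields of all live documents into a new index; returns the count. */
    static int migrate(Path oldIndex, Path newIndex) throws IOException {
        int copied = 0;
        try (Directory oldDir = FSDirectory.open(oldIndex);
             Directory newDir = FSDirectory.open(newIndex)) {
            IndexCommit latest = DirectoryReader.listCommits(oldDir).getLast();
            try (DirectoryReader reader = DirectoryReader.open(latest, 0, null);
                 StandardAnalyzer analyzer = new StandardAnalyzer();
                 // CREATE replaces any index already present in newDir
                 IndexWriter writer = new IndexWriter(newDir,
                         new IndexWriterConfig(analyzer).setOpenMode(IndexWriterConfig.OpenMode.CREATE))) {
                for (LeafReaderContext leaf : reader.leaves()) {
                    LeafReader leafReader = leaf.reader();
                    StoredFields storedFields = leafReader.storedFields();
                    Bits liveDocs = leafReader.getLiveDocs(); // null when nothing is deleted
                    for (int docId = 0; docId < leafReader.maxDoc(); docId++) {
                        if (liveDocs != null && !liveDocs.get(docId)) {
                            continue; // skip deleted documents
                        }
                        Document newDoc = new Document();
                        for (IndexableField field : storedFields.document(docId).getFields()) {
                            String value = field.stringValue();
                            if (value != null) { // binary stored values are skipped in this sketch
                                newDoc.add(new TextField(field.name(), value, Field.Store.YES));
                            }
                        }
                        writer.addDocument(newDoc);
                        copied++;
                    }
                }
                writer.commit();
            }
        }
        return copied;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default paths; pass the real old and new directories as arguments
        Path oldIndex = Paths.get(args.length > 0 ? args[0] : "indexedfiles");
        Path newIndex = Paths.get(args.length > 1 ? args[1] : "indexedfiles-new");
        System.out.println("Copied " + migrate(oldIndex, newIndex) + " documents");
    }
}
```

The copy loads one document at a time, so memory use stays modest, but the whole old index is read in a single pass rather than converted lazily.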

https://lucene.apache.org/core/10_0_0/core/org/apache/lucene/index/DirectoryReader.html#listCommits(org.apache.lucene.store.Directory)

https://lucene.apache.org/core/10_0_0/core/org/apache/lucene/index/DirectoryReader.html#open(org.apache.lucene.index.IndexCommit,int,java.util.Comparator)
