Reading an EPUB with epubjs or epub-parser always returns empty


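I have been trying to build a very simple web application that reads an EPUB and displays each chapter section by section. Simple enough. I tried both epubjs and epub-parser, but with every .epub file I tried, nothing useful came back (errors, empty results, and so on). I ran the files through a validator to make sure they are well-formed, and everything checks out. When I open the EPUBs with WinZip, the contents look as they should.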

I'm desperate, because I don't understand what is going wrong or why I can't simply read an EPUB.

Below are the two functions I use for this. The problem starts at item.load(), which does not return the value/object I expect. My current error is:

Error extracting chapter content: TypeError: rawContent?.trim is not a function

Clearly rawContent is not in the right format, but I don't know why or how to fix it.
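For context, optional chaining (`?.`) only short-circuits on null and undefined, so a non-string value such as a parsed document object still fails with "trim is not a function". The sketch below illustrates that behavior in isolation; `trimIfString` and `optionalChainTrimThrows` are hypothetical helper names and are not part of epub.js or the code in this question.

```typescript
// Hypothetical guard: trims only genuine strings and returns null otherwise.
function trimIfString(value: unknown): string | null {
  return typeof value === 'string' ? value.trim() : null;
}

// Demonstrates the failure mode: `?.` skips the call only when the value is
// null or undefined. An object without a `trim` method still throws.
function optionalChainTrimThrows(value: unknown): boolean {
  try {
    (value as any)?.trim();
    return false;
  } catch (error) {
    return error instanceof TypeError;
  }
}
```

A check like `typeof rawContent === 'string'` before calling trim() would surface the real type of the loaded value instead of the TypeError.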

I have also attached the logs for the book, spine, and metadata, although I could not spot anything wrong in them.

Any help or suggestions would be greatly appreciated. Thanks!

[Screenshots: book, spine, and metadata logs]

export class EpubService {
  private static async parseEpubContent(arrayBuffer: ArrayBuffer): Promise<{
    chapters: Chapter[];
    title: string;
  }> {
    try {
      const book = ePub(arrayBuffer);
      await book.ready;

      const spine = await book.loaded.spine;
      const metadata = await book.loaded.metadata;

      logger.info('book: ', book);
      logger.info('spine: ', spine);
      logger.info('metadata: ', metadata);

      if (!spine || spine.length === 0) {
        throw new ProcessingError('No chapters found in EPUB');
      }

      logger.info('Processing EPUB with spine length:', spine.length);

      const chapters: Chapter[] = [];
      const maxChapters = Math.min(spine.length, 5);

      for (let i = 0; i < maxChapters; i++) {
        const item = spine.get(i);
        if (!item) {
          logger.warn(`No spine item found at index ${i}`);
          continue;
        }

        try {
          logger.info(`Processing chapter ${i + 1}/${maxChapters}`);

          const content = await item.load();
          if (!content) {
            logger.warn(`No content loaded for chapter ${i + 1}`);
            continue;
          }

          const { title, content: extractedContent } =
            extractChapterContent(content);

          chapters.push({
            id: item.idref || String(i + 1),
            title: title || `Chapter ${i + 1}`,
            content: extractedContent,
            summary: '',
            status: 'pending',
          });

          logger.info(`Successfully processed chapter ${i + 1}`);
        } catch (error) {
          logger.error(`Error processing chapter ${i}:`, error);
          // Continue with next chapter
          continue;
        }
      }

      if (chapters.length === 0) {
        throw new ProcessingError('No valid chapters found in EPUB');
      }

      logger.info(`Successfully extracted ${chapters.length} chapters`);

      return {
        chapters,
        title: metadata?.title || 'Untitled Book',
      };
    } catch (error) {
      logger.error('Error parsing EPUB content:', error);
      throw error instanceof ProcessingError
        ? error
        : new ProcessingError('Failed to parse EPUB content');
    }
  }

  static async processEpubFile(
    file: File,
    signal?: AbortSignal
  ): Promise<{ book: Book; cleanup: () => Promise<void> }> {
    let filePath: string | undefined;

    try {
      logger.info('Starting EPUB file processing');

      // Upload to Supabase
      filePath = await StorageService.uploadFile(file);
      logger.info('File uploaded to Supabase');

      // Download for processing
      const arrayBuffer = await StorageService.downloadFile(filePath);
      logger.info('File downloaded from Supabase');

      // Parse EPUB content
      const { chapters, title } = await this.parseEpubContent(arrayBuffer);

      const cleanup = async () => {
        if (filePath) {
          await StorageService.deleteFile(filePath).catch((error) => {
            logger.error('Error cleaning up file:', error);
          });
        }
      };

      return {
        book: { title, chapters },
        cleanup,
      };
    } catch (error) {
      // Clean up on error
      if (filePath) {
        await StorageService.deleteFile(filePath).catch((error) => {
          logger.error('Error cleaning up file:', error);
        });
      }

      logger.error('Error processing EPUB:', error);
      throw error instanceof ProcessingError
        ? error
        : new ProcessingError('Failed to process EPUB file');
    }
  }
}
export const extractChapterContent = (rawContent: string): ExtractedContent => {
  try {
    if (!rawContent?.trim()) {
      throw new ProcessingError('Empty raw content provided');
    }

    const parser = new DOMParser();
    const doc = parser.parseFromString(rawContent, 'text/html');

    // Check for parsing errors
    const parserError = doc.querySelector('parsererror');
    if (parserError) {
      throw new ProcessingError('Failed to parse HTML content');
    }

    const title = findTitle(doc);
    const content = findContent(doc);
    const cleanedContent = cleanContent(content);

    validateContent(cleanedContent);

    return {
      title: title || 'Untitled Chapter',
      content: cleanedContent,
    };
  } catch (error) {
    logger.error('Error extracting chapter content:', error);
    throw error instanceof ProcessingError
      ? error
      : new ProcessingError('Failed to extract chapter content');
  }
};
web-applications epub epub.js
1 Answer

In the end, it looks like you call load() on the book itself, passing the chapter you want, like so: const content = await book.load(item.href);
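A minimal sketch of how that call could feed extractChapterContent, which expects a string. It rests on an assumption worth verifying against your epub.js version: that book.load() may resolve to a string, a Document, or an Element depending on the resource. `BookLike`, `SpineItemLike`, `toMarkupString`, and `loadChapterMarkup` are hypothetical names introduced only for this illustration.

```typescript
// Minimal structural types so the sketch stands alone. The real objects
// would come from epub.js; only load(path) and href are relied on here.
interface BookLike {
  load(path: string): Promise<unknown>;
}

interface SpineItemLike {
  href: string;
}

// Hypothetical helper: normalize whatever was loaded into a markup string.
function toMarkupString(loaded: unknown): string {
  if (typeof loaded === 'string') {
    return loaded;
  }
  if (loaded !== null && typeof loaded === 'object') {
    const node = loaded as {
      outerHTML?: unknown;
      documentElement?: { outerHTML?: unknown } | null;
    };
    // Document-like value: serialize its root element.
    const rootMarkup = node.documentElement?.outerHTML;
    if (typeof rootMarkup === 'string') {
      return rootMarkup;
    }
    // Element-like value: serialize the element itself.
    if (typeof node.outerHTML === 'string') {
      return node.outerHTML;
    }
  }
  throw new TypeError('Unsupported chapter content type');
}

// Loads one spine item through the book and returns its markup as a string.
async function loadChapterMarkup(
  book: BookLike,
  item: SpineItemLike
): Promise<string> {
  const loaded = await book.load(item.href);
  return toMarkupString(loaded);
}
```

In the loop from the question, this would stand in for the item.load() call, for example `const content = await loadChapterMarkup(book, item);`, so that extractChapterContent always receives a string.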

Hope this helps someone.
