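I've been trying to build a very simple web application that reads an EPUB and displays each chapter section by section. Simple enough. I tried both epubjs and epub-parser, but every time I test with several .epub files, neither returns anything usable (errors, empty values, etc.). I ran the files through validators to make sure my EPUBs are well-formed, and everything checks out. When I open the EPUBs in WinZip, they all look as they should.
I'm desperate, because I don't understand what is going wrong or why I can't simply read an EPUB.
Below are the two functions I use for this. The problem starts at item.load(), which doesn't return the appropriate value/object. My current error is: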
Error extracting chapter content: TypeError: rawContent?.trim is not a function
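Clearly rawContent is not in the expected format (the TypeError suggests it is not a string), but I don't know why or how to fix it.
I've also attached the logs for book, spine, and metadata, although I couldn't spot any errors in them.
Any help or suggestions would be greatly appreciated. Thanks!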
export class EpubService {
  private static async parseEpubContent(arrayBuffer: ArrayBuffer): Promise<{
    chapters: Chapter[];
    title: string;
  }> {
    try {
      const book = ePub(arrayBuffer);
      await book.ready;
      const spine = await book.loaded.spine;
      const metadata = await book.loaded.metadata;
      logger.info('book: ', book);
      logger.info('spine: ', spine);
      logger.info('metadata: ', metadata);
      if (!spine || spine.length === 0) {
        throw new ProcessingError('No chapters found in EPUB');
      }
      logger.info('Processing EPUB with spine length:', spine.length);
      const chapters: Chapter[] = [];
      const maxChapters = Math.min(spine.length, 5);
      for (let i = 0; i < maxChapters; i++) {
        const item = spine.get(i);
        if (!item) {
          logger.warn(`No spine item found at index ${i}`);
          continue;
        }
        try {
          logger.info(`Processing chapter ${i + 1}/${maxChapters}`);
          const content = await item.load();
          if (!content) {
            logger.warn(`No content loaded for chapter ${i + 1}`);
            continue;
          }
          const { title, content: extractedContent } =
            extractChapterContent(content);
          chapters.push({
            id: item.idref || String(i + 1),
            title: title || `Chapter ${i + 1}`,
            content: extractedContent,
            summary: '',
            status: 'pending',
          });
          logger.info(`Successfully processed chapter ${i + 1}`);
        } catch (error) {
          logger.error(`Error processing chapter ${i}:`, error);
          // Continue with next chapter
          continue;
        }
      }
      if (chapters.length === 0) {
        throw new ProcessingError('No valid chapters found in EPUB');
      }
      logger.info(`Successfully extracted ${chapters.length} chapters`);
      return {
        chapters,
        title: metadata?.title || 'Untitled Book',
      };
    } catch (error) {
      logger.error('Error parsing EPUB content:', error);
      throw error instanceof ProcessingError
        ? error
        : new ProcessingError('Failed to parse EPUB content');
    }
  }
  static async processEpubFile(
    file: File,
    signal?: AbortSignal
  ): Promise<{ book: Book; cleanup: () => Promise<void> }> {
    let filePath: string | undefined;
    try {
      logger.info('Starting EPUB file processing');
      // Upload to Supabase
      filePath = await StorageService.uploadFile(file);
      logger.info('File uploaded to Supabase');
      // Download for processing
      const arrayBuffer = await StorageService.downloadFile(filePath);
      logger.info('File downloaded from Supabase');
      // Parse EPUB content
      const { chapters, title } = await this.parseEpubContent(arrayBuffer);
      const cleanup = async () => {
        if (filePath) {
          await StorageService.deleteFile(filePath).catch((error) => {
            logger.error('Error cleaning up file:', error);
          });
        }
      };
      return {
        book: { title, chapters },
        cleanup,
      };
    } catch (error) {
      // Clean up on error
      if (filePath) {
        await StorageService.deleteFile(filePath).catch((error) => {
          logger.error('Error cleaning up file:', error);
        });
      }
      logger.error('Error processing EPUB:', error);
      throw error instanceof ProcessingError
        ? error
        : new ProcessingError('Failed to process EPUB file');
    }
  }
}
export const extractChapterContent = (rawContent: string): ExtractedContent => {
  try {
    if (!rawContent?.trim()) {
      throw new ProcessingError('Empty raw content provided');
    }
    const parser = new DOMParser();
    const doc = parser.parseFromString(rawContent, 'text/html');
    // Check for parsing errors
    const parserError = doc.querySelector('parsererror');
    if (parserError) {
      throw new ProcessingError('Failed to parse HTML content');
    }
    const title = findTitle(doc);
    const content = findContent(doc);
    const cleanedContent = cleanContent(content);
    validateContent(cleanedContent);
    return {
      title: title || 'Untitled Chapter',
      content: cleanedContent,
    };
  } catch (error) {
    logger.error('Error extracting chapter content:', error);
    throw error instanceof ProcessingError
      ? error
      : new ProcessingError('Failed to extract chapter content');
  }
};
In the end, it looks like when you want a specific chapter of the book, you call load() on the book itself, like this: const content = await book.load(item.href);
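Whichever call is used, the TypeError in the question means the loaded value is defined but is not a string. Below is a minimal sketch of a defensive conversion, assuming the loader may hand back a parsed Document or Element rather than markup text. toMarkupString and the two small interfaces are hypothetical names for illustration, not part of epubjs; the helper relies only on the standard documentElement and outerHTML DOM properties.

```typescript
// Hypothetical helper (not part of epubjs): normalize whatever the loader
// returned into a markup string before passing it to extractChapterContent.
// Assumption: the value is a string, a Document, or an Element.
interface ElementLike {
  outerHTML?: unknown;
}

interface DocumentLike {
  documentElement?: ElementLike | null;
}

function toMarkupString(loaded: unknown): string {
  // Already text: pass it through unchanged.
  if (typeof loaded === 'string') {
    return loaded;
  }
  // Anything that is not an object cannot be a DOM node.
  if (loaded === null || typeof loaded !== 'object') {
    return '';
  }
  // Document: serialize its root element. Element: serialize it directly.
  const root: ElementLike =
    (loaded as DocumentLike).documentElement ?? (loaded as ElementLike);
  const markup = root.outerHTML;
  return typeof markup === 'string' ? markup : '';
}
```

With something like this in place, the call in the loop could become extractChapterContent(toMarkupString(await book.load(item.href))). A value that cannot be converted then comes back as an empty string, which the existing check in extractChapterContent reports as 'Empty raw content provided' instead of failing with a TypeError.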
Hope this helps someone.