How to create preview images from Microsoft documents using Java

Question

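I am currently working with Microsoft documents: Word (doc, docx), PowerPoint (ppt, pptx), and Excel (xls, xlsx).

I want to create a preview image from the first page of each document.

With the Apache POI library I can only do this for PowerPoint documents.

I cannot find a solution for the other types.

My idea is to convert the document to PDF (step 1) and then convert the PDF to an image (step 2).

For step 2 (PDF to image) there are many free Java libraries, for example PDFBox. It works fine with my dummy PDF file.

However, I am stuck on step 1.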

My documents may contain text with multiple styles, tables, images, or objects. [Image: example first page of a Word document]

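Which open-source Java library can do this?

I have tried the following libraries:

JODConverter - the output looks good, but it requires OpenOffice.

docx4j - I am not sure whether it works with non-OOXML formats (doc, xls, ppt), or whether it is really free. Here is the sample code: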

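import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import org.docx4j.Docx4J;
import org.docx4j.fonts.IdentityPlusMapper;
import org.docx4j.fonts.Mapper;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;

String inputWordPath = "C:\\Users\\test\\Desktop\\TestPDF\\Docx.docx";
String outputPDFPath = "C:\\Users\\test\\Desktop\\TestPDF\\OutDocx4j.pdf";
// try-with-resources closes both streams even if the conversion fails.
try (InputStream is = new FileInputStream(inputWordPath);
     OutputStream os = new FileOutputStream(outputPDFPath)) {
    WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(is);
    Mapper fontMapper = new IdentityPlusMapper();
    wordMLPackage.setFontMapper(fontMapper);
    Docx4J.toPDF(wordMLPackage, os);
} catch (Exception e) {
    e.printStackTrace();
}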

The output looks good, but the generated PDF contains "## Evaluation Use Only ##".

xdocreport - the generated PDF does not contain images.

// PdfConverter and PdfOptions come from xdocreport's XWPF-to-PDF converter module;
// XWPFDocument comes from Apache POI.
String inputWordPath = "C:\\Users\\test\\Desktop\\TestPDF\\Docx.docx";
String outputPDFPath = "C:\\Users\\test\\Desktop\\TestPDF\\OutXDOCReport.pdf";
// try-with-resources closes the document and both streams.
try (InputStream is = new FileInputStream(inputWordPath);
     XWPFDocument document = new XWPFDocument(is);
     OutputStream out = new FileOutputStream(outputPDFPath)) {
    PdfOptions options = PdfOptions.create();
    PdfConverter.getInstance().convert(document, out, options);
}

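I cannot find a library that fits this task.

Do you have any suggestions?
Can I convert documents (docx, doc, xlsx, xls) directly to images?
Is docx4j's conversion feature really free?
How can I remove "## Evaluation Use Only ##" from the PDF generated by docx4j?
Can docx4j handle non-OOXML documents?
Can I convert only the first page to PDF?
Can I set the PDF page size to fit the converted document's content?
Are there any libraries and sample code for converting documents to PDF or to images?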
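
One detail behind the OOXML versus non-OOXML question: the newer formats (docx, xlsx, pptx) are ZIP packages, while the legacy formats (doc, xls, ppt) are OLE2 compound files, and the two can be told apart from the first bytes of the file. The following is a minimal sketch using only the JDK; the class name OfficeContainerSniffer and its methods are hypothetical, and a ZIP signature alone does not prove that a file is OOXML, only that it is not a legacy binary document.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper: guesses the container type of an Office file from its header.
final class OfficeContainerSniffer {

    enum Container { ZIP_BASED, OLE2_BINARY, UNKNOWN }

    // Local file header signature of a ZIP archive ("PK\003\004").
    private static final byte[] ZIP_MAGIC = {'P', 'K', 3, 4};

    // Signature of an OLE2 compound file (legacy doc/xls/ppt).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // Classifies a header that has already been read into memory.
    static Container sniff(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) {
            return Container.OLE2_BINARY;
        }
        if (startsWith(header, ZIP_MAGIC)) {
            return Container.ZIP_BASED;
        }
        return Container.UNKNOWN;
    }

    // Reads the first 8 bytes of the file and classifies them.
    static Container sniff(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return sniff(in.readNBytes(OLE2_MAGIC.length));
        }
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
                && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }
}
```

A caller could use the result to route ZIP_BASED files to an OOXML-only library and OLE2_BINARY files to a converter that understands the legacy formats.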

Answer 1

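If you are able to install LibreOffice (or Apache OpenOffice), JODConverter should do the job nicely, and for free.

Note that the latest version of JODConverter available in the Maven Central Repository offers a feature called Filters, which lets you easily convert only the first page, and it supports conversion to PNG out of the box. Here is a quick example of how to do it: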

// Create an office manager using the default configuration.
// The default port is 2002. Note that when an office manager
// is installed, it will be the one used by default when
// a converter is created.
final LocalOfficeManager officeManager = LocalOfficeManager.install(); 
try {

    // Start an office process and connect to the started instance (on port 2002).
    officeManager.start();

    final File inputFile = new File("document.docx");
    final File outputFile = new File("document.png");

    // Create a page selector filter in order to
    // convert only the first page.
    final PageSelectorFilter selectorFilter = new PageSelectorFilter(1);

    LocalConverter
      .builder()
      .filterChain(selectorFilter)
      .build()
      .convert(inputFile)
      .to(outputFile)
      .execute();
} finally {
    // Stop the office process
    LocalOfficeUtils.stopQuietly(officeManager);
}

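As for your question:

Can I set the PDF page size to fit the converted document's content?

If you can do it with LibreOffice or Apache OpenOffice without JODConverter, then you can also do it with JODConverter. You just have to find out how it can be done programmatically and then create a filter to use with JODConverter.

I won't go into detail here since you may choose another approach, but if you need further help, just ask on the project's Gitter community.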
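
As a point of comparison for driving LibreOffice without JODConverter, the office suite can also be launched from Java through its command line. The sketch below assumes that a `soffice` executable is on the PATH and relies on LibreOffice's `--headless`, `--convert-to`, and `--outdir` options; the class SofficeCli and its methods are hypothetical names, and the assumption that the output file keeps the input's base name with the new extension should be verified against your LibreOffice version.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: converts a document by spawning LibreOffice in headless mode.
final class SofficeCli {

    // Builds the command line; assumes "soffice" can be resolved from the PATH.
    static List<String> buildCommand(Path input, Path outDir, String targetFormat) {
        return List.of(
                "soffice", "--headless",
                "--convert-to", targetFormat,
                "--outdir", outDir.toString(),
                input.toString());
    }

    // Assumed output location: same base name as the input, new extension.
    static Path expectedOutput(Path input, Path outDir, String targetFormat) {
        String name = input.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String base = dot > 0 ? name.substring(0, dot) : name;
        return outDir.resolve(base + "." + targetFormat);
    }

    // Runs the conversion and returns the path where the result is expected.
    static Path convert(Path input, Path outDir, String targetFormat)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(input, outDir, targetFormat))
                .inheritIO()
                .start();
        // Guard against a hung office process.
        if (!process.waitFor(2, TimeUnit.MINUTES)) {
            process.destroyForcibly();
            throw new IOException("soffice did not finish within the timeout");
        }
        if (process.exitValue() != 0) {
            throw new IOException("soffice exited with code " + process.exitValue());
        }
        return expectedOutput(input, outDir, targetFormat);
    }
}
```

Starting a new process for every document is slow, which is one reason a library such as JODConverter keeps a running office instance and talks to it instead.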

Answer 2

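You can try GroupDocs.Conversion Cloud SDK for Java; its free tier provides 50 free credits per month. It supports conversion between all common file formats.

Sample code for converting DOCX to an image stream: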

// Get App Key and App SID from https://dashboard.groupdocs.cloud/
ConvertApi apiInstance = new ConvertApi(AppSID,AppKey);
try {

    ConvertSettings settings = new ConvertSettings();

    settings.setStorageName(Utils.MYStorage);
    settings.setFilePath("conversions\\password-protected.docx");
    settings.setFormat("jpeg");

    DocxLoadOptions loadOptions = new DocxLoadOptions();
    loadOptions.setPassword("password");
    loadOptions.setHideWordTrackedChanges(true);
    loadOptions.setDefaultFont("Arial");

    settings.setLoadOptions(loadOptions);

    JpegConvertOptions convertOptions = new JpegConvertOptions();
    convertOptions.setFromPage(1);
    convertOptions.setPagesCount(1);
    convertOptions.setGrayscale(false);
    convertOptions.setHeight(1024);
    convertOptions.setQuality(100);
    convertOptions.setRotateAngle(90);
    convertOptions.setUsePdf(false);
    settings.setConvertOptions(convertOptions);

    // Leaving OutputPath empty returns the converted document as a stream
    settings.setOutputPath("");

    // convert to specified format
    File response = apiInstance.convertDocumentDownload(new ConvertDocumentRequest(settings));
    System.out.println("Document converted successfully: " + response.length());
} catch (ApiException e) {
    System.err.println("Exception while calling ConvertApi:");
    e.printStackTrace();
}

I am a developer evangelist at Aspose.

Answer 3

@sbraconnier's solution, updated for the newer version and processing directly in memory:

import org.jodconverter.core.document.DefaultDocumentFormatRegistry;
import org.jodconverter.core.office.OfficeException;
import org.jodconverter.local.LocalConverter;
import org.jodconverter.local.office.LocalOfficeManager;
import org.jodconverter.local.filter.PagesSelectorFilter;

import java.io.ByteArrayOutputStream;
import java.io.InputStream;

public class Office {
    // Create an office manager using the default configuration.
    // The default port is 2002. Note that when an office manager
    // is installed, it will be the one used by default when
    // a converter is created.
    public static final LocalOfficeManager officeManager = LocalOfficeManager.install();
    static {
        // Start an office process and connect to the started instance (on port 2002).
        try {
            officeManager.start();
            // Stop the office process when the JVM shuts down.
            Runtime.getRuntime().addShutdownHook(new Thread(() -> {
                try {
                    officeManager.stop();
                } catch (OfficeException e) {
                    // Nothing useful can be done at shutdown; log if desired.
                    //AL.warn(e);
                }
            }));
        } catch (OfficeException e) {
            // Startup failed; conversions will fail until this is resolved.
            //AL.warn(e);
        }
    }

    /**
     * @param inputFile stream of the source document (for example, document.docx)
     * @return the PNG bytes of the first-page preview image
     */
    public static byte[] createPreview(InputStream inputFile) throws OfficeException {
        final ByteArrayOutputStream outputFile = new ByteArrayOutputStream();

        // Create a page selector filter in order to
        // convert only the first page.
        final PagesSelectorFilter selectorFilter = new PagesSelectorFilter(1);

        LocalConverter
                .builder()
                .filterChain(selectorFilter)
                .build()
                .convert(inputFile)
                .to(outputFile)
                .as(DefaultDocumentFormatRegistry.PNG)
                .execute();
        return outputFile.toByteArray();
    }
}
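
The PNG bytes returned by createPreview are typically a full-page rendering, which may be larger than a thumbnail needs. The following is a minimal sketch, using only the JDK's ImageIO and Graphics2D, of scaling such bytes down to a fixed width while keeping the aspect ratio; the class PreviewScaler and the method scaleToWidth are hypothetical names and not part of JODConverter.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

// Hypothetical helper: shrinks an encoded image to a given width.
final class PreviewScaler {

    // Decodes the image, scales it to targetWidth (aspect ratio kept), re-encodes as PNG.
    static byte[] scaleToWidth(byte[] imageBytes, int targetWidth) throws IOException {
        if (targetWidth <= 0) {
            throw new IllegalArgumentException("targetWidth must be positive");
        }
        BufferedImage source = ImageIO.read(new ByteArrayInputStream(imageBytes));
        if (source == null) {
            throw new IOException("Input is not a readable image");
        }
        float ratio = targetWidth / (float) source.getWidth();
        int targetHeight = Math.max(1, Math.round(source.getHeight() * ratio));

        BufferedImage scaled =
                new BufferedImage(targetWidth, targetHeight, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = scaled.createGraphics();
        try {
            // White background so transparent areas do not turn black.
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, targetWidth, targetHeight);
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BILINEAR);
            g.drawImage(source, 0, 0, targetWidth, targetHeight, null);
        } finally {
            g.dispose();
        }

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ImageIO.write(scaled, "png", out);
        return out.toByteArray();
    }
}
```

With the Office class above, a 200-pixel-wide thumbnail could then be produced with PreviewScaler.scaleToWidth(Office.createPreview(in), 200).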
