PDFBox 3.2: Missing bounding box error in PAC for image tags in PDF/UA


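I am trying to create a PDF/UA document with PDFBox 3.2, following the solution suggested by @Tilman Hausherr in this Stack Overflow post. I managed to tag text elements and images, and the image shows up as correctly tagged in PAC. However, I still get an error in PAC indicating that the image is missing a bounding box.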

Here is what I have tried so far to fix this:

1. Marked content for the rectangle: I created marked content for the image's rectangle and added it to the document. (No success; the error persisted.)
2. Adding COSName.BBOX to the Figure structure element: I added a new item COSName.BBOX with a Rectangle(x, y, width, height) to the figure structure element. (Resulted in a corrupted PDF.)
3. Adding COSName.BBOX to the Figure reference: I added a new item with COSName.BBOX to the figure reference, similar to step 2. (Also resulted in a corrupted PDF.)

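Despite these efforts, I still don't see anything in the structure that represents the bounding box when I try to convert the PDF to PDF/UA. Any guidance on what I might be missing to correctly define the bounding box for images in PDFBox 3.2 would be greatly appreciated! Here is the code for image creation and tagging: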

COSDictionary markedContentDictionary3 = new COSDictionary();
markedContentDictionary3.setInt(COSName.MCID, mcidCounter + 2);
markedContentDictionary3.setString(COSName.ALT, "Alternate Image Description");

PDMarkedContentReference mcr3 = new PDMarkedContentReference();
mcr3.setMCID(mcidCounter + 2);

//COSDictionary markedContentDictionary4 = new COSDictionary();
//markedContentDictionary4.setInt(COSName.MCID, mcidCounter + 3);
//PDMarkedContentReference mcr4 = new PDMarkedContentReference();
//mcr4.setMCID(mcidCounter + 3);

contentStream.beginMarkedContent(COSName.IMAGE, PDPropertyList.create(markedContentDictionary3));
contentStream.drawImage(image, x, y, width, height);
contentStream.endMarkedContent();
// Close the content stream
contentStream.close();

PDStructureElement figureElement = new PDStructureElement(StandardStructureTypes.Figure, documentElement);
figureElement.setPage(page);
figureElement.setAlternateDescription("Dieses Bild zeigt: <dein_Tag_oder_Beschriftung>");

figureElement.appendKid(mcr3);

documentElement.appendKid(figureElement);
java pdf pdfbox
1 Answer

Assign a number to the image:

image.setStructParent(structParentCounter + 1);

Include the figure element in the parent tree, and assign it an attribute:

PDStructureElement figureElement = new PDStructureElement(StandardStructureTypes.Figure, documentElement);
PDLayoutAttributeObject attributeObject = new PDLayoutAttributeObject();
attributeObject.setBBox(new PDRectangle(x, y, width, height));
figureElement.addAttribute(attributeObject);
figureElement.setPage(page);
figureElement.setAlternateDescription("Dieses Bild zeigt: <dein_Tag_oder_Beschriftung>");
PDMarkedContentReference mcr3 = new PDMarkedContentReference();
mcr3.setMCID(mcidCounter + 2);
figureElement.appendKid(mcr3);
documentElement.appendKid(figureElement);
parentTreeMap.put(structParentCounter + 1, figureElement);
// add to the array from SO 79126664, the 0-based index = MCID
ar.add(null); // because you have an MCID "1" about which I know nothing
ar.add(figureElement); 
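
For reference, a BBox is stored in the PDF as a rectangle of four numbers (lower-left x, lower-left y, upper-right x, upper-right y), while drawImage takes x, y, width and height. The PDRectangle(x, y, width, height) constructor used above takes care of that conversion. The plain-Java sketch below only spells out the arithmetic; BBoxSketch and toBBox are hypothetical names, not part of PDFBox, and it assumes the content stream has no extra transformation (scaling, rotation) in effect when the image is drawn.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration only; not a PDFBox class. */
final class BBoxSketch {

    /**
     * Converts drawImage-style arguments (lower-left corner plus size)
     * into the four numbers of a PDF rectangle: [llx, lly, urx, ury].
     * Assumes an untransformed coordinate system.
     */
    static float[] toBBox(float x, float y, float width, float height) {
        return new float[] { x, y, x + width, y + height };
    }

    public static void main(String[] args) {
        // An image drawn at (50, 600) with size 200 x 100
        System.out.println(Arrays.toString(toBBox(50f, 600f, 200f, 100f)));
        // prints [50.0, 600.0, 250.0, 700.0]
    }
}
```

If the image is drawn with the same x, y, width and height that are passed to setBBox, the two rectangles should line up.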

Also don't forget to call

structureTreeRoot.setParentTreeNextKey()

with the highest value plus 1.
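
As a rough illustration of that value, the sketch below computes "highest key plus 1" from a map of parent-tree keys in plain Java. ParentTreeKeySketch and nextKey are hypothetical names, and the String values merely stand in for the structure elements stored in parentTreeMap above; the result is what would be passed to setParentTreeNextKey.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper for illustration only; not a PDFBox class. */
final class ParentTreeKeySketch {

    /** Returns the highest key in the map plus 1, or 0 for an empty map. */
    static int nextKey(Map<Integer, ?> parentTreeMap) {
        int highest = -1;
        for (int key : parentTreeMap.keySet()) {
            highest = Math.max(highest, key);
        }
        return highest + 1;
    }

    public static void main(String[] args) {
        // Placeholder entries: key 0 for the page's marked content,
        // key 1 for the image's StructParent number.
        Map<Integer, String> parentTreeMap = new TreeMap<>();
        parentTreeMap.put(0, "page entry");
        parentTreeMap.put(1, "figure element");
        System.out.println(nextKey(parentTreeMap)); // prints 2
    }
}
```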

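PAC now lets it pass, but with the warning "Possibly inappropriate use of a Figure structure element"; that is more about the nesting, which you will have to improve.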

I'll update this answer when I get a correct file to compare against.
