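I am currently generating a PDF that is supposed to conform to PDF/UA. My main goal is to make sure it meets accessibility standards and passes the PAC (PDF Accessibility Checker) tool.
The problem is that PAC consistently flags my PDF with the error "Test object not tagged". This suggests that required tags are missing or implemented incorrectly, but I can't tell what I'm missing.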
So far I have made sure to:
- Define document structure elements
- Use the right tags in the PDF content generation code
- Apply what I believe to be the correct metadata and settings for accessibility
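However, I still seem to be missing something fundamental for the tags to be recognized. Which steps might I be overlooking when implementing tagging for PDF/UA compliance?
Any insight into where I might be going wrong, or pointers to the key tagging elements to check, would be greatly appreciated. My code is below.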
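```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.Map;

import org.apache.pdfbox.cos.COSBase;
import org.apache.pdfbox.cos.COSDictionary;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentCatalog;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.common.COSObjectable;
import org.apache.pdfbox.pdmodel.common.PDNumberTreeNode;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.documentinterchange.logicalstructure.PDMarkInfo;
import org.apache.pdfbox.pdmodel.documentinterchange.logicalstructure.PDStructureElement;
import org.apache.pdfbox.pdmodel.documentinterchange.logicalstructure.PDStructureTreeRoot;
import org.apache.pdfbox.pdmodel.documentinterchange.markedcontent.PDPropertyList;
import org.apache.pdfbox.pdmodel.documentinterchange.taggedpdf.StandardStructureTypes;
import org.apache.pdfbox.pdmodel.font.PDType0Font;

public void newPdf() throws IOException {
    int mcidCounter = 0; // start at 0
    int structParentCounter = 0; // start at 0

    PDDocument document = new PDDocument();
    PDPage page = new PDPage(PDRectangle.A4);
    document.addPage(page);

    // Set the StructParents entry on the page
    page.setStructParents(structParentCounter); // structParentCounter is 0

    PDPageContentStream contentStream = new PDPageContentStream(document, page);
    // loadFont and FontEnum are my own helpers (not shown)
    PDType0Font font = loadFont(FontEnum.BUNDES_SANS_WEB_REGULAR, document);

    // Add the font to the page resources
    PDResources resources = page.getResources();
    if (resources == null) {
        resources = new PDResources();
        page.setResources(resources);
    }
    resources.add(font);

    PDDocumentCatalog catalog = document.getDocumentCatalog();
    PDStructureTreeRoot structureTreeRoot = new PDStructureTreeRoot();
    catalog.setStructureTreeRoot(structureTreeRoot);
    catalog.setLanguage("de-DE"); // set the document language to German

    PDMarkInfo markInfo = new PDMarkInfo();
    markInfo.setMarked(true);
    catalog.setMarkInfo(markInfo);

    // Create the Document structure element
    PDStructureElement documentElement = new PDStructureElement(StandardStructureTypes.DOCUMENT, structureTreeRoot);
    structureTreeRoot.appendKid(documentElement);

    // Create the paragraph (P) structure element
    PDStructureElement paragraphElement = new PDStructureElement(StandardStructureTypes.P, documentElement);
    paragraphElement.setPage(page);
    documentElement.appendKid(paragraphElement);

    // Prepare the marked-content properties with the MCID
    COSDictionary markedContentDictionary = new COSDictionary();
    markedContentDictionary.setInt(COSName.MCID, mcidCounter);

    // Begin the marked content
    contentStream.beginMarkedContent(COSName.P, PDPropertyList.create(markedContentDictionary));
    contentStream.setFont(font, 12);
    contentStream.beginText();
    contentStream.newLineAtOffset(50, 700);
    contentStream.showText("Hallo Welt");
    contentStream.endText();
    contentStream.endMarkedContent();

    // Close the content stream
    contentStream.close();

    // Create the parent tree and link it to the structure element
    COSDictionary parentTreeRoot = new COSDictionary();
    PDNumberTreeNode parentTree = new PDNumberTreeNode(parentTreeRoot, COSBase.class);

    // Map StructParent to structure element
    Map<Integer, COSObjectable> parentTreeMap = new HashMap<>();
    parentTreeMap.put(structParentCounter, paragraphElement);
    parentTree.setNumbers(parentTreeMap);
    structureTreeRoot.setParentTree(parentTree);

    // Save the document
    ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
    document.save(outputStream);
    byte[] pdfBytes = outputStream.toByteArray();
    document.close();

    Path actualPath = Path.of("src/test/resources/pdf/small_test.pdf");
    // TRUNCATE_EXISTING so that a longer file left over from a previous run
    // does not leave trailing garbage after the new PDF bytes
    Files.write(actualPath, pdfBytes, StandardOpenOption.CREATE,
            StandardOpenOption.TRUNCATE_EXISTING, StandardOpenOption.WRITE);
}
```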
These are the changes I made to get it to pass PAC:
1)

```java
PDViewerPreferences prefs = new PDViewerPreferences(new COSDictionary());
prefs.setDisplayDocTitle(true);
catalog.setViewerPreferences(prefs);

PDMarkedContentReference mcr = new PDMarkedContentReference();
mcr.setMCID(0);
paragraphElement.appendKid(mcr);
// alternative:
//paragraphElement.appendKid(PDMarkedContent.create(null, mcr.getCOSObject()));
```
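Step 1 adds the missing downward link: the P element's /K entry now contains a marked-content reference to MCID 0 on the page. In the original code the BDC sequence carried MCID 0, but no structure element pointed to it, which plausibly explains the "not tagged" finding (PAC's actual checks are more extensive than this). The stdlib-only model below illustrates that invariant. It is hypothetical and not PDFBox API: `TagCheck` and `untaggedMcids` are made-up names, and element names stand in for real structure elements.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, stdlib-only model of one page's tagging (not PDFBox API).
// mcidsByElement maps an element name to the MCIDs its /K entry references.
class TagCheck {
    // Returns the MCIDs used in the content stream that no structure element references.
    static Set<Integer> untaggedMcids(Collection<Integer> mcidsInContent,
                                      Map<String, List<Integer>> mcidsByElement) {
        Set<Integer> referenced = new HashSet<>();
        for (List<Integer> mcids : mcidsByElement.values()) {
            referenced.addAll(mcids);
        }
        Set<Integer> untagged = new TreeSet<>(mcidsInContent);
        untagged.removeAll(referenced);
        return untagged;
    }
}
```

Fed the original situation (content uses MCID 0, `P` has no kids) it returns the set {0}; once `P` references MCID 0 it returns an empty set.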
2) (This was the hardest part.) Replace

```java
parentTreeMap.put(structParentCounter, paragraphElement);
```

with

```java
COSArray ar = new COSArray();
ar.add(paragraphElement);
parentTreeMap.put(structParentCounter, ar); // must be an array here, even though it has only 1 element
structureTreeRoot.setParentTreeNextKey(1);
```
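Why it must be an array: per ISO 32000-1 (§14.7.4.4), the parent tree maps a page's /StructParents key to an array in which element i is the structure element that owns MCID i on that page. A bare structure element is the expected value only for objects that carry a single /StructParent, such as annotations. The stdlib-only sketch below renders such a page entry as PDF syntax for illustration. It is hypothetical, not PDFBox API: `ParentTreeSketch`, `pageEntry` and `nextKey` are made-up names, and references like `12 0 R` are placeholders.

```java
import java.util.List;

// Hypothetical, stdlib-only sketch (not PDFBox API): renders the parent-tree
// entry for one page as PDF syntax. References such as "12 0 R" are placeholders.
class ParentTreeSketch {
    // parentByMcid.get(i) is the reference of the structure element that owns MCID i.
    // For a page, the value is an array indexed by MCID, even with a single MCID.
    static String pageEntry(int structParents, List<String> parentByMcid) {
        return "<< /Nums [" + structParents + " [" + String.join(" ", parentByMcid) + "]] >>";
    }

    // /ParentTreeNextKey must be greater than every key used in the parent tree.
    static int nextKey(int highestKeyUsed) {
        return highestKeyUsed + 1;
    }
}
```

For the single paragraph, `pageEntry(0, List.of("12 0 R"))` yields `<< /Nums [0 [12 0 R]] >>` and `nextKey(0)` yields 1, matching `setParentTreeNextKey(1)`. A second paragraph on the same page would simply add its reference at index 1.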
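It now passes PAC. However, it is not displayed correctly in PDF-XChange, and it is not yet clear whether this is a PDF-XChange bug. The cause is that PDFBox does not write the MCID directly into the content stream; instead, the BDC operator references a property list stored in the page resources. I will make a change in PDFBox and update this answer.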
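For comparison, this is what the inline form looks like in the raw content stream. `beginMarkedContent(COSName, PDPropertyList)` registers the dictionary in the page's /Properties resources and writes only its resource name as the BDC operand; the sketch below instead writes the `<</MCID n>>` dictionary directly. It is a hypothetical, stdlib-only illustration (not PDFBox API; `InlineMcidSketch` and `markedContent` are made-up names) that may help when inspecting a generated stream by eye.

```java
// Hypothetical, stdlib-only sketch (not PDFBox API): content-stream operators for one
// marked-content sequence, with the property list written inline in the BDC operand.
class InlineMcidSketch {
    static String markedContent(String tag, int mcid, String innerOperators) {
        return "/" + tag + " <</MCID " + mcid + ">> BDC\n"
                + innerOperators
                + "EMC\n";
    }
}
```

Wrapping a text object for MCID 0 produces a stream that starts with `/P <</MCID 0>> BDC` and ends with `EMC`, with no /Properties resource involved.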