Is there a way to parse a PDF's internal (same-document) links with Python?

Question

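I have a PDF document that contains hyperlinks pointing to different sections of the same document. I want to parse these links to build something like a knowledge graph of the document. Is there any way to read or parse these links?

I tried traditional PDF readers such as PyPDF2 and PdfPlumber, but because they work like OCR, I could not get the hyperlinks from them.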

Answer 1

Using the PDF from https://hugepdf.com/download/user-manual-43_pdf, which contains both external and internal links, you can look for all annotations whose subtype is /Link:

from PyPDF2 import PdfReader
from pprint import pprint

# pdf: https://hugepdf.com/download/user-manual-43_pdf
with open("/tmp/pdf-user-manual.pdf", 'rb') as f:
    pdf = PdfReader(f)

    for i, page in enumerate(pdf.pages):
        # Pages without annotations have no "/Annots" entry
        if "/Annots" in page:
            for annot in page["/Annots"]:
                # Entries may be indirect references; resolve them first
                obj = annot.get_object()
                # Links (internal and external) are annotations of subtype /Link
                if obj["/Subtype"] == "/Link":
                    pprint(obj)

Output:

{'/BS': {'/W': 0},
 '/Dest': [IndirectObject(24, 0, 4373945936), '/XYZ', 87, 769, 0],
 '/F': 4,
 '/Rect': [278.16, 442.32, 333.81, 457.92],
 '/StructParent': 1,
 '/Subtype': '/Link'}
{'/A': {'/S': '/URI',
        '/Type': '/Action',
        '/URI': 'http://www.microsoft.com/visualstudio/eng'},
 '/BS': {'/W': 0},
 '/F': 4,
 '/Rect': [476.68, 395.52, 509.92, 411.12],
 '/StructParent': 2,
 '/Subtype': '/Link'}
{'/A': {'/S': '/URI',
        '/Type': '/Action',
        '/URI': 'http://www.microsoft.com/visualstudio/eng'},
 '/BS': {'/W': 0},
 '/F': 4,
 '/Rect': [87.75, 379.92, 162.35, 395.52],
 '/StructParent': 3,
 '/Subtype': '/Link'}
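
In the output above, the first annotation carries a /Dest entry (a jump inside the document), while the other two carry a /A action with /S set to /URI (an external web link). For a knowledge graph you would probably want to keep only the internal ones. The following is a minimal sketch of that separation, not part of the original answer. The helper names `classify_link` and `_resolve` are hypothetical, and it assumes that internal links appear either as /Dest or as a /GoTo action, which is how the PDF specification describes them. Named destinations (where the target is a name rather than an array) would still need to be looked up separately.

```python
def _resolve(obj):
    """Dereference indirect objects if the value supports get_object().

    Hypothetical helper: plain Python values (dict, list, str) pass through
    unchanged, so the function can be tried on the printed output above.
    """
    getter = getattr(obj, "get_object", None)
    return getter() if callable(getter) else obj


def classify_link(annot):
    """Return (kind, target) for one /Link annotation dictionary.

    kind is "internal", "external", or "other".
    """
    annot = _resolve(annot)

    # A direct destination: the link jumps to a place in the same document.
    if "/Dest" in annot:
        return "internal", _resolve(annot["/Dest"])

    # Otherwise inspect the action dictionary, if there is one.
    action = _resolve(annot.get("/A", {}))
    action_type = action.get("/S")
    if action_type == "/GoTo":   # action-based jump within the same document
        return "internal", _resolve(action.get("/D"))
    if action_type == "/URI":    # external web link
        return "external", action.get("/URI")
    return "other", None


# Sample data shaped like the output above; "page-ref" stands in for the
# IndirectObject that points at the target page.
sample_internal = {"/Dest": ["page-ref", "/XYZ", 87, 769, 0],
                   "/Subtype": "/Link"}
sample_external = {"/A": {"/S": "/URI", "/Type": "/Action",
                          "/URI": "http://www.microsoft.com/visualstudio/eng"},
                   "/Subtype": "/Link"}

print(classify_link(sample_internal))
print(classify_link(sample_external))
```

Inside the loop from the answer, calling `classify_link(annot)` for each /Link annotation should give you, per page, the list of internal targets; the first element of an internal destination array is the reference to the target page.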
