Remove the "Preview" watermark from every page of a PDF


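I'm trying to create a Python script that iterates over every page of a PDF and removes the watermark. Some of the PDF files have more than 500 pages, and right now the watermark has to be removed from every page by hand before the files are sent to our customers. One problem I've run into is that on some pages the watermark is a text box object, while on other pages it is an image object. There's no way around that; it's simply how the system prints these preview files.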

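I tried writing a script with PyMuPDF that takes the pixel coordinates of the watermark and deletes the items with exactly those dimensions. It partly works, but not all watermarks are the same (image vs. text), so their dimensions differ. Also, I only want to remove the watermark, not anything underneath it. If anyone knows how I can move forward, I'd be very grateful!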
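Since some pages use a text-box watermark and others an image, it may help to check how each watermark is actually drawn instead of relying on its dimensions. The sketch below is a hypothetical diagnostic (the function `classify_watermark_blocks` and the sample streams are made up for illustration). It assumes the PDF wraps its watermarks in `/Artifact ... /Watermark ... BDC` / `EMC` marked content, which is the structure the answer relies on, and that the content stream has one operator per line, as PyMuPDF's `page.read_contents()` returns after `page.clean_contents()`.

```python
# Hypothetical diagnostic; assumes one PDF operator per line and watermarks
# tagged as "/Artifact <<... /Watermark ...>> BDC" ... "EMC" marked content.
TEXT_OPS = {b"Tj", b"TJ", b"'", b'"'}  # PDF text-showing operators


def classify_watermark_blocks(content: bytes) -> list[str]:
    """Return 'image', 'text', 'both' or 'none' for each tagged watermark block."""
    lines = content.splitlines()
    kinds = []
    for i, line in enumerate(lines):
        if not (line.startswith(b"/Artifact") and b"/Watermark" in line):
            continue  # not the start of a watermark block
        try:
            end = lines.index(b"EMC", i)
        except ValueError:
            end = len(lines)  # unterminated block: scan to the end
        # the operator is the last whitespace-separated token on each line
        ops = {ln.split()[-1] for ln in lines[i + 1:end] if ln.split()}
        has_image = b"Do" in ops  # Do paints an image / form XObject
        has_text = bool(ops & TEXT_OPS)
        if has_image and has_text:
            kinds.append("both")
        elif has_image:
            kinds.append("image")
        elif has_text:
            kinds.append("text")
        else:
            kinds.append("none")
    return kinds


# Made-up content streams for two pages
image_page = b"/Artifact <</Subtype /Watermark>> BDC\nq\n/Im1 Do\nQ\nEMC"
text_page = b"/Artifact <</Subtype /Watermark>> BDC\nBT\n(Preview) Tj\nET\nEMC"
print(classify_watermark_blocks(image_page))  # ['image']
print(classify_watermark_blocks(text_page))   # ['text']
```

If both kinds of page report a tagged block, one content-stream filter can handle them; a page that returns an empty list does not tag its watermark this way, so a tag-based approach would not find it.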

python pdf adobe acrobat
1 Answer

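I tried the following code, a lightly modified version of the code from the PyMuPDF GitHub discussion "Find and remove watermarks in PDF files". It works fine with the current PyMuPDF version. It looks for marked-content blocks tagged /Artifact ... /Watermark in each page's content stream and empties the Do (image/XObject) operators inside them, so content outside those blocks is left alone.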

pip install PyMuPDF

import os

import pymupdf

def process_page(page: pymupdf.Page) -> int:
    """Blank out watermark images/XObjects on one page; return the number removed."""
    doc = page.parent  # the page's owning document
    # Merge and sanitize the page's /Contents into a single stream with one
    # operator per line; otherwise rewriting only the first stream below
    # could duplicate content on pages that have several /Contents streams.
    page.clean_contents()
    xrefs = page.get_contents()
    if not xrefs:  # page has no content stream at all
        return 0
    xref = xrefs[0]  # xref of the resulting /Contents
    changed = 0  # this will be returned
    # read the sanitized contents, split by line breaks
    cont_lines = page.read_contents().splitlines()
    for i, line in enumerate(cont_lines):
        # watermarks are marked content: "/Artifact <<... /Watermark ...>> BDC" ... "EMC"
        if not (line.startswith(b"/Artifact") and b"/Watermark" in line):
            continue  # this was not for us
        # line i starts the definition, line j (the next "EMC") ends it
        try:
            j = cont_lines.index(b"EMC", i)
        except ValueError:
            continue  # no closing EMC: leave this block untouched
        for k in range(i, j):
            # look for image / XObject invocations in this line range
            if cont_lines[k].endswith(b"Do"):  # this invokes an image / XObject
                cont_lines[k] = b""  # remove (empty) this line
                changed += 1
    if changed > 0:  # if we did anything, write back the modified /Contents
        doc.update_stream(xref, b"\n".join(cont_lines))
    return changed

fpath = 'your_pdf_file_path/file_name.pdf'
doc = pymupdf.open(fpath)
changed = 0  # total number of removed watermark invocations
for page in doc:
    changed += process_page(page)  # increase number of changes
if changed > 0:
    x = "s" if doc.page_count > 1 else ""
    print(f"{changed} watermarks have been removed from a document of {doc.page_count} page{x}.")
    root, ext = os.path.splitext(fpath)
    doc.ez_save(f"{root}-nowm{ext}")  # write a new file, keep the original
else:
    print("Nothing to change")
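The script above only empties Do lines, i.e. image/XObject watermarks. For pages where the watermark is a text box, one possible extension (an assumption, not part of the GitHub discussion code) is to also empty the text-showing operators `Tj`, `TJ`, `'` and `"` inside the same tagged block. The helper name `blank_watermark_painting` is hypothetical; it works on the list returned by `page.read_contents().splitlines()` after `page.clean_contents()`, and like the original it only empties painting operators, leaving state operators such as `q`/`Q`, `BT`/`ET` and `Tf` in place.

```python
# Hypothetical extension; operates on page.read_contents().splitlines()
# after page.clean_contents() (one PDF operator per line assumed).
PAINT_OPS = {b"Do", b"Tj", b"TJ", b"'", b'"'}  # XObject + text-showing operators


def blank_watermark_painting(cont_lines: list[bytes]) -> int:
    """Empty painting operators inside tagged watermark blocks, in place.

    Returns the number of lines emptied.
    """
    changed = 0
    for i, line in enumerate(cont_lines):
        if not (line.startswith(b"/Artifact") and b"/Watermark" in line):
            continue  # not the start of a watermark block
        try:
            j = cont_lines.index(b"EMC", i)
        except ValueError:
            continue  # no closing EMC: leave this block untouched
        for k in range(i + 1, j):
            tokens = cont_lines[k].split()
            # the operator is the last token; keep q/Q, BT/ET, Tf, Tm, etc.
            if tokens and tokens[-1] in PAINT_OPS:
                cont_lines[k] = b""
                changed += 1
    return changed


# Made-up content stream: a tagged text + image watermark, then body text
stream = b"""q
/Artifact <</Subtype /Watermark /Type /Pagination>> BDC
BT
/F1 48 Tf
(Preview) Tj
ET
/Im1 Do
EMC
BT
(Body text) Tj
ET
Q"""
lines = stream.splitlines()
removed = blank_watermark_painting(lines)
print(removed)  # 2
print(b"\n".join(lines).decode())
```

Inside process_page, the loop over cont_lines could then be replaced by changed = blank_watermark_painting(cont_lines), followed by the same doc.update_stream call. This still relies on the watermark being tagged as an /Artifact watermark; untagged or vector-drawn watermarks would need a different approach.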
