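So far I have found that Selenium can take screenshots, but they are always .png files. How can I capture a screenshot in the same style as a .pdf?
Required style: no margins; same dimensions as the current page (like a full-page screenshot).
Printing the page does not achieve this because of all the formatting that printing adds.
How I currently take the screenshot: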
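from selenium import webdriver
# Output path and browser options were not defined in the original snippet;
# the values below are placeholders.
PNG_SAVEAS = 'screenshot.png'
options = webdriver.FirefoxOptions()
# Helper to read a scroll dimension of the page ('Height' or 'Width')
S = lambda X: driver.execute_script('return document.body.parentNode.scroll' + X)
driver = webdriver.Firefox(options=options)
driver.get('https://www.google.com')
# Resize the window to the full page size, then capture it
height = S('Height')
width = S('Width')
driver.set_window_size(width, height)
driver.get_screenshot_as_file(PNG_SAVEAS)
driver.close()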
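To get the desired result, I found a solution that is not easy to come by elsewhere.
The key is to set the PDF page width and height dynamically so that they match the content being printed. I also found that scaling the result down to 1% of the original size speeds up the process significantly.
One thing to note: with GeckoDriver I ran into a bug (see the issue linked in the code comment below) that produced a PDF with the wrong print size. Multiplying the dimensions by 2.5352112676056335 fixes it. It is still unclear to me why this particular constant is relevant, but without the fix the PDF's aspect ratio is distorted (rather than simply being scaled down to 39% of the desired size). The distortion produces a multi-page .pdf file, which is not the intended result.
This approach was tested with GeckoDriver. If you are using Chrome, the RATIO_MULTIPLIER workaround may not be needed.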
from selenium import webdriver
from selenium.webdriver.common.print_page_options import PrintOptions
import base64
# Bug in geckodriver... seems unrelated, but this won't work otherwise.
# https://github.com/SeleniumHQ/selenium/issues/12066
RATIO_MULTIPLIER = 2.5352112676056335
# Helper to read a scroll dimension of the page ('Height' or 'Width')
S = lambda X: driver.execute_script('return document.body.parentNode.scroll' + X)
# Scale for PDF size. 1 (no change) takes a long time
pdf_scaler = .01
# Browser options. Headless is more reliable for screenshots in my experience.
options = webdriver.FirefoxOptions()
options.add_argument('--headless')
# Launch webdriver, navigate to destination
driver = webdriver.Firefox(options=options)
driver.get('https://www.google.com')
# Find full page dimensions regardless of scroll
height = S('Height')
width = S('Width')
# Set the PDF page dimensions dynamically to match the page
print_options = PrintOptions()
print_options.page_height = (height * pdf_scaler) * RATIO_MULTIPLIER
print_options.page_width = (width * pdf_scaler) * RATIO_MULTIPLIER
print_options.shrink_to_fit = True
# Print to PDF (returns base64-encoded data, which must be saved)
pdf = driver.print_page(print_options=print_options)
driver.close()
# Save the output to a file
with open('example.pdf', 'wb') as file:
    file.write(base64.b64decode(pdf))
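The last step above decodes the base64 string returned by print_page before writing it to disk. A minimal sketch of that step as a reusable helper follows. The name save_base64_pdf and the header check are my own additions for illustration and are not part of Selenium; the check only assumes that a PDF file starts with the bytes %PDF-.

```python
import base64
from pathlib import Path


def save_base64_pdf(encoded: str, path: str) -> int:
    """Decode a base64 string and write it to `path`; return bytes written.

    Hypothetical helper, not a Selenium API. The header check assumes the
    payload is a PDF, which starts with the bytes b'%PDF-'.
    """
    data = base64.b64decode(encoded)
    if not data.startswith(b'%PDF-'):
        raise ValueError('decoded payload does not look like a PDF')
    return Path(path).write_bytes(data)
```

With the code above, this would be called as save_base64_pdf(pdf, 'example.pdf') in place of the final with block.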
Versions used:
geckodriver 0.31.0
Firefox 113.0.1
selenium==4.9.1
Python 3.11.2
Windows 10
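Edit: this is because the units here are centimeters, not inches. 2.5352112676056335 is approximately the inch-to-centimeter conversion factor (exactly 2.54) :)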
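The unit explanation can be made concrete with a little arithmetic. This is a minimal sketch, assuming the print page dimensions are read as centimeters and using the CSS definition of 96 px per inch together with 2.54 cm per inch. The helper name css_px_to_cm is hypothetical. For comparison, it also prints the combined factor used in the answer (pdf_scaler times RATIO_MULTIPLIER).

```python
CSS_PX_PER_INCH = 96   # CSS defines 1in = 96px
CM_PER_INCH = 2.54     # exact by definition


def css_px_to_cm(pixels: float) -> float:
    """Convert a length in CSS pixels to centimeters (hypothetical helper)."""
    return pixels / CSS_PX_PER_INCH * CM_PER_INCH


# Combined factor applied in the answer: pdf_scaler * RATIO_MULTIPLIER
answer_factor = 0.01 * 2.5352112676056335
# Factor implied by the CSS definitions above
css_factor = CM_PER_INCH / CSS_PX_PER_INCH

print(round(answer_factor, 4))    # 0.0254
print(round(css_factor, 4))       # 0.0265
# Relative size when a value meant as inches is read as centimeters
print(round(1 / CM_PER_INCH, 2))  # 0.39
```

The last value matches the 39% shrink mentioned above. Whether css_px_to_cm can replace the constant directly has not been tested here.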
Try this:
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.firefox.service import Service
from webdriver_manager.firefox import GeckoDriverManager
from PIL import Image
def get_page_size(driver):
    # Full scrollable size of the document, not just the visible viewport
    return driver.execute_script('return [document.documentElement.scrollWidth, document.documentElement.scrollHeight];')
def scroll_to_bottom(driver):
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
def capture_screenshot_as_png(driver, file_path):
    driver.save_screenshot(file_path)
def convert_to_pdf(input_file, output_file):
    # Some Pillow versions cannot save RGBA images as PDF, so convert to RGB first
    image = Image.open(input_file).convert('RGB')
    image.save(output_file, 'PDF', resolution=100.0)
# Set up the Firefox driver with options
# (Selenium 4 style: driver path via Service, capabilities via Options)
options = Options()
options.add_argument('--headless')
options.accept_insecure_certs = True
driver = webdriver.Firefox(service=Service(GeckoDriverManager().install()), options=options)
# Navigate to the webpage
driver.get('https://www.google.com')
# Get the page size
page_size = get_page_size(driver)
# Set the window size
driver.set_window_size(page_size[0], page_size[1])
# Scroll to the bottom to load dynamic content
scroll_to_bottom(driver)
# Capture the full-page screenshot as PNG
png_file_path = 'full_page_screenshot.png'
capture_screenshot_as_png(driver, png_file_path)
# Convert the PNG screenshot to PDF
pdf_file_path = 'full_page_screenshot.pdf'
convert_to_pdf(png_file_path, pdf_file_path)
# Clean up and close the browser
driver.quit()
This code captures a full-page screenshot as a PNG file and then converts it to a PDF file. Adjust the file paths (png_file_path and pdf_file_path) to wherever you want the files saved.
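The resolution argument decides how large the resulting PDF page is. Here is a rough sketch of that relationship, assuming Pillow sizes the page as pixels * 72 / resolution PDF points, which is my understanding of its PDF writer rather than something confirmed in this thread. The function name pdf_page_size_points and the 1366x768 example are hypothetical.

```python
POINTS_PER_INCH = 72  # a PDF point is 1/72 inch


def pdf_page_size_points(width_px: int, height_px: int, resolution: float = 100.0):
    """Estimate the PDF page size in points for an image saved at `resolution` DPI.

    Assumption: the page size is pixels / resolution inches on each side.
    """
    width_pt = width_px * POINTS_PER_INCH / resolution
    height_pt = height_px * POINTS_PER_INCH / resolution
    return width_pt, height_pt


# Example: a hypothetical 1366x768 screenshot saved with resolution=100.0
w, h = pdf_page_size_points(1366, 768, resolution=100.0)
print(round(w, 2), round(h, 2))  # 983.52 552.96
```

A higher resolution value gives a physically smaller page for the same pixel data, so the pixels themselves are unchanged either way.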