Download a PDF with Selenium


I'm trying to download a PDF with Selenium, but calling driver.download_file(file_name, target_directory) raises "WebDriverException: You must enable downloads in order to work with downloadable files."

I tried setting chrome_options.enable_downloads = True, but it didn't help. I also tried other browsers (Edge gives the same error, Firefox a different one) and several older Selenium versions, without success.

Ultimately, I just want to download the PDFs and store them in a specific folder. Any suggestions on how to do that would be very helpful!

Here is my full code. Let me know if I can provide anything else :)

import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

MY_PATH = "./downloads"  # placeholder: folder where the PDFs should end up

def download_pdf_and_rename(url, filename):
    # Configure Chrome options and (try to) enable downloads
    chrome_options = Options()
    chrome_options.enable_downloads = True

    driver = webdriver.Chrome(options=chrome_options)

    # Open the PDF URL
    driver.get(url)

    time.sleep(5)  # Adjust the sleep time as needed

    # This is the call that raises the WebDriverException
    driver.download_file(filename, MY_PATH)

    # Close the browser
    driver.quit()


download_pdf_and_rename("https://pubs.aeaweb.org/doi/pdfplus/10.1257/aer.20170866", "my_pdf.pdf")

Thanks!

python selenium-webdriver web-scraping
1 answer

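Setting enable_downloads won't help with a local ChromeDriver: that option, together with driver.download_file(), is a Selenium Grid feature, which is why you get this error. For a local browser, set Chrome preferences instead to control download behavior, including the folder files are saved to and how PDF files are handled.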

import os
import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

def download_pdf_and_rename(url, target_directory, filename):
    # Chrome expects an absolute path for its download directory
    target_directory = os.path.abspath(target_directory)
    os.makedirs(target_directory, exist_ok=True)

    chrome_options = Options()
    chrome_options.add_experimental_option("prefs", {
        "download.default_directory": target_directory,  # save downloads here
        "download.prompt_for_download": False,           # no "Save as" dialog
        "plugins.always_open_pdf_externally": True,      # download PDFs instead of opening Chrome's viewer
    })

    # Selenium 4.6+ fetches a matching chromedriver by itself (Selenium Manager),
    # so webdriver_manager is not required
    driver = webdriver.Chrome(options=chrome_options)
    try:
        before = set(os.listdir(target_directory))
        driver.get(url)
        time.sleep(10)  # crude wait; increase it for large files or slow connections
    finally:
        driver.quit()

    # The saved file name is chosen by the server, so look for the new PDF
    # instead of assuming it is called "document.pdf"
    new_pdfs = sorted(f for f in set(os.listdir(target_directory)) - before
                      if f.lower().endswith(".pdf"))
    if new_pdfs:
        renamed_file_path = os.path.join(target_directory, filename)
        os.replace(os.path.join(target_directory, new_pdfs[0]), renamed_file_path)
        print(f"File downloaded and renamed to: {renamed_file_path}")
    else:
        print("Downloaded file not found. Check the download settings or the URL.")

download_pdf_and_rename(
    "https://pubs.aeaweb.org/doi/pdfplus/10.1257/aer.20170866",
    target_directory="./downloads",
    filename="my_pdf.pdf",
)
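A fixed time.sleep(10) is either wasted time or too short for a large file, and the saved file name is chosen by the server. A minimal polling sketch, assuming Chrome's convention of writing in-progress downloads as *.crdownload files and renaming them when finished; wait_for_new_pdf is a hypothetical helper, not part of Selenium.

```python
import os
import time


def wait_for_new_pdf(directory, before, timeout=30.0, poll=0.5):
    """Return the name of a new, fully downloaded PDF in `directory`, or None on timeout.

    `before` is the set of file names that existed before the download started.
    Assumes Chrome's *.crdownload suffix for downloads that are still in progress.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        new_files = set(os.listdir(directory)) - set(before)
        still_downloading = any(f.endswith(".crdownload") for f in new_files)
        pdfs = sorted(f for f in new_files if f.lower().endswith(".pdf"))
        if pdfs and not still_downloading:
            return pdfs[0]
        time.sleep(poll)
    return None
```

To use it, record before = set(os.listdir(target_directory)) just before driver.get(url), call wait_for_new_pdf(target_directory, before) in place of the sleep, and quit the driver only after it returns, since closing Chrome cancels an unfinished download.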
© www.soinside.com 2019 - 2024. All rights reserved.