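I'm trying to download a PDF with Selenium, but calling driver.download_file(file_name, target_directory) raises "WebDriverException: You must enable downloads in order to work with downloadable files."
I tried adding the option chrome_options.enable_downloads = True, but it doesn't work. I also tried other browsers (Edge gives the same problem, and Firefox returns a different error) and several older versions of Selenium, with no luck.
In the end, I just want to download PDFs and store them in a specific folder. Any suggestions on how to achieve that would be very helpful!
Here is my full code. Let me know if I can provide anything else :)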
import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

def download_pdf_and_rename(url, filename):
    # Configure Chrome options to download PDFs to a temporary directory
    chrome_options = Options()
    chrome_options.enable_downloads = True
    driver = webdriver.Chrome(options=chrome_options)
    # Access the PDF URL
    driver.get(url)
    time.sleep(5)  # Adjust the sleep time as needed
    driver.download_file('my_pdf.pdf', MY_PATH)
    # Close the browser
    driver.quit()

download_pdf_and_rename("https://pubs.aeaweb.org/doi/pdfplus/10.1257/aer.20170866", "my_pdf.pdf")
Thanks!
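Setting enable_downloads does not configure downloads for a locally launched Chrome. As far as I know, that option and driver.download_file are meant for remote sessions (for example on a Selenium Grid), which is why the local driver still reports that downloads are not enabled. Instead, set the Chrome preferences that control download behavior: the directory files are saved to, whether Chrome prompts before saving, and whether PDFs are saved to disk rather than opened in the built-in viewer.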
import os
import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from webdriver_manager.chrome import ChromeDriverManager

def download_pdf_and_rename(url, target_directory, filename):
    # Chrome generally needs an absolute path for the download directory
    target_directory = os.path.abspath(target_directory)
    # Ensure the target directory exists
    os.makedirs(target_directory, exist_ok=True)

    chrome_options = Options()
    chrome_options.add_experimental_option("prefs", {
        "download.default_directory": target_directory,
        "download.prompt_for_download": False,
        # Save PDFs to disk instead of opening them in Chrome's PDF viewer
        "plugins.always_open_pdf_externally": True,
    })

    driver = webdriver.Chrome(
        service=Service(ChromeDriverManager().install()),
        options=chrome_options,
    )
    try:
        driver.get(url)
        time.sleep(10)  # Crude wait for the download to finish; adjust as needed
    finally:
        # Close the browser even if loading the page fails
        driver.quit()

    # "document.pdf" is a placeholder: use the name the site actually gives the file
    downloaded_file_path = os.path.join(target_directory, "document.pdf")
    renamed_file_path = os.path.join(target_directory, filename)
    if os.path.exists(downloaded_file_path):
        os.rename(downloaded_file_path, renamed_file_path)
        print(f"File downloaded and renamed to: {renamed_file_path}")
    else:
        print("Downloaded file not found. Check the download settings or file name.")

download_pdf_and_rename(
    "https://pubs.aeaweb.org/doi/pdfplus/10.1257/aer.20170866",
    target_directory="./downloads",
    filename="my_pdf.pdf",
)
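
A fixed sleep and a hard-coded downloaded file name are both fragile, so here is a rough sketch of a helper that polls the download folder instead. The function wait_for_download and its parameters are my own illustration, not part of Selenium. It assumes Chrome's usual behavior of writing in-progress downloads with a ".crdownload" suffix and renaming them once they finish.

```python
import os
import time


def wait_for_download(directory, existing=(), timeout=30.0, poll_interval=0.5):
    """Return the path of the newest finished PDF in `directory`, or None on timeout.

    `existing` holds file names that were already present before the download
    started, so they are not mistaken for the new file.
    Assumption: in-progress Chrome downloads end in ".crdownload".
    """
    existing = set(existing)
    deadline = time.monotonic() + timeout
    while True:
        names = [n for n in os.listdir(directory) if n not in existing]
        in_progress = [n for n in names if n.endswith(".crdownload")]
        pdfs = [
            os.path.join(directory, n)
            for n in names
            if n.lower().endswith(".pdf")
        ]
        # Done once at least one new PDF exists and nothing is still being written
        if pdfs and not in_progress:
            return max(pdfs, key=os.path.getmtime)
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll_interval)
```

To use it, record existing = set(os.listdir(target_directory)) before calling driver.get(url), call wait_for_download(target_directory, existing) in place of time.sleep(10), and rename the returned path if it is not None. That way you don't need to know the file name the site picks.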