Files downloaded by my Selenium/chromedriver web scraper end up in the wrong directory


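I'm trying to package my main.py script into an executable with PyInstaller. The script contains a web scraper that uses Selenium and chromedriver.exe to navigate to websites and automatically download files (PDFs) into a specific directory named "Files", which is located in the same directory as main.py. For clarity, here is a screenshot of the expected file structure.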

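When I run main.py directly, everything works as expected and the files are downloaded into the "Files" directory. However, after packaging with PyInstaller using the following command: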

pyinstaller --onefile --add-data "chromedriver.exe;." --add-data "urls.txt;." main.py

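and running the resulting .exe (with chromedriver.exe and urls.txt in the same directory), I run into a problem: the .exe successfully launches Chrome and downloads the files, but it no longer creates or uses the "Files" directory next to the executable. Instead, the downloads are saved to a temporary directory such as C:\Users\{username}\AppData\Local\Temp\_MEI78762\Files, which is deleted when the program exits, so the downloaded files are lost.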
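A quick way to confirm where each candidate directory points is to print them from inside the packaged program. This is a minimal diagnostic sketch; report_paths is a hypothetical helper, not part of PyInstaller. Inside a --onefile build, "meipass" should show the temporary _MEI... folder, while "executable_dir" shows the folder that contains the .exe.

```python
import os
import sys


def report_paths() -> dict:
    """Collect the directories that matter when running from a PyInstaller bundle."""
    return {
        # PyInstaller sets sys.frozen inside a bundle; absent in a plain script.
        "frozen": getattr(sys, "frozen", False),
        # Set only inside a PyInstaller bundle; with --onefile this is the
        # temporary extraction folder that is removed when the program exits.
        "meipass": getattr(sys, "_MEIPASS", None),
        # Folder containing the running interpreter, or the .exe when frozen.
        "executable_dir": os.path.dirname(os.path.abspath(sys.executable)),
        # The directory the process was started from.
        "cwd": os.getcwd(),
    }


if __name__ == "__main__":
    for key, value in report_paths().items():
        print(f"{key}: {value}")
```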

Below is the code I use to set the download path. The logic tries to detect the executable's base path, but it doesn't work as expected:

import os
import sys

from selenium import webdriver

# Determine the base path
if getattr(sys, 'frozen', False):
    # If the application is run as a bundle, the PyInstaller bootloader
    # extends the sys module by a flag frozen=True and sets the app
    # path into variable _MEIPASS.
    base_path = sys._MEIPASS
else:
    base_path = os.path.abspath(".")

# Create the Files directory if it doesn't exist
download_dir = os.path.join(base_path, "Files")
if not os.path.exists(download_dir):
    os.makedirs(download_dir)
# Read all URLs from the URL list file into a variable called urls
urls = []
with open("./test_urls.txt", "r") as file:
    urls = file.readlines()

# Configure Chrome options to set the download directory and disable the download prompt
chrome_options = webdriver.ChromeOptions()
prefs = {
    "download.default_directory": download_dir,
    "download.prompt_for_download": False,
    "directory_upgrade": True,
    "safebrowsing.enabled": True,
    "safebrowsing.disable_download_protection": True,  # Disable download protection
    "profile.default_content_setting_values.automatic_downloads": 1,  # Allow automatic downloads
    "profile.default_content_settings.popups": 0,  # Disable popups
    "profile.content_settings.exceptions.automatic_downloads.*.setting": 1  # Allow multiple downloads
}
Tags: python, selenium-webdriver, pyinstaller

1 answer

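When you package your script as a standalone executable with --onefile, the executable unpacks its bundled files into a temporary directory at startup, and sys._MEIPASS points to that directory. The directory is removed when the program exits, which is why anything written under sys._MEIPASS disappears. To fix this, change base_path so that it points to the directory that contains the executable. When sys.frozen is true, os.path.dirname(sys.executable) gives you that directory.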
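In isolation, the path decision can be written as a small function so that both branches are easy to test. This is a sketch under the assumption that "Files" should live next to main.py when running as a plain script; app_dir and its parameters are hypothetical names. Unlike os.path.abspath("."), using the script's own path does not depend on the directory the script is launched from.

```python
import os


def app_dir(frozen: bool, executable: str, script_file: str) -> str:
    """Return the folder that should hold user-visible output such as "Files".

    frozen      -- value of getattr(sys, "frozen", False)
    executable  -- sys.executable (the .exe itself when frozen)
    script_file -- __file__ of main.py (used when running as a plain script)
    """
    if frozen:
        # PyInstaller bundle: use the folder containing the .exe,
        # not sys._MEIPASS (the temporary extraction folder).
        return os.path.dirname(os.path.abspath(executable))
    # Plain script: use the folder containing main.py.
    return os.path.dirname(os.path.abspath(script_file))
```

In main.py it would be called as base_path = app_dir(getattr(sys, "frozen", False), sys.executable, __file__).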

The full implementation looks like this:

import os
import sys

from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Determine the base path
if getattr(sys, 'frozen', False):
    # Running as a PyInstaller bundle: use the directory of the executable,
    # not sys._MEIPASS (the temporary folder that is deleted on exit)
    base_path = os.path.dirname(sys.executable)
else:
    # Running as a script: use the directory that contains main.py,
    # independent of the current working directory
    base_path = os.path.dirname(os.path.abspath(__file__))

# Define the download directory for "Files" within base_path
download_dir = os.path.join(base_path, "Files")

# Create the "Files" directory if it doesn't exist
os.makedirs(download_dir, exist_ok=True)

# Define the path to urls.txt and check that it exists
urls_file = os.path.join(base_path, "urls.txt")
if not os.path.isfile(urls_file):
    raise FileNotFoundError(f"Expected 'urls.txt' in {base_path}. Please place 'urls.txt' in the same directory as the executable.")

# Read URLs from urls.txt, dropping trailing newlines and blank lines
with open(urls_file, "r") as file:
    urls = [line.strip() for line in file if line.strip()]

# Configure Chrome options for Selenium
chrome_options = webdriver.ChromeOptions()
prefs = {
    "download.default_directory": download_dir,
    "download.prompt_for_download": False,
    "directory_upgrade": True,
    "safebrowsing.enabled": True,
    "safebrowsing.disable_download_protection": True,
    "profile.default_content_setting_values.automatic_downloads": 1,
    "profile.default_content_settings.popups": 0,
    "profile.content_settings.exceptions.automatic_downloads.*.setting": 1
}
chrome_options.add_experimental_option("prefs", prefs)

# Initialize the Chrome WebDriver. Selenium 4 passes the driver path through a
# Service object; the executable_path argument of webdriver.Chrome was
# deprecated and later removed.
service = Service(executable_path=os.path.join(base_path, "chromedriver.exe"))
driver = webdriver.Chrome(service=service, options=chrome_options)
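The build command in the question bundles chromedriver.exe and urls.txt with --add-data, so copies of them are extracted to sys._MEIPASS, while the code above expects them next to the executable. If you want both options to work, a lookup that tries the executable's folder first and falls back to the bundle is one way to do it. This is a sketch; find_resource is a hypothetical helper, and it is only suitable for read-only inputs: downloads must still go to a folder under base_path, never under sys._MEIPASS.

```python
import os
from typing import Iterable, Optional


def find_resource(name: str, search_dirs: Iterable[Optional[str]]) -> Optional[str]:
    """Return the first existing path to `name` in `search_dirs`, or None."""
    for directory in search_dirs:
        if not directory:
            continue  # skip entries such as a missing sys._MEIPASS (None)
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None
```

For example, urls_file = find_resource("urls.txt", [base_path, getattr(sys, "_MEIPASS", None)]) prefers a urls.txt placed next to the .exe, so the list can be edited without rebuilding, and otherwise uses the bundled copy.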