Interacting with the "Save As" window when downloading a webpage in Chrome with Selenium

Question

I have two scripts. `other.py` looks like this:

```python
import subprocess

# some stuff is done here and a list of urls is created such as:
urls = ['https://www.walmart.com/ip/Sabrina-Carpenter-Cherry-Pop-EDP-30ml-1oz/5492571361?classType=REGULAR&athbdg=L1600', 'https://www.walmart.com/ip/Hoey-5-1-Painless-Hair-Remover-Women-Facial-Removal-Electric-Cordless-Shaver-Set-Wet-Dry-Lady-Razor-Women-Bikini-Line-Nose-Hair-Eyebrow-Arm-Leg-USB-R/647670434?classType=REGULAR']

# Then, the script runs another script called get_url.py and passes the urls to it to be processed:
subprocess.Popen(['python', 'get_url.py', str(urls)])

# It is important that this does not block, so the rest of the code in this script can run without waiting for get_url.py to complete.
```
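Passing `str(urls)` on the command line and parsing it back works, but a less fragile variant (an editor's sketch, not the asker's code) is to serialize the list as JSON so the child can rebuild it exactly. `build_command`, `launch_worker` and `parse_urls` are hypothetical helper names; `sys.executable` is used instead of `'python'` so the child runs under the same interpreter and virtualenv. `subprocess.Popen` still returns immediately, so `other.py` is not blocked.

```python
import json
import subprocess
import sys


def build_command(urls, script="get_url.py"):
    """Build the argv list for the worker script.

    The URL list is serialized as one JSON string, so the child can
    rebuild the exact list with json.loads instead of string surgery.
    """
    # sys.executable reuses the current interpreter (same virtualenv)
    return [sys.executable, script, json.dumps(urls)]


def launch_worker(urls, script="get_url.py"):
    # Popen returns immediately, so the caller keeps running.
    # Keep the handle if you later want to poll() or wait() on the child.
    return subprocess.Popen(build_command(urls, script))


def parse_urls(argv):
    # Counterpart for get_url.py: argv[1] is the JSON string built above
    return json.loads(argv[1])
```

In `other.py` this would be `proc = launch_worker(urls)`, and in `get_url.py` `urls = parse_urls(sys.argv)`. Because no shell is involved, characters such as `&` and `?` in the URLs need no extra quoting.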

`get_url.py` looks like this and downloads each URL passed to it:

```python
import ast
import os
import sys
import time
from concurrent.futures import ProcessPoolExecutor
from datetime import datetime

import pandas as pd
import pyautogui
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()  # Chrome options (this was undefined in the snippet)


def get_page(url):
    file_name = f"{url[:20]}_{pd.to_datetime(datetime.now()).strftime('%Y-%m-%d %H-%M-%S')}.html"
    file_path = os.path.join(os.getcwd(), 'data', 'htmls')
    path_and_name = os.path.join(file_path, file_name)

    driver = webdriver.Chrome(options=options)
    driver.get(url)

    time.sleep(1)
    pyautogui.hotkey('ctrl', 's')  # open the save as window
    time.sleep(1)
    pyautogui.typewrite(path_and_name)  # enter the path and file name so the webpage is downloaded in the desired directory
    time.sleep(.5)
    pyautogui.hotkey('enter')
    time.sleep(.2)

    while True:  # wait until the download is complete, then close the driver
        files = os.listdir(file_path)
        if file_name in files:
            driver.quit()  # quit() also ends the chromedriver process
            break
        time.sleep(.1)


if __name__ == '__main__':  # multi-processing the urls to speed things up (necessary)
    # getting urls from other.py: sys.argv[1] is str(list), so literal_eval rebuilds the list
    urls = ast.literal_eval(sys.argv[1])
    with ProcessPoolExecutor(max_workers=10) as executor:
        executor.map(get_page, urls, chunksize=1)
```
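Two details in this `get_page` are worth hardening regardless of how the page gets saved. First, `url[:20]` keeps the `https://` prefix, so the file name contains `:` (rejected by Windows) and `/` (read as a directory separator). Second, the `while True` loop never exits if the save fails. The sketch below uses hypothetical helpers `safe_file_name` and `wait_for_file`; the `.crdownload` check assumes Chrome's usual behaviour of writing an in-progress download to a file with that suffix before renaming it.

```python
import os
import re
import time
from datetime import datetime


def safe_file_name(url, now=None):
    """Build an .html file name without ':' '/' '?' or other unsafe characters."""
    now = now or datetime.now()
    # Drop the scheme, replace every run of unsafe characters with '_',
    # and cap the length so long product URLs stay manageable
    stem = re.sub(r"[^A-Za-z0-9._-]+", "_", url.split("://", 1)[-1])[:60]
    return f"{stem}_{now.strftime('%Y-%m-%d %H-%M-%S')}.html"


def wait_for_file(path, timeout=30.0, poll=0.1):
    """Return True once `path` exists and no partial download remains.

    Returns False after `timeout` seconds instead of looping forever.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        # Assumption: Chrome's in-progress file is "<name>.crdownload"
        if os.path.exists(path) and not os.path.exists(path + ".crdownload"):
            return True
        time.sleep(poll)
    return False
```

With these, the loop at the end of `get_page` becomes a single `if not wait_for_file(path_and_name): ...` check, and the driver can be quit in a `finally` block either way.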

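The function works fine as long as only one browser is open. However, as soon as several windows are opened through `ProcessPoolExecutor`, the `pyautogui.typewrite` part of the function loses track of which window it is typing into. As a result, `path_and_name` can be typed into the "Save As" window several times, or only partially, so the page is not downloaded at all or is saved under the wrong name or directory. Worse, if I click somewhere in my code editor while the function is running, `pyautogui` may type the `path_and_name` value into the editor wherever the cursor is active. Running the browser in headless mode, so that I can't accidentally disturb the windows, did not help.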

So, basically, how can I fix the code above?

Answer 1

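This happens because `pyautogui` sends keystrokes to whichever window is currently active. If you want to run several browser windows in parallel, or keep using your computer while the program runs, you should use something that does not interact with the GUI.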

So instead of clicking through the save window, define a download folder and configure Chrome to download into that folder:

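```python
import os
from selenium.webdriver.chrome.options import Options

# download folder (the same data/htmls folder used in the question)
downloads = os.path.join(os.getcwd(), 'data', 'htmls')

# Chrome options
options = Options()
prefs = {
    "download.default_directory": downloads,
    "download.prompt_for_download": False,  # save without showing a dialog
}

options.add_experimental_option("prefs", prefs)
# then: driver = webdriver.Chrome(options=options)
```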
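One caveat, offered as an editor's note rather than as part of the answer above: `download.default_directory` only controls where Chrome puts files it actually downloads. Opening an HTML page with `driver.get()` does not trigger a download, so to save the page itself without any dialog, one swapped-in option is to write `driver.page_source` (the current DOM serialized as HTML) to disk. This saves the HTML only, not the images and CSS that "Webpage, Complete" would include. The names `save_page_source`, `get_page` and `run_all` are hypothetical, and Selenium is imported inside `get_page` so each worker process builds its own driver.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def save_page_source(driver, url, out_path):
    """Load `url` in `driver` and write the rendered HTML to `out_path`.

    No keystrokes or dialogs are involved, so parallel browsers cannot
    interfere with each other (or with your editor).
    """
    driver.get(url)
    os.makedirs(os.path.dirname(out_path) or ".", exist_ok=True)
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(driver.page_source)
    return out_path


def get_page(url, out_path):
    # Imported here so each worker process creates its own driver
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless=new")  # headless mode on recent Chrome
    driver = webdriver.Chrome(options=options)
    try:
        return save_page_source(driver, url, out_path)
    finally:
        driver.quit()  # ends the browser and the chromedriver process


def run_all(jobs, max_workers=10):
    """jobs: iterable of (url, out_path) pairs."""
    jobs = list(jobs)
    if not jobs:
        return []
    urls, paths = zip(*jobs)
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        # list() consumes the results, so worker exceptions are re-raised here
        return list(pool.map(get_page, urls, paths))
```

`run_all` would be called inside the existing `if __name__ == '__main__':` block of `get_url.py`; on Windows that guard is required because worker processes re-import the script. Each `out_path` should be a name without `:` or `/` in it.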
