How can I tell when a file has finished downloading in Selenium?


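I'm writing a bot in a Python Docker image that visits a web page and downloads the attachments from posts. The downloads don't come from links; they are triggered by clicking an element. The default download location is set to "/tmp/downloads".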

def get_attachments(driver):
    try:
        if driver.find_element(By.ID, "display-attachments-list"):
            attachments_list_element = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.ID, "display-attachments-list")))
            attachments_list_items = attachments_list_element.find_elements(By.TAG_NAME, "li")
            if len(attachments_list_items) > 0:
                os.makedirs("/tmp/downloads", 511, True)
                for atch in attachments_list_items:
                    num_files = len(os.scandir("/tmp/downloads"))
                    scroll_to_element(driver, atch)
                    atch.click()
                    logger.info("Downloading attachment")
                    logger.info(atch.text)
                    for i in range(10):
                        if len(os.scandir("/tmp/downloads")) <= num_files:
                            time.sleep(1)
                        else:
                            break
                
            else:
                logger.info("No attachments found")
    except (NoSuchElementException, StaleElementReferenceException, ElementClickInterceptedException) as e:
        logger.info("Post does not include attachments")
    scroll_to_top(driver)

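scandir doesn't return a list of objects like I thought it would. I could probably make it work, but I know there has to be a better way to do this. Any ideas?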

python selenium-webdriver
1 Answer

The solution is to use os.listdir together with len(). If you are using Chrome, you can also check whether any .crdownload files are present.
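The reason os.listdir works where os.scandir did not is that os.listdir returns a plain list of names, while os.scandir returns an iterator that has no len(). As a minimal sketch of the check described above, the helper below lists only the entries that are not Chrome partial downloads. The name `list_finished_downloads` is hypothetical and not part of Selenium or the original code.

```python
import os


def list_finished_downloads(directory):
    """Return the names in `directory` that are not Chrome partial downloads.

    Hypothetical helper for illustration; it only inspects file names.
    """
    return [
        name
        for name in os.listdir(directory)
        if not name.endswith(".crdownload")
    ]
```

Comparing `len(list_finished_downloads("/tmp/downloads"))` before and after a click then counts only completed files.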

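Here is an example script that waits until the number of files in the /tmp/downloads directory exceeds the original count and no .crdownload files (the placeholder files Chrome creates while downloading) remain. If you are not using Chrome, this script should still work.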

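import os
import time

def waitForDownload():
    # Snapshot the directory when the wait starts, so call this
    # immediately after triggering the download.
    originalFiles = os.listdir("/tmp/downloads")
    done = False
    while not done:
        time.sleep(1)
        done = True

        files = os.listdir("/tmp/downloads")
        # Not done while no new file has appeared yet.
        if len(files) <= len(originalFiles):
            done = False
        # Not done while Chrome still has a partial download on disk.
        for filename in files:
            if filename.endswith('.crdownload'):
                done = False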
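A loop like the one above has no upper bound on how long it waits, and it depends on the directory snapshot being taken before the new file shows up. The following is only a sketch of one possible variant, under the assumption that the caller records the directory contents before clicking. The function name `wait_for_new_download` and its parameters are hypothetical, not part of Selenium.

```python
import os
import time


def wait_for_new_download(directory, before, timeout=30.0, poll_interval=0.5):
    """Wait for a finished file that was not in `before` (hypothetical helper).

    `before` is the listing of `directory` taken before the download was
    triggered. Returns the set of new file names, or raises TimeoutError.
    """
    known = set(before)
    deadline = time.monotonic() + timeout
    while True:
        names = set(os.listdir(directory))
        # Chrome keeps a .crdownload placeholder while a download is running.
        in_progress = any(n.endswith(".crdownload") for n in names)
        new_files = {n for n in names - known if not n.endswith(".crdownload")}
        if new_files and not in_progress:
            return new_files
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"no completed download in {directory} after {timeout} seconds"
            )
        time.sleep(poll_interval)
```

In the attachment loop this would be used by taking `before = os.listdir("/tmp/downloads")` ahead of `atch.click()` and calling `wait_for_new_download("/tmp/downloads", before)` afterwards. Returning the new names also makes it possible to log which file each click produced.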

You can use os.listdir in your code like this:

def get_attachments(driver):
    try:
        if driver.find_element(By.ID, "display-attachments-list"):
            attachments_list_element = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.ID, "display-attachments-list")))
            attachments_list_items = attachments_list_element.find_elements(By.TAG_NAME, "li")
            if len(attachments_list_items) > 0:
                os.makedirs("/tmp/downloads", mode=0o777, exist_ok=True)
                for atch in attachments_list_items:
                    num_files = len(os.listdir("/tmp/downloads"))
                    scroll_to_element(driver, atch)
                    atch.click()
                    logger.info("Downloading attachment")
                    logger.info(atch.text)
                    for i in range(15):
                        if len(os.listdir("/tmp/downloads")) <= num_files:
                            time.sleep(1)
                        else:
                            break
                
            else:
                logger.info("No attachments found")
    except (NoSuchElementException, StaleElementReferenceException, ElementClickInterceptedException) as e:
        logger.info("Post does not include attachments")
    scroll_to_top(driver)