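I have a bot that I'm writing in a Python Docker image to visit web pages and download attachments from posts. The downloads don't come from links; they are triggered by clicking an element. The default download location is set to "tmp/downloads".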
def get_attachments(driver):
    try:
        if driver.find_element(By.ID, "display-attachments-list"):
            attachments_list_element = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.ID, "display-attachments-list")))
            attachments_list_items = attachments_list_element.find_elements(By.TAG_NAME, "li")
            if len(attachments_list_items) > 0:
                os.makedirs("/tmp/downloads", 511, True)
                for atch in attachments_list_items:
                    num_files = len(os.scandir("/tmp/downloads"))
                    scroll_to_element(driver, atch)
                    atch.click()
                    logger.info("Downloading attachment")
                    logger.info(atch.text)
                    for i in range(10):
                        if len(os.scandir("/tmp/downloads")) <= num_files:
                            time.sleep(1)
                        else:
                            break
            else:
                logger.info("No attachments found")
    except (NoSuchElementException, StaleElementReferenceException, ElementClickInterceptedException) as e:
        logger.info("Post does not include attachments")
    scroll_to_top(driver)
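os.scandir doesn't return a list of objects like I thought it would. I could probably make it work, but I know there has to be a better way to do this. Any ideas?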
The solution is to use os.listdir together with len(). If you are using Chrome, you can also check whether any .crdownload files are present.
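For context on why the original len(os.scandir(...)) call fails: os.scandir returns an iterator of os.DirEntry objects rather than a list, so len() raises a TypeError on it, while os.listdir returns a plain list of names. A minimal sketch of the difference follows; the helper names and the temporary directory are made up for illustration and are not part of the original code.

```python
import os
import tempfile


def count_entries(path):
    """Count directory entries with os.listdir, which returns a list of names."""
    return len(os.listdir(path))


def count_entries_scandir(path):
    """Count the same entries by consuming the os.scandir iterator."""
    with os.scandir(path) as entries:
        return sum(1 for _ in entries)


def scandir_has_len(path):
    """Return True if len() works on the result of os.scandir, else False."""
    with os.scandir(path) as entries:
        try:
            len(entries)
            return True
        except TypeError:
            # The scandir iterator does not define __len__.
            return False


if __name__ == "__main__":
    # Hypothetical demo directory with a single empty file.
    with tempfile.TemporaryDirectory() as demo_dir:
        open(os.path.join(demo_dir, "a.txt"), "w").close()
        print(count_entries(demo_dir))
        print(count_entries_scandir(demo_dir))
        print(scandir_has_len(demo_dir))
```

Either counting approach works; os.listdir is simply the shorter one when only the number of files matters.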
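Here is a sample script that waits until the number of files in the /tmp/downloads directory exceeds the original count and no .crdownload files (placeholder files created by Chrome) remain. If you aren't using Chrome, this script should still work.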
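import os
import time

def waitForDownload():
    # Files present when waiting begins
    originalFiles = os.listdir("/tmp/downloads/")
    done = False
    while not done:
        time.sleep(1)
        done = True
        files = os.listdir("/tmp/downloads")
        # Not done while the file count is unchanged
        if len(files) == len(originalFiles):
            done = False
        # Not done while Chrome still has a partial download
        for filename in files:
            if filename.endswith('.crdownload'):
                done = False
    return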
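One caveat with a loop like this is that it never gives up if the download fails to start, and it only works if the original file list is taken before the new file appears. Below is a hedged variant with a timeout, assuming you snapshot the directory before clicking. The names snapshot, wait_for_new_file, timeout and poll are hypothetical, not from the original answer.

```python
import os
import time


def snapshot(directory):
    """Return the set of file names currently in the directory."""
    return set(os.listdir(directory))


def wait_for_new_file(directory, before, timeout=15.0, poll=0.5):
    """Wait for a new, finished file to appear in the directory.

    `before` is a snapshot taken before the download was triggered.
    Returns the set of new file names, or an empty set on timeout.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        current = set(os.listdir(directory))
        new_files = current - before
        # Chrome keeps a .crdownload placeholder while a download is running.
        in_progress = any(name.endswith(".crdownload") for name in current)
        if new_files and not in_progress:
            return new_files
        time.sleep(poll)
    return set()
```

In the attachment loop this would mean calling before = snapshot("/tmp/downloads") just before atch.click(), then wait_for_new_file("/tmp/downloads", before) afterwards and logging a warning if the result is empty.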
You can use os.listdir in your code like this:
def get_attachments(driver):
    try:
        if driver.find_element(By.ID, "display-attachments-list"):
            attachments_list_element = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.ID, "display-attachments-list")))
            attachments_list_items = attachments_list_element.find_elements(By.TAG_NAME, "li")
            if len(attachments_list_items) > 0:
                os.makedirs("/tmp/downloads", 511, True)
                for atch in attachments_list_items:
                    num_files = len(os.listdir("/tmp/downloads"))
                    scroll_to_element(driver, atch)
                    atch.click()
                    logger.info("Downloading attachment")
                    logger.info(atch.text)
                    for i in range(15):
                        if len(os.listdir("/tmp/downloads")) <= num_files:
                            time.sleep(1)
                        else:
                            break
            else:
                logger.info("No attachments found")
    except (NoSuchElementException, StaleElementReferenceException, ElementClickInterceptedException) as e:
        logger.info("Post does not include attachments")
    scroll_to_top(driver)