我正在尝试通过 pdf2image 和 poppler 将一些 PDF 转换为图像,然后运行一些计算机视觉任务。
转换本身运行良好。
但是,转换会在转换时为 pdf 中的每个页面创建一些工件,我希望在函数结束时将其删除。为了促进这一点,我使用 tempfile.TemporaryDirectory()。该函数如下所示:
with tempfile.TemporaryDirectory() as path:
images_from_path: [Image] = convert_from_path(
os.path.join(path_superfolder, "calibration_target.pdf"),
size=(2480, 3508),
output_folder=path, poppler_path=r'E:\poppler-22.04.0\Library\bin')
if len(images_from_path) >= page:
images_from_path[page - 1].save(os.path.join(path_superfolder, "result.jpg"))
问题是,程序总是因以下错误而崩溃,在转换 PDF 并将所需图像写入文件后。
Traceback (most recent call last):
File "C:\Python310\lib\shutil.py", line 617, in _rmtree_unsafe
os.unlink(fullname)
PermissionError: [WinError 32] The process cannot access the file, because it is being used by another process: 'C:\\Users\\tobia\\AppData\\Local\\Temp\\tmp24c4bmzv\\bd76d834-672e-49fc-ac30-7751b7b660d0-01.ppm'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Python310\lib\tempfile.py", line 843, in onerror
_os.unlink(path)
PermissionError: [WinError 32] The process cannot access the file, because it is being used by another process: 'C:\\Users\\tobia\\AppData\\Local\\Temp\\tmp24c4bmzv\\bd76d834-672e-49fc-ac30-7751b7b660d0-01.ppm'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Python310\lib\code.py", line 90, in runcode
exec(code, self.locals)
File "<input>", line 1, in <module>
File "E:\PyCharm 2022.2.3\plugins\python\helpers\pydev\_pydev_bundle\pydev_umd.py", line 198, in runfile
pydev_imports.execfile(filename, global_vars, local_vars) # execute the script
File "E:\PyCharm 2022.2.3\plugins\python\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "D:\Dokumente\Uni\Informatik\BA_Thesis\tumexam-scheduling-codebase\generate_data.py", line 393, in <module>
extract_calibration_page_as_image_from_pdf()
File "D:\Dokumente\Uni\Informatik\BA_Thesis\tumexam-scheduling-codebase\generate_data.py", line 190, in extract_calibration_page_as_image_from_pdf
tmp_dir.cleanup()
File "C:\Python310\lib\tempfile.py", line 873, in cleanup
self._rmtree(self.name, ignore_errors=self._ignore_cleanup_errors)
File "C:\Python310\lib\tempfile.py", line 855, in _rmtree
_shutil.rmtree(name, onerror=onerror)
File "C:\Python310\lib\shutil.py", line 749, in rmtree
return _rmtree_unsafe(path, onerror)
File "C:\Python310\lib\shutil.py", line 619, in _rmtree_unsafe
onerror(os.unlink, fullname, sys.exc_info())
File "C:\Python310\lib\tempfile.py", line 846, in onerror
cls._rmtree(path, ignore_errors=ignore_errors)
File "C:\Python310\lib\tempfile.py", line 855, in _rmtree
_shutil.rmtree(name, onerror=onerror)
File "C:\Python310\lib\shutil.py", line 749, in rmtree
return _rmtree_unsafe(path, onerror)
File "C:\Python310\lib\shutil.py", line 600, in _rmtree_unsafe
onerror(os.scandir, path, sys.exc_info())
File "C:\Python310\lib\shutil.py", line 597, in _rmtree_unsafe
with os.scandir(path) as scandir_it:
NotADirectoryError: [WinError 267] Directory name invalid: 'C:\\Users\\tobia\\AppData\\Local\\Temp\\tmp24c4bmzv\\bd76d834-672e-49fc-ac30-7751b7b660d0-01.ppm'
当单步执行清理例程时,一切看起来都很好,路径是正确的,它开始删除文件,直到某个时候内部路径变量变得混乱并且例程崩溃,因为显然文件不是目录。 对我来说,似乎竞争条件在这里引起了问题。
tmp_dir.cleanup()
显式调用例程在进行更多实验并写下这个问题时,我找到了一个可行的解决方案:
with tempfile.TemporaryDirectory() as path:
images_from_path: [Image] = convert_from_path(
os.path.join(path_superfolder, f"calibration_target_{exam_type}.pdf"),
size=(2480, 3508),
output_folder=path, poppler_path=r'E:\poppler-22.04.0\Library\bin')
if len(images_from_path) >= page:
images_from_path[page - 1].save(os.path.join(path_superfolder, "result.jpg"))
images_from_path = []
似乎不知何故,例程在清理方面遇到了麻烦,因为转换后的图像实际上是由
pdf2image
创建的工件,并且仍然由我的数据结构保存。在隐式启动清理之前重置数据结构解决了问题。
如果有更好的方法来解决这个问题,请随时告诉我。