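Let me clarify my question with bullet points.
For PyTorch (or for whatever reason), I am writing a Dataset class:
from pathlib import Path
# parentheses are required: without them the conditional applies to the whole expression and p becomes the plain string "Validation"
p = Path(self.root_dir) / ("Training" if self.is_train else "Validation")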
# single quotes inside the f-strings: reusing double quotes is a SyntaxError before Python 3.12
image_p = p / "01.원천데이터" / f"{'T' if self.is_train else 'V'}S_images"
label_p = p / "02.라벨링데이터" / f"{'T' if self.is_train else 'V'}L_labels"
# set the return lists
image_path_list = []
label_list = []
# get image paths
for sentence_dir in image_p.glob("*"):  # only have several subfolders
    for true_false_dir in sentence_dir.glob("*"):  # only have several subfolders too
        for posture_dir in true_false_dir.glob("*"):  # only have several subfolders again
            image_path = sorted(list(posture_dir.glob("*")))[-1]  # in 'posture_dir', there are images, but I need only the last one
            image_path_list.append(str(image_path))
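Is there any way to make this run faster?
Most resources on multiprocessing or multithreading seem to rely on the idea of vectorizing a function by passing it a list-type argument, but I am not sure whether that fits my situation...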
image_path_list = list(image_p.glob("**/*"))
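The glob pattern ** matches directories at any depth, so this single call walks the whole tree. It is very fast and does not need parallelization.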
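One caveat: "**/*" returns every entry under image_p, directories included, while the loop in the question keeps only the last image of each posture directory. Below is a minimal sketch of how the two could be combined, assuming that files only appear inside the posture directories. The helper name last_image_per_dir is hypothetical and not part of the original code.

```python
from pathlib import Path


def last_image_per_dir(image_root):
    """Return, for every directory under image_root that holds files,
    the path of its last file in sorted order (as strings).

    Hypothetical helper; assumes files live only in the leaf
    (posture) directories.
    """
    last_by_dir = {}
    # A single recursive glob replaces the three nested loops.
    for path in Path(image_root).glob("**/*"):
        if not path.is_file():
            continue  # skip the intermediate directories
        parent = path.parent
        # Keep the largest path per directory, same ordering as sorted(...)[-1].
        if parent not in last_by_dir or path > last_by_dir[parent]:
            last_by_dir[parent] = path
    # Sort by directory so the result order is deterministic.
    return [str(last_by_dir[d]) for d in sorted(last_by_dir)]
```

In the Dataset class this would be called as image_path_list = last_image_per_dir(image_p). Whether it actually beats the nested loops depends on the filesystem and the number of files, so it is worth timing both on the real data.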