I have several .jpeg images with different names that I want to load into a CNN in a Jupyter notebook in order to classify them. The only way I have found is:
test_image = image.load_img("name_of_picture.jpeg", target_size=(64, 64))
test_image = image.img_to_array(test_image)
test_image = np.expand_dims(test_image, axis=0)
result = cnn.predict(test_image)
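Everything else I found in the Keras API (such as tf.keras.preprocessing.image_dataset_from_directory()) seems to work only with labeled data. Unfortunately, I cannot "simply" iterate over the picture names, because the files are all named differently. Is there a way to predict all images at once without naming each one?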
Thanks for your help,
Nick
Solution
tf.keras.preprocessing.image_dataset_from_directory can be adapted to return both the dataset and the image paths, as described here -> https://stackoverflow.com/a/63725072/4994352
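There are several ways to do this. For larger amounts of data, tf.data.Dataset is useful because its performance can be tuned easily. The code I give here is not performance-optimized. Replace <YOUR PATH INCL. REGEX> with a path pattern such as ../input/pokemon-images-and-types/images/*/* (despite the placeholder's name, list_files expects a glob pattern, not a regular expression).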
import tensorflow as tf

# tf.data.AUTOTUNE exists in TF 2.4+; on older versions use tf.data.experimental.AUTOTUNE.
AUTOTUNE = tf.data.AUTOTUNE

def load(file_path):
    # Read the raw file bytes and decode them into a 3-channel image tensor.
    img = tf.io.read_file(file_path)
    img = tf.image.decode_jpeg(img, channels=3)
    ...  # do some preprocessing like resizing if necessary
    return img

# Get all images from subfolders. shuffle=True returns the file names in random order.
list_ds = tf.data.Dataset.list_files('<YOUR PATH INCL. REGEX>', shuffle=True)
train_dataset = list_ds.take(-1)  # take(-1) keeps every element
# Set `num_parallel_calls` so multiple images are loaded/processed in parallel.
train_dataset = train_dataset.map(load, num_parallel_calls=AUTOTUNE)
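Because list_files matches files by glob pattern, the individual file names never have to be typed out. Before building the dataset, it can help to preview which files a pattern picks up. The following is a minimal sketch that uses only Python's standard library; the helper name preview_matches and the extension filter are assumptions made for illustration, and Python's glob may not behave exactly like TensorFlow's matcher in every edge case.

```python
import glob
import os


def preview_matches(pattern, extensions=(".jpg", ".jpeg")):
    """Return the sorted paths of image files that match a glob pattern.

    `preview_matches` is a hypothetical helper, not part of TensorFlow.
    """
    matches = glob.glob(pattern)
    # Keep regular files only and compare extensions case-insensitively.
    return sorted(
        path
        for path in matches
        if os.path.isfile(path) and path.lower().endswith(extensions)
    )
```

For example, preview_matches("../input/pokemon-images-and-types/images/*/*") would list every .jpg/.jpeg file one level below images/, whatever the files are called.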
Store the image files in a subdirectory, like this:
train_dataset
|
|--class_0
      |
      |- <images>
Now you can load the images from the directory with ImageDataGenerator (in our case, they come from "train_dataset/class_0"):
from tensorflow.keras.preprocessing.image import ImageDataGenerator

image_directory = "train_dataset/"  # parent directory that contains class_0

datagen = ImageDataGenerator(
    rescale=1./255,          # Normalize the images
    rotation_range=20,       # Random rotations
    width_shift_range=0.2,   # Horizontal shifts
    height_shift_range=0.2,  # Vertical shifts
    shear_range=0.2,         # Shear transformations
    zoom_range=0.2,          # Random zoom
    horizontal_flip=True,    # Horizontal flips
    fill_mode='nearest',     # Filling strategy for new pixels
    validation_split=0.2
)
train_generator = datagen.flow_from_directory(
    image_directory,
    target_size=(64, 64),    # Adjust based on your needs
    batch_size=8,
    class_mode=None,         # Yield only images, no labels
    shuffle=True,
    subset='training'        # Set as training data
)
validation_generator = datagen.flow_from_directory(
    image_directory,
    target_size=(64, 64),    # Adjust based on your needs
    batch_size=8,
    class_mode=None,         # Yield only images, no labels
    shuffle=True,
    subset='validation'      # Set as validation data
)
Here, image_directory is "train_dataset/", i.e. the parent directory. Note that class_mode=None specifies that we do not need labels.
To get a batch:
for batch in train_generator:
    print(batch)
    break  # the generator loops indefinitely, so stop after the first batch
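To tell which prediction belongs to which picture, the predictions have to be paired with the file names. Keras directory iterators expose a filenames attribute, and when the generator is created with shuffle=False the predictions should come back in that same order. The sketch below only shows the pairing step in plain Python; the helper name pair_with_filenames is hypothetical, and it assumes the two sequences are already in matching order.

```python
def pair_with_filenames(filenames, predictions):
    """Map each file name to its prediction, assuming both are in the same order.

    `pair_with_filenames` is a hypothetical helper for illustration only.
    """
    filenames = list(filenames)
    predictions = list(predictions)
    # A length mismatch means the two sequences cannot be aligned safely.
    if len(filenames) != len(predictions):
        raise ValueError("filenames and predictions differ in length")
    return dict(zip(filenames, predictions))
```

With a generator built using shuffle=False (and without the random augmentations, which are meant for training), a call such as pair_with_filenames(generator.filenames, cnn.predict(generator)) would then give one entry per image, where cnn is the model from the question.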