I built a simple CNN model, and it raised the following error:
Epoch 1/10
235/235 [==============================] - ETA: 0s - loss: 540.2643 - accuracy: 0.4358
---------------------------------------------------------------------------
InvalidArgumentError Traceback (most recent call last)
<ipython-input-14-ab88232c98aa> in <module>()
15 train_ds,
16 validation_data=val_ds,
---> 17 epochs=epochs
18 )
7 frames
/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
58 ctx.ensure_initialized()
59 tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
---> 60 inputs, attrs, num_outputs)
61 except core._NotOkStatusException as e:
62 if name is not None:
InvalidArgumentError: Unknown image file format. One of JPEG, PNG, GIF, BMP required.
[[{{node decode_image/DecodeImage}}]]
[[IteratorGetNext]] [Op:__inference_test_function_2924]
Function call stack:
test_function
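The code I wrote is very simple and standard. Most of it is copied directly from the official website. It raises this error before the first epoch finishes. I'm pretty sure the images are all PNG files. The train folder contains nothing but images (no text or code files). I am using Colab. The TensorFlow version is 2.5.0. Thanks for your help.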
data_dir = './train'

train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    data_dir,
    subset='training',
    validation_split=0.2,
    batch_size=batch_size,
    seed=42
)

val_ds = tf.keras.preprocessing.image_dataset_from_directory(
    data_dir,
    subset='validation',
    validation_split=0.2,
    batch_size=batch_size,
    seed=42
)

model = Sequential([
    layers.InputLayer(input_shape=(image_size, image_size, 3)),
    layers.Conv2D(32, 3, activation='relu'),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(num_classes)
])

optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
model.compile(
    optimizer=optimizer,
    loss=tf.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy'])

history = model.fit(
    train_ds,
    validation_data=val_ds,
    epochs=epochs
)
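Some files in your validation folder are not in a format accepted by TensorFlow (JPEG, PNG, GIF, BMP), or may be corrupted. A file's extension is only indicative and does not enforce anything about the file's content.
You might be able to find the culprit using the imghdr module from the Python standard library and a simple loop.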
from pathlib import Path
import imghdr

data_dir = "/home/user/datasets/samples/"
image_extensions = [".png", ".jpg"]  # add all of your image file extensions here
img_type_accepted_by_tf = ["bmp", "gif", "jpeg", "png"]
for filepath in Path(data_dir).rglob("*"):
    if filepath.suffix.lower() in image_extensions:
        img_type = imghdr.what(filepath)
        if img_type is None:
            print(f"{filepath} is not an image")
        elif img_type not in img_type_accepted_by_tf:
            print(f"{filepath} is a {img_type}, not accepted by TensorFlow")
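This should print any files that are not images, or that are not what their extension claims and are not accepted by TF. You can then delete them or convert them to a format TensorFlow supports.
TensorFlow is fairly strict about image formats. The following should help you remove bad images. Sometimes a dataset runs fine with Torch and similar frameworks but produces format errors in TF. Even so, it is best practice to always preprocess your images so that the model is robust, safe, and standard.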
import os
from pathlib import Path

img_link = list(Path("/home/user/datasets/samples/").glob(r'**/*.jpg'))
count_num = 0
for lnk in img_link:
    # JFIF (JPEG File Interchange Format) is the usual JPEG container; its
    # marker normally sits near the very start of the file. A file without
    # it is treated here as corrupt or non-standard.
    # Note: valid JPEGs that carry no JFIF marker (for example some
    # Exif-only files) are deleted by this check as well.
    with open(lnk, 'rb') as binary_img:
        find_img = b'JFIF' in binary_img.peek(10)
    if not find_img:
        count_num += 1
        os.remove(str(lnk))
print('Total %d images deleted from dataset' % count_num)
# This should help you delete the badly encoded images.
This should work fine, and the same goes for the other supported types, for example png:
image = tf.io.read_file("im.png")
image = tf.image.decode_png(image, channels=3)
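Header checks only look at the first few bytes, so a file that is truncated further in can still pass them. A more direct test is to ask TensorFlow to decode every file and collect the ones that fail. The sketch below is a minimal version of that idea. The function name find_undecodable_images is made up for illustration, and the decode arguments (channels=3, expand_animations=False) are an assumption about what the Keras loader does internally, not something confirmed in this thread.

```python
from pathlib import Path

import tensorflow as tf


def find_undecodable_images(root):
    """Return the files under `root` that tf.io.decode_image cannot decode.

    Hypothetical helper: it walks `root` recursively and tries to decode
    every regular file, whatever its extension.
    """
    bad_files = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            contents = tf.io.read_file(str(path))
            # expand_animations=False decodes a GIF to a single 3-D frame.
            tf.io.decode_image(contents, channels=3, expand_animations=False)
        except tf.errors.InvalidArgumentError:
            # Raised in eager mode for unknown or corrupt image data.
            bad_files.append(path)
    return bad_files
```

Calling find_undecodable_images("./train") on the folder from the question would return the paths to inspect. Decoding every file is slower than reading headers, so on a large dataset it may be worth running once and saving the result.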
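Some answers suggest using the built-in Python module imghdr to guess the image format and assert that the file is not corrupted and matches its extension.
However, as of Python 3.11 imghdr is deprecated (PEP 594) and will be removed in Python 3.13, because it supports only a limited number of formats and has limited functionality.
Use one of the suggested third-party libraries instead: filetype, puremagic, or python-magic.
Here is an example using filetype: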
from pathlib import Path
import filetype

# Image file extensions supported by TensorFlow
img_exts = {"png", "jpg", "gif", "bmp"}
path = Path("train")
# rglob also descends into the per-class subfolders
for file in path.rglob("*"):
    if file.is_dir():
        continue
    # Pass a plain string path so this works across filetype versions
    ext = filetype.guess_extension(str(file))
    if ext is None:
        print(f"'{file}': extension cannot be guessed from content")
    elif ext not in img_exts:
        print(f"'{file}': not a supported image file")
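If installing a third-party package is not an option, the same kind of header check can be approximated with the standard library alone. The sketch below compares the first bytes of each file against the well-known signatures of the four formats named in the error message. The names SIGNATURES, sniff_image_type, and report_unsupported are made up for this example. It only inspects the header, so it cannot detect a file that starts correctly but is truncated or corrupted later, and the two-byte BMP signature can match unrelated files that happen to start with "BM".

```python
from pathlib import Path

# Leading "magic" bytes of the four formats named in the error message.
SIGNATURES = {
    "jpeg": (b"\xff\xd8\xff",),
    "png": (b"\x89PNG\r\n\x1a\n",),
    "gif": (b"GIF87a", b"GIF89a"),
    "bmp": (b"BM",),
}


def sniff_image_type(path):
    """Return 'jpeg', 'png', 'gif', 'bmp', or None based on the file header."""
    with open(path, "rb") as f:
        header = f.read(8)
    for name, prefixes in SIGNATURES.items():
        # bytes.startswith accepts a tuple of candidate prefixes.
        if header.startswith(prefixes):
            return name
    return None


def report_unsupported(root):
    """Return files under `root` whose header matches none of the signatures."""
    return [
        p for p in sorted(Path(root).rglob("*"))
        if p.is_file() and sniff_image_type(p) is None
    ]
```

For the layout in the question, report_unsupported("./train") would list the candidates to delete or convert.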
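I had the same problem. I went through many of the answers above, but none of them worked for me. So I wrote the training loop inside a try/except block, so that batches with these problems are skipped. Please note: this is not a direct solution.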
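# model, loss_fn, optimizer, the two accuracy metrics, and the
# preprocessed_* datasets are assumed to be defined earlier (not shown).
max_iterations = len(preprocessed_train_dataset)
for epoch in range(epochs):
    print("\nStart of epoch %d" % (epoch,))
    # Create a fresh iterator every epoch; one shared iterator would be
    # exhausted after the first epoch.
    iterator = iter(preprocessed_train_dataset)
    i = 0
    # Iterate over the batches of the dataset.
    while i < max_iterations:
        print("Currently running batch {}".format(i))
        i = i + 1
        try:
            x_batch_train, y_batch_train = next(iterator)
        except StopIteration:
            break
        except tf.errors.InvalidArgumentError:
            # Skip any batch that contains an image TensorFlow cannot decode.
            continue
        with tf.GradientTape() as tape:
            logits = model(x_batch_train, training=True)
            loss_value = loss_fn(y_batch_train, logits)
        grads = tape.gradient(loss_value, model.trainable_weights)
        optimizer.apply_gradients(zip(grads, model.trainable_weights))
        # Update the training metric so the epoch accuracy is meaningful.
        train_acc_metric.update_state(y_batch_train, logits)
        # Log every 200 batches.
        if i % 200 == 0:
            print(
                "Training loss (for one batch) at step %d: %.4f"
                % (i, float(loss_value))
            )
            print("Seen so far: %s samples" % ((i + 1) * batch_size))
    train_acc = train_acc_metric.result()
    print("Training acc over epoch: %.4f" % (float(train_acc),))
    # Reset training metrics at the end of each epoch
    train_acc_metric.reset_states()
    # Run validation once per epoch, skipping bad batches the same way.
    val_iterator = iter(preprocessed_val_dataset)
    for _ in range(len(preprocessed_val_dataset)):
        try:
            x_batch_val, y_batch_val = next(val_iterator)
        except StopIteration:
            break
        except tf.errors.InvalidArgumentError:
            continue
        val_logits = model(x_batch_val, training=False)
        # Update val metrics
        val_acc_metric.update_state(y_batch_val, val_logits)
    val_acc = val_acc_metric.result()
    val_acc_metric.reset_states()
    print("Validation acc: %.4f" % (float(val_acc),))
# Evaluate the model
test_loss, test_accuracy = model.evaluate(preprocessed_test_dataset)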
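A shorter way to get a similar "skip what fails" behaviour, without a custom loop, may be tf.data.experimental.ignore_errors, which drops dataset elements whose production raises an error. The wrapper below is a minimal sketch. The name skip_failing_elements is made up for illustration, and how it interacts with a dataset built by image_dataset_from_directory is an assumption: since that dataset yields batches, a single bad image would most likely cause its whole batch to be dropped, just as in the loop above.

```python
import tensorflow as tf


def skip_failing_elements(dataset):
    """Return a dataset that silently drops elements that raise an error.

    Hypothetical helper around tf.data.experimental.ignore_errors.
    """
    return dataset.apply(tf.data.experimental.ignore_errors())
```

With it, something like train_ds = skip_failing_elements(train_ds) could be passed to model.fit directly. Like the loop above, this hides the problem rather than fixing it, so cleaning the dataset is still the better long-term fix.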
Lescurel's answer saved my life! I made a small modification in case you want to delete the unusable images automatically:
from pathlib import Path
import imghdr
import os

# Define the directory containing the images
extract_path = "path/to/images"  # placeholder: set this to your image directory

# List of valid image extensions
image_extensions = [".png", ".jpg", ".jpeg", ".bmp", ".gif"]

# Image types accepted by TensorFlow
img_type_accepted_by_tf = ["bmp", "gif", "jpeg", "png"]

# Loop through all files in the directory and subdirectories
for filepath in Path(extract_path).rglob("*"):
    # Check if it's a file before proceeding
    if filepath.is_file():
        # Check if the file has a valid image extension
        if filepath.suffix.lower() in image_extensions:
            # Check the actual image type
            img_type = imghdr.what(filepath)
            if img_type is None:
                print(f"{filepath} is not an image. Deleting...")
                os.remove(filepath)  # Delete the file
            elif img_type not in img_type_accepted_by_tf:
                print(f"{filepath} is a {img_type}, not accepted by TensorFlow. Deleting...")
                os.remove(filepath)  # Delete the file
        else:
            # The file does not have a valid image extension
            print(f"{filepath} is not a recognized image type. Deleting...")
            os.remove(filepath)  # Delete the file
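Deleting files is irreversible, so a more cautious variant is to move the suspect files into a separate folder and review them before removing anything. The sketch below does that with the standard library only. The function name quarantine and its parameters are made up for this example, and it assumes the quarantine folder lives outside the dataset root so that later scans do not pick the moved files up again.

```python
import shutil
from pathlib import Path


def quarantine(files, root, quarantine_dir):
    """Move each file in `files` under `quarantine_dir`.

    The path of every file relative to `root` is preserved, so the class
    subfolder it came from stays visible. Returns the new locations.
    """
    root = Path(root)
    quarantine_dir = Path(quarantine_dir)
    moved = []
    for src in map(Path, files):
        dest = quarantine_dir / src.relative_to(root)
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(src), str(dest))
        moved.append(dest)
    return moved
```

In the script above, the os.remove calls could be replaced by collecting the offending paths in a list and passing that list to quarantine once the scan has finished.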