Unknown image file format. One of JPEG, PNG, GIF, BMP required


I built a simple CNN model and it raised the following error:

Epoch 1/10
235/235 [==============================] - ETA: 0s - loss: 540.2643 - accuracy: 0.4358
---------------------------------------------------------------------------
InvalidArgumentError                      Traceback (most recent call last)
<ipython-input-14-ab88232c98aa> in <module>()
     15     train_ds,
     16     validation_data=val_ds,
---> 17     epochs=epochs
     18 )

7 frames
/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     58     ctx.ensure_initialized()
     59     tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
---> 60                                         inputs, attrs, num_outputs)
     61   except core._NotOkStatusException as e:
     62     if name is not None:

InvalidArgumentError:  Unknown image file format. One of JPEG, PNG, GIF, BMP required.
     [[{{node decode_image/DecodeImage}}]]
     [[IteratorGetNext]] [Op:__inference_test_function_2924]

Function call stack:
test_function

The code I wrote is very simple and standard; most of it was copied straight from the official website. It raised this error before the first epoch finished. I'm fairly sure the images are all PNG files. The train folder contains nothing but images (no text or code files). I'm using Colab.

The TensorFlow version is 2.5.0. Thanks for your help.

data_dir = './train'

train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    data_dir, 
    subset='training',
    validation_split=0.2,
    batch_size=batch_size,
    seed=42
)

val_ds = tf.keras.preprocessing.image_dataset_from_directory(
    data_dir, 
    subset='validation',
    validation_split=0.2,
    batch_size=batch_size,
    seed=42
)

model = Sequential([
    layers.InputLayer(input_shape=(image_size, image_size, 3)),
    layers.Conv2D(32, 3, activation='relu'),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(num_classes)
    ])

optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
model.compile(
    optimizer=optimizer,
    loss=tf.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy'])

history = model.fit(
    train_ds,
    validation_data=val_ds,
    epochs=epochs
)
python tensorflow
6 Answers
26 votes

Some of the files in your validation folder are not in a format accepted by TensorFlow (JPEG, PNG, GIF, BMP), or may be corrupted. A file's extension is only indicative; it imposes nothing on the file's actual content.

You may be able to find the culprits with the imghdr module from the Python standard library and a simple loop.

from pathlib import Path
import imghdr

data_dir = "/home/user/datasets/samples/"
image_extensions = [".png", ".jpg"]  # add there all your images file extensions

img_type_accepted_by_tf = ["bmp", "gif", "jpeg", "png"]
for filepath in Path(data_dir).rglob("*"):
    if filepath.suffix.lower() in image_extensions:
        img_type = imghdr.what(filepath)
        if img_type is None:
            print(f"{filepath} is not an image")
        elif img_type not in img_type_accepted_by_tf:
            print(f"{filepath} is a {img_type}, not accepted by TensorFlow")

This should print out any files that are not images, or that are not what their extension claims and are therefore not accepted by TF. You can then delete them or convert them to a format supported by TensorFlow.
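If you would rather convert the flagged files than delete them, here is a minimal sketch using Pillow (an assumption: Pillow is installed and can still read the offending files); the hypothetical helper re-encodes a file as PNG:

from pathlib import Path
from PIL import Image

# Hypothetical helper: re-encode a file TensorFlow rejects as a PNG.
def convert_to_png(filepath: Path) -> Path:
    target = filepath.with_suffix(".png")
    with Image.open(filepath) as img:
        img.convert("RGB").save(target, format="PNG")
    if target != filepath:
        filepath.unlink()  # remove the original so it is not flagged again
    return target

You could call convert_to_png(filepath) inside the elif branch of the loop above instead of only printing.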


3 votes

TensorFlow is fairly strict about image formats, and this should help you remove the bad images. Sometimes a dataset will even run fine on Torch and the like but produce format errors on TF. In any case, the best practice is to always preprocess your images so that the model stays robust, safe and standard.

from pathlib import Path
import os

import tensorflow as tf

img_link = list(Path("/home/user/datasets/samples/").glob(r'**/*.jpg'))

count_num = 0
for lnk in img_link:
    with open(lnk, 'rb') as binary_img:
        # JFIF (JPEG File Interchange Format) is the standard header we use here
        # to gauge whether an image is corrupt or substandard.
        find_img = tf.compat.as_bytes('JFIF') in binary_img.peek(10)
    if not find_img:
        count_num += 1
        os.remove(str(lnk))
print('Total %d images deleted from the dataset' % count_num)
# this should help you delete the badly encoded files
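Note that the JFIF check above only covers .jpg files and can miss valid JPEGs that carry an Exif header instead of a JFIF one. A broader sketch, assuming Pillow is installed and the same sample directory, is to let Pillow attempt to verify every file:

from pathlib import Path
from PIL import Image

bad_files = []
for path in Path("/home/user/datasets/samples/").rglob("*"):
    if not path.is_file():
        continue
    try:
        with Image.open(path) as img:
            img.verify()  # raises if the file is truncated or not an image
    except Exception:
        bad_files.append(path)

print(f"Found {len(bad_files)} unreadable files")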

1 vote

This should work fine for the supported types... for example, PNG:

image = tf.io.read_file("im.png")
image = tf.image.decode_png(image, channels=3)
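
Building on this, you can run the same kind of decode over the whole dataset folder before training to see exactly which files TensorFlow refuses. A rough sketch, assuming the ./train layout from the question:

from pathlib import Path
import tensorflow as tf

for path in Path("./train").rglob("*"):
    if not path.is_file():
        continue
    data = tf.io.read_file(str(path))
    try:
        # decode_image handles JPEG, PNG, GIF and BMP, the same formats
        # the input pipeline accepts.
        tf.io.decode_image(data)
    except tf.errors.InvalidArgumentError:
        print(f"TensorFlow cannot decode {path}")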

1 vote

As mentioned in other answers, you can use the built-in imghdr Python module to guess the image format and assert that it is not corrupted and matches the file extension.

However, as of Python 3.11, imghdr is deprecated (PEP 594) and will be removed in Python 3.13 because of its limited number of supported formats and limited functionality.

Three alternatives are listed in the PEP:

filetype
puremagic
python-magic

Here is an example using filetype:

from pathlib import Path
import filetype


# Image file extensions supported by TensorFlow
img_exts = {"png", "jpg", "gif", "bmp"}

path = Path("train")

for file in path.iterdir():
    if file.is_dir():
        continue

    ext = filetype.guess_extension(file)

    if ext is None:
        print(f"'{file}': extension cannot be guessed from content")
    elif ext not in img_exts:
        print(f"'{file}': not a supported image file")

0 votes

I had the same problem. I went through many of the answers above, but none of them worked for me. So I wrote the training loop inside a try/except block, which simply skips the batches that have these problems. Please note: this is not a direct solution.

for epoch in range(epochs):
    print("\nStart of epoch %d" % (epoch,))
    # Re-create the iterator for every epoch, otherwise it is exhausted after the first pass.
    iterator = iter(preprocessed_train_dataset)
    max_iterations = len(preprocessed_train_dataset)
    # Iterate over the batches of the dataset.
    i = 0
    while i < max_iterations:
        print("Currently running batch {}".format(i))
        try:
            i = i + 1
            x_batch_train, y_batch_train = next(iterator)
            with tf.GradientTape() as tape:
                logits = model(x_batch_train, training=True)
                loss_value = loss_fn(y_batch_train, logits)

            grads = tape.gradient(loss_value, model.trainable_weights)
            optimizer.apply_gradients(zip(grads, model.trainable_weights))
            train_acc_metric.update_state(y_batch_train, logits)

            # Log every 200 batches.
            if i % 200 == 0:
                print(
                    "Training loss (for one batch) at step %d: %.4f"
                    % (i, float(loss_value))
                )
                print("Seen so far: %s samples" % ((i + 1) * batch_size))
        except StopIteration:
            break  # no more batches in this epoch
        except Exception:
            continue  # skip batches that raise errors, e.g. undecodable images

    train_acc = train_acc_metric.result()
    print("Training acc over epoch: %.4f" % (float(train_acc),))

    # Reset training metrics at the end of each epoch
    train_acc_metric.reset_states()

    # Run a validation loop at the end of each epoch
    for x_batch_val, y_batch_val in preprocessed_val_dataset:
        val_logits = model(x_batch_val, training=False)
        # Update val metrics
        val_acc_metric.update_state(y_batch_val, val_logits)
    val_acc = val_acc_metric.result()
    val_acc_metric.reset_states()
    print("Validation acc: %.4f" % (float(val_acc),))

# Evaluate the model
test_loss, test_accuracy = model.evaluate(preprocessed_test_dataset)
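
If you would rather keep model.fit, a lighter-weight way to get the same skip-the-bad-batches behaviour is to let tf.data drop the offending elements itself. A minimal sketch, assuming the same preprocessed datasets as above (recent TF releases also expose this directly as Dataset.ignore_errors()):

import tensorflow as tf

# Elements (here: whole batches) that raise an error while being produced,
# e.g. because of an undecodable image, are silently dropped.
clean_train_dataset = preprocessed_train_dataset.apply(
    tf.data.experimental.ignore_errors()
)

model.fit(clean_train_dataset, validation_data=preprocessed_val_dataset, epochs=epochs)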

0 votes

Lescurel's answer saved my life! I made a small modification in case you want to delete the unusable images automatically:

from pathlib import Path
import imghdr
import os

# Define the directory containing the images
extract_path = "./train"  # assumption: adjust to your dataset folder

# List of valid image extensions
image_extensions = [".png", ".jpg", ".jpeg", ".bmp", ".gif"]

# Image types accepted by TensorFlow
img_type_accepted_by_tf = ["bmp", "gif", "jpeg", "png"]

# Loop through all files in the directory and subdirectories
for filepath in Path(extract_path).rglob("*"):
    # Check if it's a file before proceeding
    if filepath.is_file():
        # Check if the file has a valid image extension
        if filepath.suffix.lower() in image_extensions:
            # Check the actual image type
            img_type = imghdr.what(filepath)
            if img_type is None:
                print(f"{filepath} is not an image. Deleting...")
                os.remove(filepath)  # Delete the file
            elif img_type not in img_type_accepted_by_tf:
                print(f"{filepath} is a {img_type}, not accepted by TensorFlow. Deleting...")
                os.remove(filepath)  # Delete the file
        else:
            # If the file does not have a valid extension
            print(f"{filepath} is not a recognized image type. Deleting...")
            os.remove(filepath)  # Delete the file