Mask RCNN 预测不佳

Question

我正在使用 Matterport MRCNN 存储库的一个版本对使用

mask_rcnn_coco.h5

权重进行迁移学习的 ~5k 图像（1024x1024）训练的 2 个对象（比如狗和猫）进行分类。我想知道我是否做错了什么，需要更改配置参数，调整我的图像数据集，还是这些努力的某种组合。在我训练模型并加载权重后，当我尝试在测试集图像上进行预测时，边界框将位于图像顶部边缘的某个位置，对于我尝试的每个测试集图像，框坐标如

[   0 1801    1 1803]

，这表明预测不佳。

我对此进行了广泛研究，但我找不到适合我情况的问题和解决方案，尽管我在想也许我需要尝试一种不同的方法来加载 COCO 权重或将更多无对象图像添加到训练/验证集中。然而，我发现一些关于 MRCNN 实现的论文使用的训练/验证图像比我没有包含无对象图像时使用的更少，并且能够毫无问题地进行预测。有没有人有什么建议？谢谢。

同一图像推理过程中的随机预测

Mask RCNN 的预测无法正常工作

这是我到目前为止的相关代码（

tensorflow == 2.10.0

）：

# define a configuration for the model
class PetConfig(Config):
    # define the name of the configuration
    NAME = "pet_cfg"
    # number of classes (background + 2 detected classes (Cat + Dog))
    NUM_CLASSES = 1 + 2
    # number of training steps per epoch
    STEPS_PER_EPOCH = 100
    IMAGES_PER_GPU = 1
    MAX_GT_INSTANCES = 3
    DETECTION_MAX_INSTANCES = 3
    DETECTION_MIN_CONFIDENCE = 0.95
    BACKBONE = "resnet50"
    TRAIN_ROIS_PER_IMAGE = 10
    VALIDATION_STEPS = 200

配置概要：


Configurations:
BACKBONE                       resnet50
BACKBONE_STRIDES               [4, 8, 16, 32, 64]
BATCH_SIZE                     1
BBOX_STD_DEV                   [0.1 0.1 0.2 0.2]
COMPUTE_BACKBONE_SHAPE         None
DETECTION_MAX_INSTANCES        3
DETECTION_MIN_CONFIDENCE       0.95
DETECTION_NMS_THRESHOLD        0.3
FPN_CLASSIF_FC_LAYERS_SIZE     1024
GPU_COUNT                      1
GRADIENT_CLIP_NORM             5.0
IMAGES_PER_GPU                 1
IMAGE_CHANNEL_COUNT            3
IMAGE_MAX_DIM                  1024
IMAGE_META_SIZE                15
IMAGE_MIN_DIM                  800
IMAGE_MIN_SCALE                0
IMAGE_RESIZE_MODE              square
IMAGE_SHAPE                    [1024 1024    3]
LEARNING_MOMENTUM              0.9
LEARNING_RATE                  0.001
LOSS_WEIGHTS                   {'rpn_class_loss': 1.0, 'rpn_bbox_loss': 1.0, 'mrcnn_class_loss': 1.0, 'mrcnn_bbox_loss': 1.0, 'mrcnn_mask_loss': 1.0}
MASK_POOL_SIZE                 14
MASK_SHAPE                     [28, 28]
MAX_GT_INSTANCES               3
MEAN_PIXEL                     [123.7 116.8 103.9]
MINI_MASK_SHAPE                (56, 56)
NAME                           pet_cfg
NUM_CLASSES                    3
POOL_SIZE                      7
POST_NMS_ROIS_INFERENCE        1000
POST_NMS_ROIS_TRAINING         2000
PRE_NMS_LIMIT                  6000
ROI_POSITIVE_RATIO             0.33
RPN_ANCHOR_RATIOS              [0.5, 1, 2]
RPN_ANCHOR_SCALES              (32, 64, 128, 256, 512)
RPN_ANCHOR_STRIDE              1
RPN_BBOX_STD_DEV               [0.1 0.1 0.2 0.2]
RPN_NMS_THRESHOLD              0.7
RPN_TRAIN_ANCHORS_PER_IMAGE    256
STEPS_PER_EPOCH                100
TOP_DOWN_PYRAMID_SIZE          256
TRAIN_BN                       False
TRAIN_ROIS_PER_IMAGE           10
USE_MINI_MASK                  True
USE_RPN_ROIS                   True
VALIDATION_STEPS               200
WEIGHT_DECAY                   0.0001

config = PetConfig()
from keras import backend as K
K.clear_session()                   # disallocate GPU memory
# define the model
model = MaskRCNN(mode='training', model_dir="logs", config=config)
# Set early stopping after certain number of epochs
callback = tf.keras.callbacks.EarlyStopping(monitor='loss', patience=3)
# load weights (mscoco) and exclude the output layers
model.load_weights("mask_rcnn_coco/mask_rcnn_coco.h5", by_name=True, exclude=["mrcnn_class_logits", "mrcnn_bbox_fc",  "mrcnn_bbox", "mrcnn_mask"])
# train weights (output layers or 'heads')
model.train(train_set, 
            val_set, 
            learning_rate=0.001, 
            epochs=30,
            custom_callbacks=[callback],
            layers='heads')

训练后的保存和推理：

model.keras_model.save_weights('weights/pet_mask_rcnn.h5')

from matplotlib.patches import Rectangle

# define the prediction configuration
class PredictionConfig(Config):
    # define the name of the configuration
    NAME = "pet_cfg"
    # number of classes (background + 2 detected classes (dog + cat))
    NUM_CLASSES = 1 + 2
    # simplify GPU config
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1
    MAX_GT_INSTANCES = 1
    DETECTION_MAX_INSTANCES = 1
    
predcfg = PredictionConfig()

# Load model in inference mode
K.clear_session()                   # disallocate GPU memory
inf_mod = MaskRCNN(mode="inference", model_dir='weights/', config=predcfg)
inf_mod.load_weights('weights/pet_mask_rcnn.h5', by_name=True)

#Test on a single image
num=random.randint(0, len(test_set.image_ids))
# define image id
image_id = num
# load the image
test_img = test_set.load_image(image_id)

detected = inf_mod.detect([test_img])[0]
pyplot.imshow(test_img)
ax = pyplot.gca()
class_names = ['dog', 'cat']
class_id_counter=1
for box in detected['rois']:
    print(box)
#get coordinates
    detected_class_id = detected['class_ids'][class_id_counter-1]
    #print(detected_class_id)
    #print("Detected class is :", class_names[detected_class_id-1])
    y1, x1, y2, x2 = box
    #calculate width and height of the box
    width, height = x2 - x1, y2 - y1
    #create the shape
    ax.annotate(class_names[detected_class_id-1], (x1, y1), color='black', weight='bold', fontsize=10, ha='center', va='center')
    rect = Rectangle((x1, y1), width, height, fill=False, color='red')
#draw the box
    ax.add_patch(rect)
    class_id_counter+=1
#show the figure
pyplot.show()

Answer 1

做更多的研究，我发现这篇文章与 Mask RCNN 的误报非常相似。

减少 tensorflow-mask-rcnn 中的误报的最佳方法是什么...

根据 Stigma 的回答，我认为我需要从头开始训练并添加更多非对象图像进行训练。

Mask RCNN 预测不佳

问题描述投票：0回答：1

1个回答

最新问题

Mask RCNN 预测不佳

问题描述 投票：0回答：1

1个回答

最新问题

问题描述投票：0回答：1