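I'm looking for a way to convert a mask (a Height x Width boolean image) into a series of bounding boxes (see the example picture below, which I drew by hand), where the boxes enclose the "islands of truth".
Specifically, I'm looking for an approach that works with standard TensorFlow ops (though all input is welcome). I want this so that I can convert my model to TFLite without adding custom ops and recompiling from source. In general, though, it would be good to know the different ways of doing this.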
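Notes:
I already have a solution that relies on non-standard TensorFlow, based on tfa.image.connected_components (see the solution here). However, that op is not included in TensorFlow Lite. It also feels like it solves a slightly harder problem than necessary (finding connected components is harder than just outlining the blobs in an image without caring whether they are connected).
I know I haven't specified exactly how I want the boxes to be generated (e.g. whether separate "yin-yang-style" connected components should get separate boxes even if those boxes overlap, and so on). Really, I'm not worried about the details, only that the resulting boxes look "reasonable".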
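Some related questions (please read before flagging as a duplicate!):
"Convert binary mask to bounding boxes in TensorFlow" asks for a single bounding box, which is much easier.
"Generate bounding boxes from heatmap data" is similar, but asks the slightly broader question of converting from a "heatmap", and doesn't specify TensorFlow.
"Create bounding boxes from image labels" assumes the image has already been segmented into components (called "labels" there).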
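For contrast, the single-box case from the first related question reduces to finding the first and last rows and columns that contain any True pixel. Below is a minimal NumPy sketch of that idea. The function name `mask_to_single_box` and the (left, top, right, bottom) ordering with non-inclusive right/bottom are assumptions chosen for illustration, not taken from the linked question.

```python
import numpy as np

def mask_to_single_box(mask):
    """Return one (left, top, right, bottom) box around ALL True pixels.

    Right and bottom are non-inclusive. Returns None for an empty mask.
    (Hypothetical helper, for illustration only.)
    """
    rows = np.flatnonzero(mask.any(axis=1))  # indices of rows containing a True pixel
    cols = np.flatnonzero(mask.any(axis=0))  # indices of columns containing a True pixel
    if rows.size == 0:
        return None
    return int(cols[0]), int(rows[0]), int(cols[-1]) + 1, int(rows[-1]) + 1

# Two separate islands in a small mask
mask = np.zeros((6, 8), dtype=bool)
mask[1:3, 1:3] = True
mask[4:6, 5:8] = True

single_box = mask_to_single_box(mask)
print(single_box)
```

Note that both islands end up inside the same box, which is why the single-box approach does not answer this question.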
Ideally I'm looking for something that doesn't require training (e.g. YOLO-style regression) and just works out of the box (heh).
Edit: here is an example mask image: https://github.com/petered/data/blob/master/images/example_mask3.png
It can be loaded into a mask with:
mask = cv2.imread(os.path.expanduser('~/Downloads/example_mask3.png')).mean(axis=2) > 50
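To make the thresholding step concrete without needing the downloaded file, here is a small sketch that applies the same "mean over channels, then compare to 50" rule to a synthetic image. The helper name `image_to_mask` and the synthetic image are assumptions for illustration; the array layout (Height x Width x 3, uint8) matches what `cv2.imread` returns for a colour image.

```python
import numpy as np

def image_to_mask(image, threshold=50):
    """Average the colour channels and threshold, giving a (H, W) boolean mask.

    (Hypothetical helper mirroring the one-liner above.)
    """
    return image.mean(axis=2) > threshold

# Synthetic stand-in for the loaded PNG: black background with one white blob
image = np.zeros((4, 5, 3), dtype=np.uint8)
image[1:3, 2:4] = 255

mask = image_to_mask(image)
print(mask.astype(int))
```

When loading the real file, keep in mind that `cv2.imread` returns None rather than raising if the path cannot be read, so the one-liner fails with an AttributeError in that case.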
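OK, I'm not sure whether this can be done with TensorFlow ops alone, but here is a Python/NumPy implementation (it uses very inefficient double for-loops). In principle, if it were vectorized (again, not sure whether that's possible) or written in C, it should be fast, because it makes only 2 passes over the pixels to compute the boxes.
I'm not sure whether this algorithm has an existing name, but if not, I'd call it "Downright Boxing", since it involves propagating mask segments down and to the right to find the boxes. Here is the result on the mask from the question (with a few extra shapes added as examples):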
import numpy as np


def mask_to_boxes(mask: np.ndarray) -> np.ndarray:
    """ Convert a boolean (Height x Width) mask into a (N x 4) integer array of NON-OVERLAPPING bounding boxes
    surrounding "islands of truth" in the mask.  Boxes indicate the (Left, Top, Right, Bottom) bounds
    of each island, with Right and Bottom being NON-INCLUSIVE (ie they point to the indices AFTER the island).

    This algorithm (Downright Boxing) does not necessarily put separate connected components into
    separate boxes.

    You can "cut out" the island-masks with
        boxes = mask_to_boxes(mask)
        island_masks = [mask[t:b, l:r] for l, t, r, b in boxes]
    """
    max_ix = max(s+1 for s in mask.shape)   # Use this to represent background
    # These arrays will be used to carry the "box start" indices down and to the right.
    x_ixs = np.full(mask.shape, fill_value=max_ix)
    y_ixs = np.full(mask.shape, fill_value=max_ix)

    # Propagate the earliest x-index in each segment to the bottom-right corner of the segment
    for i in range(mask.shape[0]):
        x_fill_ix = max_ix
        for j in range(mask.shape[1]):
            above_cell_ix = x_ixs[i-1, j] if i>0 else max_ix
            still_active = mask[i, j] or ((x_fill_ix != max_ix) and (above_cell_ix != max_ix))
            x_fill_ix = min(x_fill_ix, j, above_cell_ix) if still_active else max_ix
            x_ixs[i, j] = x_fill_ix

    # Propagate the earliest y-index in each segment to the bottom-right corner of the segment
    for j in range(mask.shape[1]):
        y_fill_ix = max_ix
        for i in range(mask.shape[0]):
            left_cell_ix = y_ixs[i, j-1] if j>0 else max_ix
            still_active = mask[i, j] or ((y_fill_ix != max_ix) and (left_cell_ix != max_ix))
            y_fill_ix = min(y_fill_ix, i, left_cell_ix) if still_active else max_ix
            y_ixs[i, j] = y_fill_ix

    # Find the bottom-right corners of each segment
    new_xstops = np.diff((x_ixs != max_ix).astype(np.int32), axis=1, append=False)==-1
    new_ystops = np.diff((y_ixs != max_ix).astype(np.int32), axis=0, append=False)==-1
    corner_mask = new_xstops & new_ystops
    y_stops, x_stops = np.array(np.nonzero(corner_mask))

    # Extract the boxes, getting the top-left corners from the index arrays
    x_starts = x_ixs[y_stops, x_stops]
    y_starts = y_ixs[y_stops, x_stops]
    ltrb_boxes = np.hstack([x_starts[:, None], y_starts[:, None], x_stops[:, None]+1, y_stops[:, None]+1])
    return ltrb_boxes
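The box convention described in the docstring (Left, Top, Right, Bottom, with Right and Bottom non-inclusive) is easy to get wrong, so here is a small sketch of how such boxes slice back into the mask. To keep the snippet self-contained, the `boxes` array is written out by hand for a toy mask; it is an assumed stand-in for the function's output, not a computed result.

```python
import numpy as np

# Toy mask with two rectangular islands
mask = np.zeros((6, 8), dtype=bool)
mask[1:3, 1:4] = True   # rows 1-2, columns 1-3
mask[4:6, 5:8] = True   # rows 4-5, columns 5-7

# Hand-written (Left, Top, Right, Bottom) boxes for those islands.
# ASSUMPTION: these illustrate the convention; they were not produced by mask_to_boxes.
boxes = np.array([[1, 1, 4, 3],
                  [5, 4, 8, 6]])

# Because Right and Bottom are non-inclusive, they can be used directly as slice ends.
island_masks = [mask[t:b, l:r] for l, t, r, b in boxes]

# Width and height fall out as simple differences, with no "+1" corrections.
sizes = [(int(r - l), int(b - t)) for l, t, r, b in boxes]

for box, island, size in zip(boxes, island_masks, sizes):
    print(box, island.shape, size)
```

Note the order swap when slicing: boxes are stored x-first (Left, Top, ...), while NumPy indexing is row-first (`mask[t:b, l:r]`).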
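import cv2
import matplotlib.pyplot as plt
from skimage.measure import label, regionprops
# from skimage.morphology import label

# Load the mask image and binarize it
mask_0 = cv2.imread('delete.png')
thresh = 127
mask_0 = cv2.threshold(mask_0, thresh, 255, cv2.THRESH_BINARY)[1]
mask_1 = mask_0[:,:,0]

# Label the connected components and measure their properties
lbl_0 = label(mask_1)
props = regionprops(lbl_0)

for prop in props:
    # prop.bbox is (min_row, min_col, max_row, max_col); cv2.rectangle expects (x, y) points
    print('Found bbox', prop.bbox)
    cv2.rectangle(mask_0, (prop.bbox[1], prop.bbox[0]), (prop.bbox[3], prop.bbox[2]), (255, 0, 0), 2)

plt.imshow(mask_0)
plt.show()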
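The two answers use different box layouts: `regionprops` reports `bbox` as (min_row, min_col, max_row, max_col), whereas the NumPy answer above returns (Left, Top, Right, Bottom). Both treat the upper bounds as non-inclusive, so converting between them is just a column reorder. A minimal sketch, where the helper name `rc_bboxes_to_ltrb` and the sample boxes are assumptions for illustration:

```python
import numpy as np

def rc_bboxes_to_ltrb(bboxes):
    """Reorder (min_row, min_col, max_row, max_col) boxes into (left, top, right, bottom).

    (Hypothetical helper; accepts any sequence of 4-tuples, including an empty one.)
    """
    bboxes = np.asarray(bboxes, dtype=int).reshape(-1, 4)
    return bboxes[:, [1, 0, 3, 2]]

# Sample boxes in the row/column layout, written by hand for illustration
rc_boxes = [(1, 1, 3, 4), (4, 5, 6, 8)]

ltrb = rc_bboxes_to_ltrb(rc_boxes)
print(ltrb)
```

With real data, the input list could be built as `[prop.bbox for prop in props]`.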