Best way to extract barcodes and other details from an image or PDF file in Python

Question

I have a task that requires extracting order details in a structured format from a PDF or an image.

My approach is to find the barcode regions and then run OCR on the text above each barcode region.
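A minimal sketch of the "text above the barcode" crop, assuming an axis-aligned barcode box `(x, y, w, h)` in pixels (the shape of pyzbar's `rect`) and a fixed text height `text_h` that you would measure on your labels. `region_above` and `margin` are hypothetical names for illustration. This only works while the barcode is upright:

```python
def region_above(bbox, text_h, img_w, img_h, margin=0):
    """Return (x0, y0, x1, y1) of the area directly above an axis-aligned
    barcode box, clamped to the image. bbox is (x, y, w, h) in pixels."""
    x, y, w, h = bbox
    x0 = max(0, x - margin)           # widen a little to the left
    x1 = min(img_w, x + w + margin)   # ...and to the right
    y1 = min(img_h, max(0, y))        # bottom of the crop = top edge of the barcode
    y0 = max(0, y - text_h)           # extend upward by the expected text height
    return x0, y0, x1, y1

# With a NumPy image: x0, y0, x1, y1 = region_above(...); crop = image[y0:y1, x0:x1]
print(region_above((50, 300, 200, 80), text_h=120, img_w=800, img_h=600, margin=10))
# (40, 180, 260, 300)
```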

The final output should look like this:

[
    {"style": "value", "art": "value", "color": "value", "size": "value", "barcode": "value"},
    {"style": "value", "art": "value", "color": "value", "size": "value", "barcode": "value"}
]
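A sketch of turning one OCR'd text block plus its decoded barcode into a record of that shape. It assumes, hypothetically, that the OCR text has one `Label: value` pair per line using the four labels from the question; the real label layout may differ, so treat the regex as a starting point. The sample values are made up:

```python
import json
import re

FIELDS = ("style", "art", "color", "size")

# One "Label: value" pair per line; the separator (":" or "-") is optional.
# [ \t] instead of \s keeps each match on a single line.
LINE_RE = re.compile(
    r"^[ \t]*(style|art|color|size)\b[ \t]*[:\-]?[ \t]*(\S.*?)[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)

def parse_label_text(ocr_text, barcode):
    """Build one output record from the OCR'd text block and its decoded barcode."""
    record = {field: None for field in FIELDS}  # fields not found stay None
    for key, value in LINE_RE.findall(ocr_text):
        record[key.lower()] = value
    record["barcode"] = barcode
    return record

records = [parse_label_text("Style: A1234\nArt: 0042\nColor: Navy\nSize: M", "4006381333931")]
print(json.dumps(records, indent=4))
```

The `\b` after the label keeps words such as "Article" from being read as the "art" field.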

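I tried decoding the barcodes with the pyzbar library. It decodes the barcodes accurately, but the bounding boxes it draws are inaccurate. Here is the result of decoding and drawing a box around each barcode: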

I also tried the ChatGPT vision model, but its results were inconsistent in this case.

Here is the source image:

Answer 1

Let's treat each barcode together with its text as a "form". A form consists of a text block and a barcode block.

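The barcode is a good anchor for locating each form. Alternatively, you could try to locate the fixed text (Style, Art, Color, Size) instead of the barcode, but that would be more involved.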
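The label-anchored alternative could start from word-level OCR boxes. This sketch assumes the dictionary layout returned by `pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)`, that is, parallel lists under keys such as `"text"`, `"left"`, `"top"`, `"width"` and `"height"`. `find_label_boxes` is a hypothetical helper, and it only handles upright text, which is part of why this route is more involved:

```python
# The fixed field labels from the question.
LABELS = ("style", "art", "color", "size")

def find_label_boxes(data):
    """Return {label: [(x, y, w, h), ...]} for every fixed label found in
    word-level OCR data given as parallel lists (text, left, top, width, height)."""
    boxes = {label: [] for label in LABELS}
    for i, word in enumerate(data["text"]):
        key = word.strip().strip(":").lower()  # "Style:" -> "style"
        if key in boxes:
            boxes[key].append((data["left"][i], data["top"][i],
                               data["width"][i], data["height"][i]))
    return boxes
```

Each "style" box would then mark the top of one form; grouping the four labels per form and reading the values to their right would still have to be worked out.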

Approach:

Find the barcodes
Work through a few 2D transforms
Extract the text blocks
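To make the 2D-transform step concrete, here is a small NumPy sketch of the idea behind the answer's code: 3×3 homogeneous matrices, composed right to left, applied to the corners of the text block. The form layout (text block 400×114 at the origin, barcode top-left at (34, 138)) is taken from the answer; the two barcode detections are made-up values. `frame_from_corners` is a stand-in for the answer's `T_from_detection`, and `apply` does for these matrices what `cv.perspectiveTransform()` does for point lists:

```python
import numpy as np

def translate2(tx=0.0, ty=0.0):
    """3x3 homogeneous translation."""
    T = np.eye(3)
    T[0:2, 2] = (tx, ty)
    return T

def frame_from_corners(TL, TR, BL):
    """Frame with origin TL, x axis along TL->TR, y axis along TL->BL (unit length, so 1 unit = 1 pixel)."""
    TL, TR, BL = (np.asarray(p, dtype=float) for p in (TL, TR, BL))
    T = np.eye(3)
    T[0:2, 0] = (TR - TL) / np.linalg.norm(TR - TL)
    T[0:2, 1] = (BL - TL) / np.linalg.norm(BL - TL)
    T[0:2, 2] = TL
    return T

def apply(T, pts):
    """Map N x 2 points through a 3x3 homogeneous transform."""
    pts = np.asarray(pts, dtype=float)
    h = np.hstack([pts, np.ones((len(pts), 1))]) @ T.T
    return h[:, :2] / h[:, 2:3]

# text frame -> form frame -> barcode frame
T_barcode_text = np.linalg.inv(translate2(34, 138)) @ translate2(0, 0)
text_corners = [(0, 0), (400, 0), (400, 114), (0, 114)]

# Hypothetical detections of a 200 x 80 px barcode: upright, and rotated 90 degrees clockwise.
upright = frame_from_corners(TL=(134, 238), TR=(334, 238), BL=(134, 318))
rotated = frame_from_corners(TL=(500, 100), TR=(500, 300), BL=(420, 100))

print(apply(upright @ T_barcode_text, text_corners))  # (100,100), (500,100), (500,214), (100,214)
print(apply(rotated @ T_barcode_text, text_corners))  # text block lands to the right of the barcode
```

Because the text is defined relative to the barcode, the same `T_barcode_text` works for any rotation; only the detected frame changes.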

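You've already seen that pyzbar gives wrong bounding boxes. When a barcode is rotated rather than upright, pyzbar may not even detect it, and even when it does, the bounding box carries no orientation.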
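A four-corner box keeps the rotation that an axis-aligned `(x, y, w, h)` box throws away. A small sketch, assuming the corners arrive in the bottomLeft, topLeft, topRight, bottomRight order that the answer's `T_from_detection` unpacks: the angle of the TL→TR edge is the barcode's reading direction in image coordinates (y pointing down, so positive angles are clockwise on screen). Both helper names are hypothetical:

```python
import math

def barcode_angle(corners):
    """Reading direction of the barcode in degrees, in (-180, 180].
    corners: four (x, y) points ordered BL, TL, TR, BR."""
    _, (tl_x, tl_y), (tr_x, tr_y), _ = corners
    return math.degrees(math.atan2(tr_y - tl_y, tr_x - tl_x))

def axis_aligned_rect(corners):
    """What an axis-aligned box keeps: (x, y, w, h). The angle is lost."""
    xs = [p[0] for p in corners]
    ys = [p[1] for p in corners]
    return min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)

# The same 80 x 200 px footprint, rotated +90 and -90 degrees (BL, TL, TR, BR).
cw = [(420, 100), (500, 100), (500, 300), (420, 300)]
ccw = [(500, 300), (420, 300), (420, 100), (500, 100)]
print(barcode_angle(cw), axis_aligned_rect(cw))
print(barcode_angle(ccw), axis_aligned_rect(ccw))
```

Both rotations produce the identical axis-aligned rectangle, so from that rectangle alone you cannot tell on which side the text is.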

I went with OpenCV's BarcodeDetector. Its bounding boxes aren't perfect, but they're good enough.

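Unfortunately, even with the decoding step included, this detector doesn't tell me which end of the box is the "start". So if your barcodes can appear upside down, you have to extract both possible regions and test them. If they can't, one region is enough.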
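One way to "test" the two candidate regions, sketched under assumptions: OCR both crops and keep the one whose text contains more of the fixed labels from the question. `ocr` can be any callable that turns an image into a string (for example `pytesseract.image_to_string`, which is an assumption here, not part of the answer); `pick_upright` and `label_score` are hypothetical helpers:

```python
from typing import Callable, Sequence, Tuple

# The fixed field labels from the question.
LABELS = ("style", "art", "color", "size")

def label_score(text: str) -> int:
    """Count how many of the expected labels appear as words in the OCR text."""
    words = {w.strip(":").lower() for w in text.split()}
    return sum(label in words for label in LABELS)

def pick_upright(candidates: Sequence[object], ocr: Callable[[object], str]) -> Tuple[int, str]:
    """OCR every candidate crop and return (index, text) of the best-scoring one.
    max() keeps the first candidate on a tie, so the primary crop wins ties."""
    texts = [ocr(c) for c in candidates]
    best = max(range(len(texts)), key=lambda i: label_score(texts[i]))
    return best, texts[best]

# Possible use inside the answer's loop (assuming pytesseract is installed):
#     idx, text = pick_upright([text_im1, text_im2], pytesseract.image_to_string)
```

Upside-down text usually OCRs as noise, so the correctly oriented crop should match noticeably more labels.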

Visualization of the result:

Extracted text boxes (primary on the left, secondary on the right):

Code:

# imports

import numpy as np
from numpy.linalg import norm, inv
import cv2 as cv

# utility functions

def normalize(vec):
    vec = np.asarray(vec)
    return vec / norm(vec)

def translate2(tx=0, ty=0):
    T = np.eye(3)
    T[0:2, 2] = (tx, ty)
    return T

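def T_from_detection(detection):
    # corners as reported by cv.barcode.BarcodeDetector: bottomLeft, topLeft, topRight, bottomRight
    BL, TL, TR, BR = detection
    vx = normalize(TR - TL)  # x axis: along the barcode, left to right
    vy = normalize(BL - TL)  # y axis: across the barcode, top to bottom
    T = np.eye(3)
    T[0:2, 0] = vx
    T[0:2, 1] = vy
    T[0:2, 2] = TL           # origin at the top-left corner
    return T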

def polygon_from_wh(wh, dtype=None):
    (w, h) = wh
    return np.array([
        (0, 0),
        (w, 0),
        (w, h),
        (0, h)
    ], dtype=dtype)

def polygon_from_xywh(xywh):
    (x, y, w, h) = xywh
    return np.array([
        (x,     y),
        (x + w, y),
        (x + w, y + h),
        (x,     y + h)
    ])

# definitions
# consider each label as a form, with regions for the text and barcode
# (form units are image pixels, because T_from_detection uses unit-length axes)

form_barcode_xy = (34, 138)

form_text_xywh = (0, 0, 400, 114)
text_wh = form_text_xywh[2:4]
text_polygon = polygon_from_wh(text_wh, dtype=np.float32)

T_form_text = translate2(*form_text_xywh[0:2])
T_form_barcode = translate2(*form_barcode_xy)

T_barcode_text = inv(T_form_barcode) @ T_form_text

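# main code

im = cv.imread("f5SQQdP6-rotated.png", cv.IMREAD_GRAYSCALE)
if im is None:
    raise FileNotFoundError("could not read f5SQQdP6-rotated.png")

# needs an OpenCV build that includes the cv.barcode module
det = cv.barcode.BarcodeDetector()
(rv, codes, detections, straights) = det.detectAndDecodeMulti(img=im)
if not rv:
    codes, detections = [], []  # nothing detected

for detection, code in zip(detections, codes):
    T_world_code1 = T_from_detection(detection)
    T_world_code2 = T_from_detection(detection[[2, 3, 0, 1]]) # shuffle for the other orientation (box turned 180 degrees)

    # text frame -> image transforms for both candidate orientations
    T_world_text1 = T_world_code1 @ T_barcode_text
    T_world_text2 = T_world_code2 @ T_barcode_text

    # WARP_INVERSE_MAP: the matrix maps output (text) pixels to input (image) pixels
    text_im1 = cv.warpPerspective(im, T_world_text1, text_wh, flags=cv.WARP_INVERSE_MAP | cv.INTER_CUBIC)
    text_im2 = cv.warpPerspective(im, T_world_text2, text_wh, flags=cv.WARP_INVERSE_MAP | cv.INTER_CUBIC)

    # show both candidate crops; press a key for the next barcode
    print("barcode:", code)
    cv.imshow("text (primary)", text_im1)
    cv.imshow("text (secondary)", text_im2)
    cv.waitKey(0)

cv.destroyAllWindows()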

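I removed all the visualization code. It was just a bunch of drawing calls plus a few perspectiveTransform() calls to transform points between the various frames.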
