I'm experimenting with artificial intelligence, specifically character recognition. One of the best-known approaches is OCR, and Google's implementation in Tesseract seems to be the best open-source solution available. So I got some images and tried applying Tesseract to them with Python. The images in this case are license plates, so I won't show the original, but after some preprocessing I end up with the following result: it looks like a very simple image to extract text from, yet I always get 624830. The 830 is fine, but 624 should be GZX.
Here is my code:
import pytesseract
import cv2

# Opening the image
img = cv2.imread("plate.jpg")

# Preprocess (better performance)
h, w, _ = img.shape  # shape is (rows, cols, channels), i.e. height first
img = cv2.resize(img, (w * 2, h * 2))  # cv2.resize expects (width, height)
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
img = cv2.medianBlur(img, 5)
img = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]

# Get a slice of the image with just the text to improve performance
boxes = []
data = pytesseract.image_to_data(img, config=r'--oem 3', output_type=pytesseract.Output.DICT)
n_boxes = len(data['level'])
for i in range(n_boxes):
    (x, y, w, h) = (data['left'][i], data['top'][i], data['width'][i], data['height'][i])
    if int(data['conf'][i]) > 0:
        boxes.append((x, y, x + w, y + h))

h, w = img.shape
for x_min, y_min, x_max, y_max in boxes:
    # Add some padding, making sure we do not go outside the image
    x_min_pad = max(0, x_min - 5)
    y_min_pad = max(0, y_min - 5)
    x_max_pad = min(w, x_max + 5)
    y_max_pad = min(h, y_max + 5)
    # Get the slice
    crop = img[y_min_pad:y_max_pad, x_min_pad:x_max_pad]
    # Inference with LSTM on the slice
    print(pytesseract.image_to_string(crop, config='--psm 8 --oem 3'))
All of the preprocessing has been tested, so I know it really does improve performance; if I remove it, Tesseract cannot even detect the digits correctly.
The problem seems to be that the text you want to recognize uses a stretched font (and is distorted by perspective).
I managed to get it to (sort of) work by squashing the sliced image (i.e., keeping only every other pixel row) before processing it, using these parameters:
--psm 8 --oem 3
import pytesseract
import cv2

# Opening the image
img = cv2.imread("plate.jpg")

# Preprocess (better performance)
h, w, _ = img.shape  # shape is (rows, cols, channels), i.e. height first
img = cv2.resize(img, (w * 2, h * 2))  # cv2.resize expects (width, height)
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
img = cv2.medianBlur(img, 5)
img = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]

# Get a slice of the image with just the text to improve performance
boxes = []
data = pytesseract.image_to_data(img, config=r'--oem 3', output_type=pytesseract.Output.DICT)
n_boxes = len(data['level'])
for i in range(n_boxes):
    (x, y, w, h) = (data['left'][i], data['top'][i], data['width'][i], data['height'][i])
    if int(data['conf'][i]) > 0:
        boxes.append((x, y, x + w, y + h))

h, w = img.shape
for x_min, y_min, x_max, y_max in boxes:
    # Add some padding, making sure we do not go outside the image
    x_min_pad = max(0, x_min - 5)
    y_min_pad = max(0, y_min - 5)
    x_max_pad = min(w, x_max + 5)
    y_max_pad = min(h, y_max + 5)
    # Get the slice
    crop = img[y_min_pad:y_max_pad, x_min_pad:x_max_pad]
    # Squash vertically (keep every other pixel row) to undo the stretched font
    crop = crop[::2, :]
    # Inference with LSTM on the slice
    print(pytesseract.image_to_string(crop, config='--psm 8 --oem 3'))
The output is:
|GZX 830]
Considering how many residual characters we can still see at the sides, this is far from perfect. I believe it would be cleaner if you corrected the perspective (by doing edge detection and then the standard cv2.getPerspectiveTransform followed by cv2.warpPerspective).