Reading text from a captcha with pytesseract OCR

Question (votes: 0, answers: 1)

I need to extract the text from this image using pytesseract:

input image

I am using this code:

import pytesseract
import cv2
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
path = "C:\\Users\\User\\Desktop\\guvenlik.jpg"
src = cv2.imread(path)
img = cv2.cvtColor(src, cv2.COLOR_BGR2BGRA)
text = pytesseract.image_to_string(img)
print(text)

However, my output does not match:

My console output: 72660wib

How can I read this kind of text?

python ocr tesseract captcha python-tesseract
1 answer

0 votes

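I'm not sure whether this works for all similar captchas, but it works for this image.

Convert this image: Captcha Input Image

to: Captcha Output Image

Then try extracting the text with these additional Tesseract options: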

--oem 1 --psm 8 -c tessedit_char_whitelist=0123456789abcdefghijklmnopqrstuvwxyz
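
For reference, `--oem 1` selects the LSTM neural-network engine, `--psm 8` tells Tesseract to treat the image as a single word, and `tessedit_char_whitelist` restricts the characters it may output. Some Tesseract 4.0 builds reportedly ignore the whitelist when the LSTM engine is used, so filtering the result in Python as well can be a useful safety net. The sketch below builds the same option string and applies such a filter; the names `WHITELIST`, `build_config` and `keep_whitelisted` are hypothetical helpers and not part of pytesseract.

```python
import string

# Characters the captcha is assumed to contain (same set as the whitelist above).
WHITELIST = string.digits + string.ascii_lowercase


def build_config(oem=1, psm=8, whitelist=WHITELIST):
    """Return a Tesseract option string equivalent to the one shown above."""
    return f"--oem {oem} --psm {psm} -c tessedit_char_whitelist={whitelist}"


def keep_whitelisted(text, whitelist=WHITELIST):
    """Drop every character (spaces, newlines, symbols) not in the whitelist."""
    return "".join(ch for ch in text if ch in whitelist)
```

The string returned by `build_config()` can be passed as the `config` argument of `pytesseract.image_to_string`, and `keep_whitelisted` can be applied to whatever text comes back.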

import pytesseract
import cv2
import numpy as np

pytesseract.pytesseract.tesseract_cmd = r".\Tesseract-OCR\tesseract.exe"    

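def enhance_and_resolve(path):
    # Flag 0 loads the image as single-channel grayscale.
    img = cv2.imread(path, 0)
    if img is None:
        raise FileNotFoundError(f"Could not read image: {path}")
    kernel = np.ones((2, 2), np.uint8)
    # For dark text on a light background, dilating and then eroding
    # removes thin dark noise while roughly preserving thicker strokes.
    img = cv2.dilate(img, kernel, iterations=1)
    img = cv2.erode(img, kernel, iterations=1)
    # Smooth the remaining jagged edges.
    img = cv2.GaussianBlur(img, (5, 5), 0)
    # Uncomment the next line to save the enhanced image for inspection.
    # cv2.imwrite("Captcha.png", img)
    custom_config = r'--oem 1 --psm 8 -c tessedit_char_whitelist=0123456789abcdefghijklmnopqrstuvwxyz'
    return pytesseract.image_to_string(img, config=custom_config)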


if __name__ == '__main__':
    path = r".\test_img.jpg"
    text = enhance_and_resolve(path)
    print(text)
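
To see why dilating and then eroding can clean up a captcha, here is a minimal pure-Python sketch of the two operations on a grayscale image stored as a list of rows. It is only an illustration, not a replacement for OpenCV: it assumes dark text on a light background, uses a symmetric 3x3 window instead of the 2x2 kernel in the code above, and the names `morph`, `dilate`, `erode` and `remove_dark_specks` are hypothetical.

```python
def morph(img, pick):
    """Replace each pixel with pick() of its 3x3 neighbourhood (clipped at the borders)."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [img[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]
            row.append(pick(window))
        out.append(row)
    return out


def dilate(img):
    # Taking the maximum grows bright regions, so thin dark details vanish.
    return morph(img, max)


def erode(img):
    # Taking the minimum grows dark regions, restoring strokes that survived dilation.
    return morph(img, min)


def remove_dark_specks(img):
    """Dilate, then erode: isolated dark pixels are removed, thicker dark shapes remain."""
    return erode(dilate(img))


# A 7x7 white image with a 3x3 dark block (a "stroke") and one dark noise pixel.
clean = [[255] * 7 for _ in range(7)]
for y in range(1, 4):
    for x in range(1, 4):
        clean[y][x] = 0
noisy = [row[:] for row in clean]
noisy[5][5] = 0

result = remove_dark_specks(noisy)
```

In this example the single noise pixel disappears while the 3x3 block comes back unchanged, which is the effect the dilate/erode pair is meant to have on thin noise lines versus thicker characters.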