There is a strange artefact in my output images: every character ends up bordered by gray pixels. I'm about 90% sure it is caused by the OpenCV-PIL conversion, but I don't know how to fix it.
This is the source image:
And this is the output (you need to zoom in to see the gray pixels):
Here is a detail:
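One quick way to test the conversion hypothesis is a pixel-level round-trip check: convert BGR to RGB, hand the array to PIL, take it back as a NumPy array and compare. A minimal sketch (the path is only an example, and it assumes a plain 3-channel image):

import cv2
import numpy as np
from PIL import Image

cv_img = cv2.imread('C:\\Users\\Link\\Desktop\\0.png', cv2.IMREAD_COLOR)  # force 3-channel BGR

# OpenCV (BGR) -> PIL (RGB) -> NumPy array -> BGR again
pil_img = Image.fromarray(cv2.cvtColor(cv_img, cv2.COLOR_BGR2RGB))
back = cv2.cvtColor(np.array(pil_img), cv2.COLOR_RGB2BGR)

# True means the conversion itself is lossless, so the gray border
# must come from somewhere else (e.g. a save/re-encode step)
print(np.array_equal(cv_img, back))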
This is the code I'm using:
import cv2
import tesserocr as tr
from PIL import Image
import os
src = os.path.expanduser('~\\Desktop\\output4\\')

causali = os.listdir(src)  # build the list of "causali" files
causali.sort(key=lambda x: int(x.split('.')[0]))

for file in enumerate(causali):  # iterate over the causale files with their index
    cv_img = cv2.imread(src + file[1], cv2.IMREAD_UNCHANGED)

    # since tesserocr accepts PIL images, converting opencv image to pil
    pil_img = Image.fromarray(cv2.cvtColor(cv_img, cv2.COLOR_BGR2RGB))

    # initialize api
    api = tr.PyTessBaseAPI()
    try:
        # set pil image for ocr
        api.SetImage(pil_img)

        # Google tesseract-ocr has a page segmentation mode (psm) option for specifying ocr types
        # psm values can be: block of text, single text line, single word, single character etc.
        # the api.GetComponentImages method exposes this functionality; it returns, per component:
        #   image (PIL.Image): Image object
        #   bounding box (dict): dict with x, y, w, h keys
        #   block id (int): textline block id (if blockids is True), None otherwise
        #   paragraph id (int): textline paragraph id within its block (if paraids is True), None otherwise
        boxes = api.GetComponentImages(tr.RIL.BLOCK, True)

        # get text
        text = api.GetUTF8Text()

        # iterate over returned list, draw rectangles and save each component
        for (im, box, _, _) in boxes:
            x, y, w, h = box['x'], box['y'], box['w'], box['h']
            cv_rect = cv2.rectangle(cv_img, (x - 10, y - 10), (x + w + 10, y + h + 10), color=(255, 255, 255), thickness=1)
            im.save(os.path.expanduser('~\\Desktop\\output5\\{}.png'.format(file[0])))
    finally:
        api.End()
Is there a way to make api.SetImage() accept an OpenCV variable (a NumPy array) directly?
Thanks.
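As far as I can tell, api.SetImage() itself only takes PIL images, but tesserocr also exposes SetImageFile(), which lets Tesseract load the file on its own and avoids the OpenCV-PIL conversion entirely. A minimal sketch (same example path as above):

import tesserocr as tr

api = tr.PyTessBaseAPI()
try:
    # tesseract reads the file itself, so no conversion step is involved
    api.SetImageFile('C:\\Users\\Link\\Desktop\\0.png')
    print(api.GetUTF8Text())
finally:
    api.End()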
EDIT: Is there also a way to remove all the gray pixels by specifying their color?
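If the gray value around the characters is known, one option is to mask those pixels with cv2.inRange and paint them white. A rough sketch with a hypothetical gray range (the bounds need to be tuned to the actual shade, e.g. sampled in an image editor):

import cv2
import numpy as np

img = cv2.imread('C:\\Users\\Link\\Desktop\\0.png', cv2.IMREAD_COLOR)

# hypothetical range of "gray-ish" BGR values; adjust to the real fringe color
lower = np.array([100, 100, 100])
upper = np.array([200, 200, 200])

# mask every pixel inside the range and overwrite it with white
mask = cv2.inRange(img, lower, upper)
img[mask > 0] = (255, 255, 255)

cv2.imwrite('C:\\Users\\Link\\Desktop\\0_clean.png', img)

Note that this also hits legitimately gray pixels such as anti-aliasing on the characters, so the range should be kept as tight as possible.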
So, here is my solution: use OpenCV end to end instead of PIL, as long as nothing along the way re-encodes the image as JPEG. That way the image stays clean from input to output.
Here is the code:
import cv2
import tesserocr as tr
from PIL import Image
import os
cv_img = cv2.imread('C:\\Users\\Link\\Desktop\\0.png', cv2.IMREAD_UNCHANGED)

idx = 0

# since tesserocr accepts PIL images, hand the OpenCV array to PIL as-is
# (no color conversion, no re-encoding along the way)
pil_img = Image.fromarray(cv_img)

# initialize api
api = tr.PyTessBaseAPI()
try:
    # set pil image for ocr
    api.SetImage(pil_img)

    # Google tesseract-ocr has a page segmentation mode (psm) option for specifying ocr types
    # psm values can be: block of text, single text line, single word, single character etc.
    # the api.GetComponentImages method exposes this functionality; it returns, per component:
    #   image (PIL.Image): Image object
    #   bounding box (dict): dict with x, y, w, h keys
    #   block id (int): textline block id (if blockids is True), None otherwise
    #   paragraph id (int): textline paragraph id within its block (if paraids is True), None otherwise
    boxes = api.GetComponentImages(tr.RIL.TEXTLINE, True)

    # get text
    text = api.GetUTF8Text()

    # iterate over returned list, draw rectangles and save each text line as its own file
    for (im, box, _, _) in boxes:
        x, y, w, h = box['x'], box['y'], box['w'], box['h']
        cv_rect = cv2.rectangle(cv_img, (x - 10, y - 10), (x + w + 10, y + h + 10), color=(255, 255, 255), thickness=1)
        roi = cv_rect[y:y + h, x:x + w]
        cv2.imwrite(os.path.expanduser('~\\Desktop\\output5\\image_{}.png'.format(idx)), roi)
        idx += 1
finally:
    api.End()
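The same idea extends to the whole folder from the question; a rough sketch, assuming the output4/output5 layout from above and numbering every saved line across files:

import cv2
import tesserocr as tr
from PIL import Image
import os

src = os.path.expanduser('~\\Desktop\\output4\\')
dst = os.path.expanduser('~\\Desktop\\output5\\')

files = os.listdir(src)
files.sort(key=lambda x: int(x.split('.')[0]))

idx = 0
api = tr.PyTessBaseAPI()
try:
    for name in files:
        # keep the image as a plain NumPy array the whole time
        cv_img = cv2.imread(src + name, cv2.IMREAD_UNCHANGED)
        api.SetImage(Image.fromarray(cv_img))

        # crop and save each detected text line
        for (_, box, _, _) in api.GetComponentImages(tr.RIL.TEXTLINE, True):
            x, y, w, h = box['x'], box['y'], box['w'], box['h']
            roi = cv_img[y:y + h, x:x + w]
            cv2.imwrite(dst + '{}.png'.format(idx), roi)
            idx += 1
finally:
    api.End()

Since the image never goes through a JPEG re-encode, the crops keep the original pixels and no gray border appears.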