Extract the logo and text, with coordinates, from a visiting card image

Question (votes: -1, answers: 1)

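I have a visiting card, and I want to extract the logo and all the text from it, together with their coordinates, so that I can edit the uploaded image on an HTML canvas. I have looked at many examples but could not find one that does exactly this; I only found ways to extract text from an image. I also tried the Google Vision API, but it also gave me only the text.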

Here is a sample image.

[image: sample visiting card]

In the code below I have to select the logo region by hand; I need it to be found and extracted automatically.

# import the necessary packages
import argparse
import cv2

# initialize the list of reference points and boolean indicating
# whether cropping is being performed or not
ref_point = []
cropping = False

def shape_selection(event, x, y, flags, param):
  # grab references to the global variables
  global ref_point, cropping

  # if the left mouse button was clicked, record the starting
  # (x, y) coordinates and indicate that cropping is being
  # performed
  if event == cv2.EVENT_LBUTTONDOWN:
    ref_point = [(x, y)]
    cropping = True

  # check to see if the left mouse button was released
  elif event == cv2.EVENT_LBUTTONUP:
    # record the ending (x, y) coordinates and indicate that
    # the cropping operation is finished
    ref_point.append((x, y))
    cropping = False

    # draw a rectangle around the region of interest
    cv2.rectangle(image, ref_point[0], ref_point[1], (0, 255, 0), 2)
    cv2.imshow("image", image)

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True, help="Path to the image")
args = vars(ap.parse_args())

# load the image, clone it, and setup the mouse callback function
image = cv2.imread(args["image"])
clone = image.copy()
cv2.namedWindow("image")
cv2.setMouseCallback("image", shape_selection)

# keep looping until the 'c' key is pressed
while True:
  # display the image and wait for a keypress
  cv2.imshow("image", image)
  key = cv2.waitKey(1) & 0xFF

  # if the 'r' key is pressed, reset the cropping region
  if key == ord("r"):
    image = clone.copy()

  # if the 'c' key is pressed, break from the loop
  elif key == ord("c"):
    break

# if there are two reference points, then crop the region of interest
# from the image and display it
if len(ref_point) == 2:
  # sort the corners so the crop also works when dragging up or to the left
  (x0, y0), (x1, y1) = ref_point
  crop_img = clone[min(y0, y1):max(y0, y1), min(x0, x1):max(x0, x1)]
  cv2.imshow("crop_img", crop_img)
  cv2.waitKey(0)

# close all open windows
cv2.destroyAllWindows()

Tags: python opencv ocr html2canvas google-vision

1 Answer
1 vote

You could try the ABBYY Cloud API.

https://www.abbyy.com/en-gb/cloud-ocr-sdk/features

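The API gives you the text with coordinates, and it can also return image elements, as long as they are detectable, as plain images. With some logic you can combine the two into a document in which all text elements are real text and all images sit as images in the correct positions.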
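As a rough illustration of that combining logic, here is a minimal sketch. It assumes the recognition result was exported as XML in ABBYY's FineReader-style layout, where each `block` element carries a `blockType` attribute and `l`, `t`, `r`, `b` pixel coordinates. The sample document is hand-written and simplified for illustration, and `parse_blocks` and `local_name` are hypothetical helpers, not part of any ABBYY SDK.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified sample in the style of an ABBYY XML export.
# Real exports carry an XML namespace and many more attributes (assumption).
SAMPLE_XML = """<document>
  <page width="1050" height="600">
    <block blockType="Picture" l="40" t="30" r="240" b="180"/>
    <block blockType="Text" l="300" t="60" r="900" b="110">
      <text><par><line l="300" t="60" r="900" b="110">
        <formatting>John Doe</formatting>
      </line></par></text>
    </block>
  </page>
</document>"""


def local_name(tag):
    """Strip a '{namespace}' prefix so the code works with or without one."""
    return tag.rsplit("}", 1)[-1]


def parse_blocks(xml_text):
    """Return one dict per block: its type, bounding box and text (if any)."""
    root = ET.fromstring(xml_text)
    blocks = []
    for elem in root.iter():
        if local_name(elem.tag) != "block":
            continue
        # (left, top, right, bottom) in pixels of the recognized page
        box = tuple(int(elem.attrib[k]) for k in ("l", "t", "r", "b"))
        # Join all character data inside the block and normalize whitespace.
        text = " ".join("".join(elem.itertext()).split())
        blocks.append({"type": elem.attrib.get("blockType"), "box": box, "text": text})
    return blocks


blocks = parse_blocks(SAMPLE_XML)
for block in blocks:
    print(block)
```

Each resulting dict could then be serialized to JSON and drawn on the HTML canvas: blocks of type `Text` as editable text at `box`, blocks of type `Picture` as image layers. Which block is the logo is not something the XML states; picking, say, the largest `Picture` block would be a heuristic of your own.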

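Keep in mind, though, that the image is preprocessed before OCR starts, so its quality may have changed. It is therefore probably a good idea to use the coordinates returned by the API to cut the image regions out of the original scan.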
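A small sketch of that idea follows. It assumes the preprocessing amounted to a uniform resize only (no rotation, deskew or cropping), so that OCR coordinates can be mapped back to the original scan by simple scaling. `map_box_to_original` is a hypothetical helper and the sizes are made-up numbers.

```python
def map_box_to_original(box, ocr_size, original_size):
    """Scale an (l, t, r, b) box from OCR page pixels to original image pixels.

    ocr_size and original_size are (width, height) tuples.
    Assumes the two images differ only by a resize.
    """
    l, t, r, b = box
    sx = original_size[0] / ocr_size[0]
    sy = original_size[1] / ocr_size[1]
    l, t, r, b = round(l * sx), round(t * sy), round(r * sx), round(b * sy)
    # Clamp so that slicing never runs outside the original image.
    width, height = original_size
    return (max(0, l), max(0, t), min(width, r), min(height, b))


# Hypothetical numbers: OCR ran on a 1050x600 page, the original scan is 2100x1200.
logo_box = map_box_to_original((40, 30, 240, 180), (1050, 600), (2100, 1200))
print(logo_box)
```

With the original image loaded through OpenCV as in the question (`clone = cv2.imread(...)`), the region could then be cut out with `l, t, r, b = logo_box` followed by `clone[t:b, l:r]`, the same row-then-column slicing the question's cropping code uses.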

https://www.ocrsdk.com/documentation/specifications/export-formats

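The API is really good and gives OCR results very similar to Google's Cloud Vision, with more features and parameters for tuning the results. However, ABBYY's API is much more expensive than Google's.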
