How can I use Tesseract (mode --psm 2) for page segmentation/layout detection only?

Question

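I want to use Tesseract's page segmentation without running OCR, because I have my own custom OCR model and running page segmentation plus OCR takes a long time. I tried the --psm 2 mode both on the Tesseract command line and in pytesseract, but it does not work as documented.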

I am working on Linux and coding in Python 3.10.

I currently use the tesseract-ocr API from the layoutparser documentation. The code looks like this:

import layoutparser as lp
ocr_agent = lp.TesseractAgent()
res = ocr_agent.detect(img_path, return_response=True)
layout_info = res['data']

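layout_info is a pd.DataFrame containing block-, paragraph-, line-, and word-level layout information along with the OCR output. The problem is that this is very slow: it takes 7 seconds per image on my machine, and I do not actually need the OCR. So I only need page segmentation (sometimes also called layout detection).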
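For reference, Tesseract's TSV output tags every row with a level column (1 = page, 2 = block, 3 = paragraph, 4 = line, 5 = word), so the layout rows can be separated from the recognized words after the fact. The sketch below does this with the standard library on TSV text such as the file written by `tesseract img.png outfile tsv`. The function name `layout_rows` and the sample data are illustrative assumptions, and this filtering does not avoid the OCR cost; it only discards the text.

```python
import csv
import io

# Level codes used in the "level" column of Tesseract's TSV output.
LEVEL_NAMES = {1: "page", 2: "block", 3: "paragraph", 4: "line", 5: "word"}


def layout_rows(tsv_text, max_level=4):
    """Return bounding boxes for rows whose level is <= max_level.

    Hypothetical helper: the recognized text and confidence are dropped,
    so only the layout hierarchy and the boxes remain.
    """
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    rows = []
    for row in reader:
        level = int(row["level"])
        if level > max_level:
            continue  # skip word-level rows by default
        rows.append({
            "level": LEVEL_NAMES[level],
            "block_num": int(row["block_num"]),
            "par_num": int(row["par_num"]),
            "line_num": int(row["line_num"]),
            "left": int(row["left"]),
            "top": int(row["top"]),
            "width": int(row["width"]),
            "height": int(row["height"]),
        })
    return rows


# Made-up sample in the TSV column layout, not real Tesseract output.
HEADER = ["level", "page_num", "block_num", "par_num", "line_num", "word_num",
          "left", "top", "width", "height", "conf", "text"]
SAMPLE_ROWS = [
    ["1", "1", "0", "0", "0", "0", "0", "0", "640", "480", "-1", ""],
    ["2", "1", "1", "0", "0", "0", "36", "92", "582", "40", "-1", ""],
    ["3", "1", "1", "1", "0", "0", "36", "92", "582", "40", "-1", ""],
    ["4", "1", "1", "1", "1", "0", "36", "92", "582", "40", "-1", ""],
    ["5", "1", "1", "1", "1", "1", "36", "92", "60", "24", "96.5", "Hello"],
]
SAMPLE_TSV = "\n".join("\t".join(r) for r in [HEADER] + SAMPLE_ROWS) + "\n"

sample_layout = layout_rows(SAMPLE_TSV)
for entry in sample_layout:
    print(entry)
```

Passing `max_level=2` would keep only the page and block boxes, which is often enough to crop regions for a separate OCR model.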

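According to the Tesseract documentation, there is a --psm 2 mode described as "Automatic page segmentation, but no OSD, or OCR". When I try this on the command line, no output file is produced (even when an output type is specified):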

tesseract img.png outfile --psm 2
tesseract img.png outfile --psm 2 tsv

I also tried the Python wrapper pytesseract, but it is very slow, and it again returns a pd.DataFrame with both layout and OCR data despite --psm 2 being specified:

import cv2
import pytesseract

img = cv2.imread(img_path)
layout_info = pytesseract.image_to_data(img, config='tsv --psm 2', output_type='data.frame')

I am using pytesseract==0.3.10 and tesseract 5.3.3-30-gea0b.

Do you have any ideas on how to get page segmentation without OCR using Tesseract (or at least how to speed up page segmentation + OCR)?

Answer 1

You can check whether --psm 2 is implemented in your Tesseract build with the following command:

tesseract --help-psm 2

Output on my machine:

Page segmentation modes:
  0    Orientation and script detection (OSD) only.
  1    Automatic page segmentation with OSD.
  2    Automatic page segmentation, but no OSD, or OCR. (not implemented)
  3    Fully automatic page segmentation, but no OSD. (Default)
  4    Assume a single column of text of variable sizes.
  5    Assume a single uniform block of vertically aligned text.
  6    Assume a single uniform block of text.
  7    Treat the image as a single text line.
  8    Treat the image as a single word.
  9    Treat the image as a single word in a circle.
 10    Treat the image as a single character.
 11    Sparse text. Find as much text as possible in no particular order.
 12    Sparse text with OSD.
 13    Raw line. Treat the image as a single text line,
       bypassing hacks that are Tesseract-specific.

For your information, --psm 2 is listed as "Automatic page segmentation, but no OSD, or OCR. (not implemented)", so you cannot use it.
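The same check can be scripted. The following is a minimal sketch that parses text in the format shown above and reports which modes are flagged "(not implemented)". The helper names `parse_psm_help` and `psm_is_implemented` are hypothetical, and the sketch assumes the help text keeps the "number, spaces, description" layout of the output above.

```python
import re
import subprocess


def parse_psm_help(help_text):
    """Map each PSM number to (description, implemented) from --help-psm text."""
    descriptions = {}
    current = None
    for line in help_text.splitlines():
        match = re.match(r"\s*(\d+)\s+(.*\S)\s*$", line)
        if match:
            current = int(match.group(1))
            descriptions[current] = match.group(2)
        elif current is not None and line.strip() and line[:1].isspace():
            # Indented continuation of the previous mode's description.
            descriptions[current] += " " + line.strip()
    return {
        num: (desc, "(not implemented)" not in desc)
        for num, desc in descriptions.items()
    }


def psm_is_implemented(mode, tesseract_cmd="tesseract"):
    """Run `tesseract --help-psm` and report whether `mode` is usable.

    Assumes the tesseract binary is on PATH; stdout and stderr are both
    scanned in case a build prints the help to either stream.
    """
    result = subprocess.run(
        [tesseract_cmd, "--help-psm"], capture_output=True, text=True
    )
    modes = parse_psm_help(result.stdout + "\n" + result.stderr)
    return mode in modes and modes[mode][1]


# Excerpt of the help output quoted in this answer.
SAMPLE_HELP = """Page segmentation modes:
  2    Automatic page segmentation, but no OSD, or OCR. (not implemented)
  3    Fully automatic page segmentation, but no OSD. (Default)
 13    Raw line. Treat the image as a single text line,
       bypassing hacks that are Tesseract-specific.
"""

modes = parse_psm_help(SAMPLE_HELP)
for number, (description, implemented) in sorted(modes.items()):
    print(number, "implemented" if implemented else "NOT implemented", description)
```

On a machine with Tesseract installed, `psm_is_implemented(2)` would run the real command instead of using the quoted excerpt.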
