I'm trying to write an iOS app that scans documents for processing with Apple's Vision API. The goal is a live video feed showing what the camera sees on screen, with the document's outline highlighted and filled with a semi-transparent color, tracking the video as the phone moves, until the user has framed the shot the way they want and takes a snapshot.
To do this, I created three things: an AVCaptureVideoDataOutput to get frame buffers for analysis, an AVCaptureVideoPreviewLayer to display the video preview, and a CAShapeLayer to display the outline/fill. Everything seems to work: I get a frame, kick off a VNDetectDocumentSegmentationRequest in the background to quickly get a candidate quadrilateral, then dispatch a task onto the main DispatchQueue to update the shape layer.
However, the coordinate systems don't line up. Depending on the capture session's preset, and on the preview layer's device or screen dimensions, the coordinate systems of both the frame buffer and the display area can change. I've tried every combination of transforms I can think of, but I haven't found the magic formula. Does anyone know how to get this right?
Here's some code...
I initialize the output/layers:
detectionOutput = AVCaptureVideoDataOutput()
detectionOutput.alwaysDiscardsLateVideoFrames = true
detectionOutput.setSampleBufferDelegate(self, queue: DispatchQueue(label: "sampleBufferQueue"))
if let captureSession = captureSession, captureSession.canAddOutput(detectionOutput) {
    captureSession.addOutput(detectionOutput)
} else {
    print("Capture session could not be established.")
    return
}
videoPreviewLayer = AVCaptureVideoPreviewLayer(session: captureSession)
videoPreviewLayer.frame = view.layer.bounds
videoPreviewLayer.videoGravity = .resizeAspectFill
view.layer.addSublayer(videoPreviewLayer)
documentOverlayLayer = CAShapeLayer()
documentOverlayLayer.frame = videoPreviewLayer.frame
documentOverlayLayer.strokeColor = UIColor.red.cgColor
documentOverlayLayer.lineWidth = 2
documentOverlayLayer.fillColor = UIColor.clear.cgColor
videoPreviewLayer.addSublayer(documentOverlayLayer)
Then I capture frame buffers like this:
func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
    guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
    let ciImage = CIImage(cvPixelBuffer: pixelBuffer)
    detectDocument(in: ciImage, withOrientation: exifOrientationFromDeviceOrientation())
}
And detect documents like this (note that as originally posted, the snippet never actually performed the request; the perform call and the function's closing brace were missing):
private func detectDocument(in image: CIImage, withOrientation orientation: CGImagePropertyOrientation) {
    let requestHandler = VNImageRequestHandler(ciImage: image, orientation: orientation, options: [:])
    let documentDetectionRequest = VNDetectDocumentSegmentationRequest { [weak self] request, error in
        DispatchQueue.main.async {
            guard let self = self else { return }
            guard let results = request.results as? [VNRectangleObservation],
                  let result = results.first else {
                // No results
                self.detectedRectangle = nil
                self.documentOverlayLayer.path = nil
                return
            }
            if result.confidence < 0.5 {
                // Confidence too low
                self.detectedRectangle = nil
                self.documentOverlayLayer.path = nil
            } else {
                self.detectedRectangle = result
                self.drawRectangle(result, inBounds: image.extent)
            }
        }
    }
    // The request must actually be run on the handler
    try? requestHandler.perform([documentDetectionRequest])
}
And then try to preview it like this:
private func drawRectangle(_ rectangle: VNRectangleObservation, inBounds: CGRect) {
    let xScale = videoPreviewLayer.frame.width * videoPreviewLayer.contentsScale
    let yScale = videoPreviewLayer.frame.height * videoPreviewLayer.contentsScale
    // Transforming Vision coordinates to UIKit coordinates
    // HELP!!! Despite all kinds of combinations of outputRectConverted, layerRectConverted, manually-created transforms or others, I can't get the rectangles to consistently line up with the image...
    let topLeft = CGPoint(x: rectangle.topLeft.x * xScale, y: (1 - rectangle.topLeft.y) * yScale)
    let topRight = CGPoint(x: rectangle.topRight.x * xScale, y: (1 - rectangle.topRight.y) * yScale)
    let bottomLeft = CGPoint(x: rectangle.bottomLeft.x * xScale, y: (1 - rectangle.bottomLeft.y) * yScale)
    let bottomRight = CGPoint(x: rectangle.bottomRight.x * xScale, y: (1 - rectangle.bottomRight.y) * yScale)
    // Create a UIBezierPath from the transformed points
    let path = UIBezierPath()
    path.move(to: topLeft)
    path.addLine(to: topRight)
    path.addLine(to: bottomRight)
    path.addLine(to: bottomLeft)
    path.close()
    DispatchQueue.main.async {
        self.documentOverlayLayer.path = path.cgPath
    }
}
Well, a day later, I found the answer with help from Apple's "Highlighting Areas of Interest in an Image Using Saliency" sample code.
The trick is:
- add the overlay as a sublayer of the preview layer itself, not of the view's root layer;
- size the overlay to the actual video content rect via layerRectConverted(fromMetadataOutputRect:), so any cropping/letterboxing from videoGravity is accounted for;
- build a single affine transform that scales Vision's normalized coordinates up to the overlay's bounds and flips the Y axis, and apply it to each path.
Concretely, here is the new view initialization code for the preview layer:
let layer = AVCaptureVideoPreviewLayer(session: captureSession)
layer.frame = view.layer.bounds
layer.videoGravity = .resizeAspectFill
let highlightColor = UIColor.red.cgColor
documentOverlayLayer.strokeColor = highlightColor
documentOverlayLayer.lineWidth = 3
documentOverlayLayer.fillColor = highlightColor.copy(alpha: 0.5)
layer.addSublayer(documentOverlayLayer)
view.layer.addSublayer(layer)
videoPreviewLayer = layer
And here is the layout update code (previewTransform is stored as an instance variable):
override func viewDidLayoutSubviews() {
    super.viewDidLayoutSubviews()
    updateLayersGeometry()
}

func updateLayersGeometry() {
    if let baseLayer = videoPreviewLayer {
        // Align overlay layer with the video content rect
        let outputRect = CGRect(x: 0, y: 0, width: 1, height: 1)
        let videoRect = baseLayer.layerRectConverted(fromMetadataOutputRect: outputRect)
        documentOverlayLayer.frame = videoRect
        // Transform converting Vision's normalized, bottom-left-origin
        // coordinates to the overlay layer's top-left-origin coordinates
        let scaleT = CGAffineTransform(scaleX: documentOverlayLayer.bounds.width, y: -documentOverlayLayer.bounds.height)
        let translateT = CGAffineTransform(translationX: 0, y: documentOverlayLayer.bounds.height)
        previewTransform = scaleT.concatenating(translateT)
    }
}
Then, when generating the preview rectangle, I just build the UIBezierPath from the raw rectangle coordinates and call path.apply(transform) at the end, and it matches the preview coordinates!
private func createRectanglePath(_ rectangle: VNRectangleObservation,
                                 transform: CGAffineTransform) -> CGPath {
    let path = UIBezierPath()
    path.move(to: rectangle.topLeft)
    path.addLine(to: rectangle.topRight)
    path.addLine(to: rectangle.bottomRight)
    path.addLine(to: rectangle.bottomLeft)
    path.close()
    path.apply(transform)
    return path.cgPath
}