Segment documents in images for Xojo

Christian_Schmitz · October 24, 2024, 1:35pm

For the next plugin version, we are adding the VNDetectDocumentSegmentationRequestMBS class for macOS and iOS. It detects a document in a picture so you can rectify it. The same detection is used internally by the VNDocumentCameraScanMBS class on iOS, but now you can use it independently.

For example, you may start with the picture on the left and use the function to get the picture on the right.

The function uses machine learning to locate the document in the picture. All processing happens on the device, and the result is a new picture. Once the document's area is detected, a transformation matrix is created from its four corners, and a new picture is rendered with that transformation.
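As background, a perspective (projective) transformation like the one used for rectification can be written as a 3×3 matrix H acting on homogeneous coordinates. This is the textbook form; whether the plugin or Core Image computes the matrix exactly this way internally is an assumption. A point (x, y) in the source picture maps to (u, v) in the rectified picture:

$$
\begin{pmatrix} x' \\ y' \\ w' \end{pmatrix} =
\begin{pmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{pmatrix}
\begin{pmatrix} x \\ y \\ 1 \end{pmatrix},
\qquad
(u, v) = \left( \frac{x'}{w'}, \frac{y'}{w'} \right)
$$

Because H is only defined up to a scale factor, it has eight unknowns. Each of the four detected corners, mapped to a corner of the output rectangle, contributes two equations, which gives exactly the eight needed to determine H.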

Here is the code for an example:

// load the image; Item is assumed to be a FolderItem for the picture file
Var image As New CIImageMBS(Item)

// create the document detection request and add it to the list of requests for the handler below
Var request As New VNDetectDocumentSegmentationRequestMBS
Var requests() As VNRequestMBS
requests.Add request

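// create a request handler for the image and run the request synchronously
Var error As NSErrorMBS
Var imageRequestHandler As VNImageRequestHandlerMBS = VNImageRequestHandlerMBS.RequestWithCIImage(image)
Var success As Boolean = imageRequestHandler.performRequests(requests, error)

// check results to get the observation; stop if the request failed (error may hold details) or found no document
Var results() As VNObservationMBS = request.results
If Not success Or results.LastIndex < 0 Then Return

// the first result is a rectangle observation with the four document corners
Var result As Variant = results(0)
Var r As VNRectangleObservationMBS = result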

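// now we need to scale to the image size, as the observation's coordinates are normalized 0 to 1
// (Vision and Core Image both put the origin at the bottom left, so no flipping is needed)
Var w As Double = image.Width
Var h As Double = image.Height

// crop the image to the bounding box of the detected document
Var boundingBox As CGRectMBS = r.boundingBox.multiply(w,h)
Var cf As New CIFilterCropMBS
cf.inputImage = image
cf.inputRectangle = CIVectorMBS.vectorWithCGRect(boundingBox)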

// now we map the four corners found to a rectangle, stretching the area to the output size and correcting the perspective
Var f As New CIFilterPerspectiveCorrectionMBS
f.inputImage = cf.outputImage
f.inputBottomLeft = CIVectorMBS.vectorWithCGPoint(r.bottomLeft.multiply(w,h))
f.inputBottomRight = CIVectorMBS.vectorWithCGPoint(r.bottomRight.multiply(w,h))
f.inputTopLeft = CIVectorMBS.vectorWithCGPoint(r.topLeft.multiply(w,h))
f.inputTopRight = CIVectorMBS.vectorWithCGPoint(r.topRight.multiply(w,h))

// check the output and render it to a Picture, e.g. to show it in a window
Var outputImage As CIImageMBS = f.outputImage
If outputImage = Nil Then Return
Var pic As Picture = outputImage.RenderPictureWithAlpha

Please try the 24.5 plugins when they are released soon and see whether you can use this. You could also combine it with other Vision functions, for example to recognise text or barcodes in the rectified image.