Recognize Shapes in a Picture

Is there a way to recognize a shape in a picture object?

Say I take a picture of an ID card on a solid background. Is there a plugin or other Xcode-only way to recognize the shape of the card? If the card is slightly skewed by perspective or rotated, because of the way I took the picture or scanned the card, can I detect that somehow, so that I can straighten it out?

What do you mean by “blocks left”?

I actually want this process to be automated. Finding corners is tough, and recognizing a shape from raw pixels is even harder.

I would think there must be some kind of OCR plugin with an algorithm that can do the job…

[quote=218529:@dave duke]By blocks I mean pixels. For example, if you have a 1024x768 picture, there are a lot of pixels, and it is hard to find an edge.
If you reduce it to 100x76 and can still see the shape of the card you’re looking for, then reduce the number of colors as far as you can while still seeing the card.
What’s left should then be fairly easy to scan through, looking for edges.[/quote]
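A minimal sketch of the reduce-then-scan idea, assuming the picture has already been read into a 2D list of 0-255 gray values (in Xojo you would read these from the Picture, for example through its RGBSurface; that step is not shown). `downsample` and `reduce_levels` are hypothetical helper names, and for simplicity the sketch uses an integer factor such as 10 (which turns 1024x768 into 102x76) rather than the exact 100x76 above.

```python
def downsample(gray, factor):
    """Shrink a 2D list of 0-255 gray values by averaging factor x factor
    blocks. Leftover rows/columns at the right and bottom are dropped."""
    h, w = len(gray), len(gray[0])
    out = []
    for by in range(h // factor):
        row = []
        for bx in range(w // factor):
            total = sum(gray[y][x]
                        for y in range(by * factor, (by + 1) * factor)
                        for x in range(bx * factor, (bx + 1) * factor))
            row.append(total // (factor * factor))
        out.append(row)
    return out


def reduce_levels(gray, levels):
    """Map 0-255 values onto `levels` gray levels, numbered 0..levels-1."""
    step = 256 / levels
    return [[int(v // step) for v in row] for row in gray]


# 8x8 test image: dark background (20) with a bright 4x4 "card" (230).
img = [[230 if 2 <= x < 6 and 2 <= y < 6 else 20 for x in range(8)]
       for y in range(8)]
small = downsample(img, 2)
for row in reduce_levels(small, 2):
    print(row)
```

With only two levels left, the card is simply the block of 1s, and the edges are where 0 turns into 1.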


I see… If I find the card roughly in a reduced image, I need to know what scale I am working at. I think my approach would be:

  • reduce the resolution (and remember the scaling factor)
  • maybe apply a blur (to avoid detecting dust) and increase contrast
  • find the edges
  • find the places where significant turns are made (the corners of the card)
  • multiply the edge coordinates by the scaling factor

That way the borders match with the original image, right?
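Roughly, yes. A small sketch of the last step, under the assumption that positions are whole pixels in the reduced image; `to_original` is a hypothetical name. One detail worth noting with the sizes mentioned above: 1024 to 100 and 768 to 76 are not the same ratio, so the sketch keeps a separate factor per axis.

```python
def to_original(points, small_size, original_size):
    """Map (x, y) pixel positions found in a reduced image back onto the
    original image.

    x and y get separate factors because the two axes are not always
    reduced by exactly the same amount (1024 -> 100 is 10.24, while
    768 -> 76 is about 10.11). Adding 0.5 lands on the centre of the
    block of original pixels that each reduced pixel stands for.
    """
    sw, sh = small_size
    ow, oh = original_size
    fx, fy = ow / sw, oh / sh
    return [((x + 0.5) * fx, (y + 0.5) * fy) for x, y in points]


# Hypothetical corners found in a 100x76 reduction of a 1024x768 photo.
corners_small = [(10, 8), (90, 8), (90, 68), (10, 68)]
for x, y in to_original(corners_small, (100, 76), (1024, 768)):
    print(round(x, 1), round(y, 1))
```

Because each reduced pixel covers roughly 10x10 original pixels here, the mapped corners are only accurate to about that many pixels. If that is not enough, one option is to search a small window around each mapped corner in the full-resolution picture.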

Well, that’s why I love this forum. Good apps are not always built by just one individual. I’m going to try some of your ideas. I will include you in the credits if my app ever reaches a broader audience :wink:

Maybe Alain’s post will help you too. On the initial scan he uses some image processing to find the shapes. I don’t know how you would use a transformation matrix to normalize a shape, but it’s certainly possible:
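Alain’s code is not reproduced here, so the following is not his method, just one standard way to do the normalization: four corner correspondences determine a 3x3 perspective transform (a homography), which can be found by solving an 8x8 linear system. This plain-Python sketch assumes the four corners are already known; `get_pixel` is a hypothetical callback that reads the photo, and all function names are illustrative. If OpenCV is available, `cv2.getPerspectiveTransform` and `cv2.warpPerspective` do the same job.

```python
def solve(a, b):
    """Solve the linear system a * x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def homography(src, dst):
    """3x3 perspective matrix (bottom-right entry fixed to 1) that maps the
    four src points onto the four dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve(a, b) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]


def transform_point(h, x, y):
    """Apply a 3x3 perspective matrix to the point (x, y)."""
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


def straighten(get_pixel, corners, out_w, out_h):
    """Resample the card into an upright out_w x out_h grid.

    corners: the card's corners in the photo, ordered top-left, top-right,
    bottom-right, bottom-left. Every output pixel is mapped back into the
    photo (inverse mapping) and the nearest photo pixel is taken, so
    get_pixel(x, y) should clamp coordinates that land just outside."""
    rect = [(0, 0), (out_w - 1, 0), (out_w - 1, out_h - 1), (0, out_h - 1)]
    h = homography(rect, corners)  # output grid -> photo
    return [[get_pixel(*map(round, transform_point(h, x, y)))
             for x in range(out_w)]
            for y in range(out_h)]
```

Mapping each output pixel back into the photo, rather than pushing photo pixels forward, guarantees every pixel of the straightened card gets a value with no gaps.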

In OS X there are several ways to do this with Core Image.

Don’t be fooled by the text detector, though: it only finds the area where the text is, not the actual text…

There’s also an older method: use CIRowAverage and CIColumnAverage to create an image map, and with some math you can figure out where in the image a shape is. Some of the older WWDC videos on Core Image show how to use this.
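A plain-Python sketch of the underlying idea, not Core Image itself: averaging every row and every column turns the picture into two 1D brightness profiles (roughly what the row/column averaging filters produce), and the card shows up as the stretch where a profile departs from the background level. `profiles`, `extent`, and the threshold of 20 are illustrative assumptions.

```python
def profiles(gray):
    """Average brightness of every row and every column of a 2D image."""
    h, w = len(gray), len(gray[0])
    rows = [sum(r) / w for r in gray]
    cols = [sum(gray[y][x] for y in range(h)) / h for x in range(w)]
    return rows, cols


def extent(profile, background, threshold):
    """First and last index whose average differs from the background
    level by more than threshold, or None if nothing stands out."""
    hits = [i for i, v in enumerate(profile) if abs(v - background) > threshold]
    return (hits[0], hits[-1]) if hits else None


# 12x8 image: background 10, bright "card" (200) at x 3..8, y 2..4.
img = [[200 if 3 <= x < 9 and 2 <= y < 5 else 10 for x in range(12)]
       for y in range(8)]
rows, cols = profiles(img)
print(extent(rows, 10, 20), extent(cols, 10, 20))  # (2, 4) (3, 8)
```

For a rotated card this only gives the bounding box around it, not its corners, so it works best as a quick first pass before a finer search.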

Here you’ll find a port of a computer vision library for Xojo, with some functions for shape recognition.

What do you mean by the shape in the picture? If it’s the shape of text, you can try this free online OCR tool to recognize the text in the image. If it’s a graphic shape such as a rectangle, maybe you can try the Photoshop image editor.

This can be done with OpenCV. I’ve been trying to install OpenCV on a Mac following several links on the Internet, but I just cannot get it installed correctly. Maybe this link will be informative…

If you are able to install it correctly I would be grateful for the instructions.


I’ve actually done this for a customer. The trick is to make sure the solid background is so distinctly different from what you are scanning that you can easily tell what is and is not the background.

Once you have that, detecting the edges is relatively easy if the item is rectangular (rounded corners add a bit of a challenge). You “scan in” from each of the edges, looking for a pixel that is not the background color, and record its position. Find all four and you’ve got a rectangle. From two adjacent corner pixels (and some high-school math) you can figure out the angle and rotate the image accordingly.
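A minimal sketch of scanning in from each edge and getting the angle with `atan2`, assuming a hypothetical `is_background(x, y)` predicate that tests a pixel against the known background color. This is an illustration of the approach described above, not the customer code.

```python
import math


def first_hits(is_background, w, h):
    """Scan in from each side of a w x h image and return the first pixel
    that is not background, seen from the top, right, bottom and left
    (None for a side if nothing was found)."""
    def scan(coords):
        for x, y in coords:
            if not is_background(x, y):
                return (x, y)
        return None

    top = scan((x, y) for y in range(h) for x in range(w))
    right = scan((x, y) for x in range(w - 1, -1, -1) for y in range(h))
    bottom = scan((x, y) for y in range(h - 1, -1, -1) for x in range(w))
    left = scan((x, y) for x in range(w) for y in range(h))
    return top, right, bottom, left


def edge_angle(p, q):
    """Angle in degrees of the line from p to q. Image y grows downward,
    so a positive angle means the edge slopes down to the right
    (clockwise tilt); rotating by the negative angle straightens it."""
    return math.degrees(math.atan2(q[1] - p[1], q[0] - p[0]))


# A square tilted 45 degrees (a "diamond") centred at (20, 15) in a 40x30 image.
hits = first_hits(lambda x, y: abs(x - 20) + abs(y - 15) > 8, 40, 30)
print(hits, edge_angle(hits[0], hits[1]))
```

This also hints at why tilted items worked better: when an edge is nearly horizontal or vertical, a whole run of edge pixels ties for “first”, so the hit can land somewhere along the edge instead of exactly on a corner.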

Ultimately, what we found was that it was better for the user to put the item in the scanner crooked rather than trying to get it straight. The closer to 45° they got, the better it worked.

If, instead of scanning in from the edges horizontally and vertically, you scanned in from the corners along diagonals, would that have worked better for almost-straight images?

No. We scanned in from the edges to find the first pixel. The issue is that the chances of a user getting the image perfectly straight every time is nearly zero, so we opted for it to never be straight.

Think of a rectangle that is at a 20° angle. You scan downward from the top, and the first non-background pixel that you find is actually a corner. Do that for all four sides. If your target is ID cards, you can reasonably assume that the item should be wider than it is tall, so the angle you need is either n or n + 90°.
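A sketch of settling the n versus n + 90° question with the “wider than tall” assumption: of the two sides that meet at the right-hand corner, take the longer one as the card’s width and use its angle. `card_rotation` is a hypothetical name; it takes the top, right and bottom corners found by scanning in from the edges, and the result is normalized to the range [-90, 90) because a line’s direction only matters modulo 180°.

```python
import math


def card_rotation(top, right, bottom):
    """Tilt in degrees, normalized to [-90, 90), of the long side of a card
    whose top, right and bottom corners were found by scanning in from the
    edges. Assumes the card is wider than it is tall: of the two sides
    meeting at the right corner, the longer one is the card's width, which
    settles the n versus n + 90 degree ambiguity."""
    if math.dist(top, right) >= math.dist(right, bottom):
        p, q = top, right
    else:
        p, q = right, bottom
    angle = math.degrees(math.atan2(q[1] - p[1], q[0] - p[0]))
    return (angle + 90) % 180 - 90


# A 40x25 card tilted 20 degrees clockwise (image coordinates, y down).
t = math.radians(20)
top = (30.0, 5.0)
right = (top[0] + 40 * math.cos(t), top[1] + 40 * math.sin(t))
bottom = (right[0] - 25 * math.sin(t), right[1] + 25 * math.cos(t))
print(round(card_rotation(top, right, bottom), 6))
```

Rotating the picture by the negative of the returned angle then brings the long side horizontal.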

Yeah, I see what you did, but if you scanned in from the corners using diagonal sweeps rather than horizontal or vertical ones (trickier to code), you would have a much higher chance of getting the four corners, unless the card is at 45 degrees, of course…
Interesting problem.
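A sketch of the diagonal sweep suggested above, again assuming a hypothetical `is_background(x, y)` predicate. Sweep line k from an image corner holds the pixels whose distances from that corner along x and y add up to k, so the first non-background pixel met is the one nearest that image corner in this sense. For a nearly straight card that is one of the card’s corners; near 45° the sweep lines run parallel to two of the card’s edges, which is the exception mentioned above.

```python
def diagonal_hits(is_background, w, h):
    """Sweep diagonal lines in from each corner of a w x h image and return
    the first non-background pixel met from the top-left, top-right,
    bottom-right and bottom-left corners (None if nothing was found)."""
    def sweep(to_xy):
        # Line k contains the pixels whose distances (i, j) from the
        # starting corner along each axis satisfy i + j == k.
        for k in range(w + h - 1):
            for i in range(max(0, k - (h - 1)), min(k, w - 1) + 1):
                x, y = to_xy(i, k - i)
                if not is_background(x, y):
                    return (x, y)
        return None

    return (sweep(lambda i, j: (i, j)),
            sweep(lambda i, j: (w - 1 - i, j)),
            sweep(lambda i, j: (w - 1 - i, h - 1 - j)),
            sweep(lambda i, j: (i, h - 1 - j)))


# Straight card covering x 10..29, y 5..14 in a 40x20 image.
print(diagonal_hits(lambda x, y: not (10 <= x <= 29 and 5 <= y <= 14), 40, 20))
```

Each sweep visits pixels in a less cache-friendly order than a plain row-by-row scan, which is part of what makes it trickier to code efficiently.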

[quote=354630:@Hamish Steiner]Yeah, I see what you did, but if you scanned in from the corners using diagonal sweeps rather than horizontal or vertical ones (trickier to code), you would have a much higher chance of getting the four corners, unless the card is at 45 degrees, of course…
Interesting problem.[/quote]
When we did the experiments (and we did a lot of them), horizontal & vertical gave us the best bang for the buck. For us, scanning pixels in a straight line ended up being about 15% faster than scanning diagonally. It may not seem like a lot when the overall scan takes less than ½ second, but it adds up when a person scans 1000 items a day.