This week I decided to spend some time playing with something a little different on my n900: optical character recognition. This was inspired by a demo by Cybercomchannel called phototranslator. It looks cool and I'm looking forward to them making it available for people to try. However, I am not a patient man... So, considering they mentioned they simply used Tesseract, I figured I could just have a go myself.
I required no particularly special skills to do this. I already had a fremantle scratchbox environment set up, even though I don't really need it for Witter. So I downloaded tesseract into scratchbox, did a ./configure, make, make install and presto, it built with no problem. Then I realised it only works on TIFF images, but the n900 camera spits out JPEGs. After a short search I found convert from the ImageMagick tools.
Another simple download and compile and I was able to convert jpg to tif. I copied the files across, and quickly found the libraries that also needed copying, as they weren't found at first.
Tools in hand I knocked up a simple script to tie them together.
ocr.sh:

    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib
    echo "converting image"
    convert "$1.jpg" "/tmp/$1.tif"
    echo "recognising text"
    tesseract "/tmp/$1.tif" "/home/user/MyDocs/$1"
    echo "text written to /home/user/MyDocs/"
    rm "/tmp/$1.tif"
    leafpad "/home/user/MyDocs/$1.txt"
The script first exports the library path so that the ImageMagick libraries are picked up from where I put them. It then converts the image to a temporary tif file, runs tesseract on it, and finally launches leafpad with the output.
This isn't a slick script, but it does mean I can just keep a terminal open in the /home/user/MyDocs/DCIM folder and run ocr.sh 20100307_001. Note there is no .jpg extension; leaving it off makes the name easier for the script to handle without any messing around.
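If typing the name without the extension gets annoying, the script could strip it itself. This is only a minimal sketch: image_base is a made-up helper name, not something from the original script, and it assumes the camera files end in .jpg or .JPG.

```shell
# Hypothetical helper: print the image name with any directory
# and a trailing .jpg/.JPG removed, so both forms of argument work.
image_base() {
    name=$(basename "$1")   # drop any leading directory
    name=${name%.jpg}       # drop a lowercase extension if present
    name=${name%.JPG}       # drop an uppercase extension if present
    printf '%s\n' "$name"
}
```

With this in place the script could start with base=$(image_base "$1") and use "$base" wherever it currently uses "$1".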
I've had a lot of problems with convert crashing out and failing to perform the conversion. I'm not sure why, but normally modifying the brightness/contrast of the source image is enough to make it work. Sometimes I have to specify the -monochrome option on convert. So far I've always managed to convert an image in the end; it just sometimes takes more tries than I'd like.
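The retrying could be automated to some extent. The sketch below tries a plain conversion and falls back to -monochrome if that fails. It assumes a failed or crashed convert returns a non-zero exit status, and to_tif is a hypothetical name rather than part of ocr.sh.

```shell
# Hypothetical helper: convert an image to tif, retrying with
# -monochrome when the plain conversion fails.
to_tif() {
    src=$1
    dst=$2
    # First attempt: plain conversion, as in ocr.sh
    if convert "$src" "$dst"; then
        return 0
    fi
    echo "plain convert failed, retrying with -monochrome" >&2
    # Second attempt: the function returns whatever this convert returns
    convert "$src" -monochrome "$dst"
}
```

In ocr.sh the convert line would then become something like to_tif "$1.jpg" "/tmp/$1.tif". It does not help with the cases where the brightness/contrast needs changing first.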
Some examples of it in action.
Test 1 - glossy magazine text
Source image: Note the flash reflection; I was careful to keep this away from the text I wanted to OCR.
Cropped image: It's important to crop the image as closely as possible to the text to be recognised.
Adjusted for brightness & contrast: I found that on images like this it's helpful to turn the contrast up and the brightness down.
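Since ImageMagick is already on the device, the cropping and the brightness/contrast tweak could probably be done with convert as well. This is a rough sketch only: prep_image is a hypothetical name, the -10x30 value (brightness down 10, contrast up 30) is a guess rather than a tested setting, and the -brightness-contrast option may not exist in older ImageMagick builds.

```shell
# Hypothetical helper: crop to the text region, then lower the
# brightness and raise the contrast before handing the image to OCR.
# geometry is an ImageMagick crop geometry such as 800x300+100+200
# (width x height + x offset + y offset).
prep_image() {
    src=$1
    dst=$2
    geometry=$3
    # +repage discards the old canvas offset left behind by -crop
    convert "$src" -crop "$geometry" +repage -brightness-contrast -10x30 "$dst"
}
```

For example, prep_image 20100307_001.jpg 20100307_001_crop.jpg 800x300+100+200 would write a cropped and adjusted copy that ocr.sh could then be run on. The geometry has to be worked out per photo.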
Resulting ocr text: "A Each will accept a 5/8" shanl< tool. But Sovereign is not juétra lighdle it is a total system. It comes complete with 3/8" fand l/2" collet adaptors allowing tools with those shank . diameters to be fitted securely. That means it will take an array of spindle and bowl gouges; To add to the versatility we have also adapted a couple of highly popular hollowing tools- the hollowmaster and multi tip hollowing t0ol.»Ihese are now _ _ available in three lengths and without handle to make the i Sovereign System one very practical and it"
As you can see it's not perfect, but it's really pretty good. The process isn't that lengthy either: perhaps 20 seconds for convert and OCR to run.
Then I tried some plain black text on a white background
Which I also adjusted for brightness/contrast
Which got these results: "The Championships Wimbledon 20l0 The Wingheld Restaurant is the only bookable, waiter-served restaurant offering a 3·course lunch with wine and mineral water for £6O per person, including service. To make a reservation for a maximum of six guests per table, piease visit our website www.fmccatcrlng.c¤.uk and click on "Food and Drink atWimbledon". The Reservations page will own from Monday l5th February 20lO. Your reservation will be confirmed once you have completed the on·line booking and payment form and operates on a striittiy first come first served basis.After the transaction is complete, you will receive a confirmation email which you shouid keep safe and bring it with you on the day of your reservation. if you do not have access to the internet, we will still accept reservations by fax on O20 8944 2253 or by letter toThe Reservations Manages; Facilities Management Catering Ltd., Church Road,Wimbledon, London SW l 9 SAE. Please remember to include aii your contact details, the date you would like a reservation and for how many people. Cheques should be made payable to Compass Services UK Ltd. Confirmation of non-internet reservations will be sent out during the third week of May. Reservations may not be made by telephone but if you have a query on a confirmed booking, you may telephone 020 S24? liu;} from Monday 26th April. The dress code is smart casual (no jeans) and we are unable to accept msemuens for s The restaurant opens at l l.l Sam and we do not allocate individual seating arrangements prior m your awmio yi _ y T r .ve i ~ E _y°. s ; .i i C; _ liiv y T s Official caterers to The Championships, Wimbledon C s yl _ if yyes v it Qi _‘ .. e yipyr y lsly g "
This is pretty much best case conditions and I think it does a really good job.
I did also try a handwritten test, but it only managed to detect "over #06 lazy". Which, I guess, considering my handwriting, is pretty good too :-)
Sadly I lack the time to turn all this into a consumable download for others. Which brings me to a realisation that some people misunderstand what is meant when the n900 is referred to as a great developer phone. I've seen people complaining that there aren't great apps available, so it's not great at all. But the point is that this is a fantastic phone for *me* and others like me. The fact that I could just grab some open source software and put this together is awesome. But I don't have time to wrap it up in a nice package for other people, and I have no great motivation to do so. Anyone prepared to invest a little time and effort can do amazing things with this device, but if you are just sitting and waiting for someone else to put in that effort and give you something on a plate, you just might be waiting for some time.