Friday, May 25, 2007

Fill Out CAPTCHAs, Digitize Books At The Same Time

For the uninitiated, CAPTCHA is a user-verification technology. The name is an acronym for
"Completely Automated Public Turing test to tell Computers and Humans Apart", and it was developed at Carnegie Mellon University. What this complex-sounding test does is really quite simple. The computer's task is to create a verification test for which it knows the answer [hence can verify a response] but which it cannot solve on its own. So, because a computer cannot solve the test, any correct response automatically
qualifies the client as human [in the physical sense, not ethical, moral and all that].
One of the earliest uses that I remember was checking that users were genuine when they created e-mail accounts. People using Yahoo might recollect. This was to stop a sudden rush of bots creating fake email accounts for spamming, etc. There was a small "thank you" to CMU also.
An example of a text image, used as a CAPTCHA.
These have always been an accessory to the current password verification schemes.
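
To make the "knows the answer but can't solve it" idea concrete, here is a minimal sketch of the server side of such a test. Everything in it (the `CaptchaStore` class, the six-character answers, the one-attempt rule) is my own assumption for illustration, not how Yahoo or CMU actually did it, and the hard part, drawing the answer as a distorted image, is left out.

```python
import hmac
import secrets
import string

ALPHABET = string.ascii_uppercase + string.digits


class CaptchaStore:
    """Hypothetical in-memory store of issued challenges and their answers."""

    def __init__(self):
        self._pending = {}  # challenge id -> expected answer

    def issue(self, length=6):
        # The server invents the answer, so it can always check a response.
        answer = "".join(secrets.choice(ALPHABET) for _ in range(length))
        challenge_id = secrets.token_hex(8)
        self._pending[challenge_id] = answer
        # A real system would now render `answer` as a distorted image
        # and send only the image plus `challenge_id` to the client.
        return challenge_id

    def verify(self, challenge_id, response):
        # Each challenge is single-use: pop it whether or not the answer is right.
        answer = self._pending.pop(challenge_id, None)
        if answer is None:
            return False
        typed = response.strip().upper()
        return hmac.compare_digest(answer.encode(), typed.encode())
```

The point of the design is that the answer never leaves the server in readable form, so a bot has to actually read the image to get through.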

Now for the central issue of this article. A recent post in Network World involved an awesome idea from a tech-writer called AlphaDoggs. The article can be read here.
It discusses the idea of using CAPTCHAs as a tool to digitize the vast sources of information that lie buried in our books. Sure, Google Books is doing a wonderful job. [Recently they announced that they'd be scanning 800,000 Mysore University books for free (all gimmicks aside, this is still wonderful).]
Before I wander off again [Damn you Google, why are you so GooD?],
this project is the brainchild of a professor from Carnegie Mellon University [this is kinda repetitive, a monopoly I guess]. What he plans is along the lines of the SETI project, Wikipedia, etc.
Now, people have got to fill in CAPTCHAs, right? So instead of having them type random computer-generated data, why not give them images from scanned documents that the OCR algorithm couldn't recognize easily? Luis von Ahn is the man behind this. Hats off, Ol' Dude.
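
How would you trust a stranger's reading of a word the OCR gave up on? One plausible way, sketched below, is to show the unknown word next to a control word whose answer is already known, and to accept a reading of the unknown word only after a few people agree on it. The pairing, the `WordPool` class and the vote threshold are all my assumptions for illustration; the article does not spell out the mechanism.

```python
from collections import Counter, defaultdict


class WordPool:
    """Hypothetical vote counter for words the OCR could not read."""

    def __init__(self, votes_needed=3):
        self.votes_needed = votes_needed
        self.votes = defaultdict(Counter)  # unknown word id -> typed readings
        self.accepted = {}                 # unknown word id -> agreed reading

    def submit(self, control_answer, typed_control, unknown_id, typed_unknown):
        """Return True if the user passes the test (control word typed correctly)."""
        if typed_control.strip().lower() != control_answer.lower():
            return False  # failed the known word, so ignore the other reading
        if unknown_id not in self.accepted:
            guess = typed_unknown.strip().lower()
            self.votes[unknown_id][guess] += 1
            # Accept the reading once enough independent users agree on it.
            if self.votes[unknown_id][guess] >= self.votes_needed:
                self.accepted[unknown_id] = guess
        return True
```

The user is judged only on the control word; the unknown word just collects votes, so a wrong guess there costs nothing until enough people repeat it.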
In von Ahn's own words, "Instead of requiring visitors to retype random numbers and letters, they would retype text that otherwise is difficult for the optical character recognition systems to decipher when being used to digitize books and other printed materials. The translated text would then go toward the digitization of the printed material on behalf of the Internet Archive project". Now this is really interesting:
I might actually be involved in a project transcribing Milton, Shakespeare, or even Gandhi [ji] for that matter. In my own small way [two words at a time], I'm part of a group, a community that devotes nearly
150,000 man-hours a day [60 million CAPTCHAs per day, you do the math] to digitizing age-old archives. Even if it doesn't create new content, at least it can be used to verify existing OCR'd stuff and correct its mistakes.
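
For anyone who would rather not do the math by hand, here it is. The two input figures are the ones quoted above; the per-CAPTCHA time is simply what they imply.

```python
CAPTCHAS_PER_DAY = 60_000_000
MAN_HOURS_PER_DAY = 150_000

# Total seconds spent per day, divided by the number of CAPTCHAs solved.
seconds_per_captcha = MAN_HOURS_PER_DAY * 3600 / CAPTCHAS_PER_DAY
print(seconds_per_captcha)  # 9.0
```

So the two numbers are consistent with each CAPTCHA taking about nine seconds to solve.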
This project is called reCAPTCHA [how creative?]. You can be a part of it. Some of the immediate applications, even if you are not Yahoo, Google [not again], Intel, etc., would be...

1. Email address hiding. Prevents automatic web-crawlers from harvesting email IDs. Sure, some of you wizards might type dumbfool[at]dumbdomain[dot]com, but seriously,
man, that's two strcmp's away from decoding.
An example would be this.
Here's my email address, seemingly hidden in plain sight. Click on the link and see what happens.
[If you've actually managed to read this far, please drop me a mail after decoding my address.]
2. Web-address spoofing [not in the phishing sense], just an innovative alias
to keep bots and pesky servers away.
3. Any means of verification, and some fun too [I might get a chuckle from fellow cryptography and Turing machine enthusiasts here].
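
On the first item: to show just how little protection the [at]/[dot] trick gives, here is roughly what a harvesting bot would need. It is a sketch with a made-up `deobfuscate` helper, and it uses two regular-expression substitutions rather than literal strcmp calls, but the spirit is the same.

```python
import re


def deobfuscate(text):
    """Undo the common "[at]" / "[dot]" disguise (square or round brackets)."""
    text = re.sub(r"\s*[\[\(]\s*at\s*[\]\)]\s*", "@", text, flags=re.IGNORECASE)
    text = re.sub(r"\s*[\[\(]\s*dot\s*[\]\)]\s*", ".", text, flags=re.IGNORECASE)
    return text


print(deobfuscate("dumbfool[at]dumbdomain[dot]com"))  # dumbfool@dumbdomain.com
```

Two substitutions and the address is back in the clear, which is why hiding it behind a CAPTCHA is the better deal.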

By the way, all you WordPress people, there are plugins already. Grab them.
There is also an audio version coming up for blind people. [So if you are blind, and you are not reading this, then you'll have to wait.]

Queries can be sent to the email address mentioned above. I know some of you might have an interesting, obvious doubt. I choose to answer people who have reCAPTCHA'd my test.

P.S [ The 'B' will be leavin our college next month :( ] :-
reCAPTCHA is free, so you might want to put it up on your site. I don't have a friggin' clue what you'll verify, but just for kicks, try it...


1 comment:

  1. A very interesting one dude.. gr8 work!