Friday, May 25, 2007

Fill Out CAPTCHAs, Digitize Books At The Same Time

For the uninitiated, CAPTCHA is a user-input verification technology, an acronym for "Completely Automated Public Turing test to tell Computers and Humans Apart", developed at Carnegie Mellon University. What this complex-sounding test does is really quite simple. The computer creates a test to which it knows the answer [hence it can verify a response], but which it cannot solve on its own. So, because a computer cannot solve the test, any correct response qualifies the client as human [in the physical sense, not the ethical, moral and all that].
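To make the "knows the answer but can't solve it" idea concrete, here is a minimal Python sketch of the server side of such a test. It assumes the distorted-image rendering (the part that actually makes the test hard for machines) happens elsewhere. The names `new_challenge`, `verify` and the in-memory `_pending` store are hypothetical, for illustration only; this is not CMU's implementation.

```python
import hmac
import secrets
import string

# Hypothetical in-memory store: challenge id -> expected answer.
# A real service would persist this and expire stale entries.
_pending = {}


def new_challenge(length=6):
    """Create a random challenge; the server keeps the answer so it can verify later."""
    alphabet = string.ascii_uppercase + string.digits
    answer = "".join(secrets.choice(alphabet) for _ in range(length))
    challenge_id = secrets.token_hex(8)
    _pending[challenge_id] = answer
    # Only a distorted image of `answer` should ever reach the client;
    # it is returned here so a (hypothetical) renderer can draw it.
    return challenge_id, answer


def verify(challenge_id, response):
    """Return True if the response matches; each challenge is single-use."""
    expected = _pending.pop(challenge_id, None)
    if expected is None:
        return False
    # Case-insensitive, constant-time comparison of the typed text.
    return hmac.compare_digest(expected.encode(), response.strip().upper().encode())
```

Popping the challenge on every attempt means a bot can't keep guessing against the same image.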
One of the earliest uses I remember was verifying that users were genuine when they created e-mail accounts; people using Yahoo might recall this. It was meant to stop a rush of bots creating fake e-mail accounts for spamming and the like. There was a small "thank you" to CMU as well.
http://upload.wikimedia.org/wikipedia/commons/6/69/Captcha.jpg
An example of a text image used as a CAPTCHA.
CAPTCHAs have always been an add-on to existing password-based verification schemes.

Now for the central issue of this article. A recent post on Network World's AlphaDoggs blog described an awesome idea. The article can be read here.
It discusses the idea of using CAPTCHAs as a tool to digitize the vast store of information that lies buried in our books. Sure, Google Books is doing a wonderful job. [Recently they announced that they'd be scanning 800,000 books from the University of Mysore for free (all gimmicks aside, this is still wonderful).]
Before I wander off again [damn you, Google, why are you so GooD?]: this project is the brainchild of a professor at Carnegie Mellon University [CMU again; this is getting repetitive, a monopoly I guess]. What he plans is along the lines of SETI@home, Wikipedia, etc., where lots of people each chip in a small contribution.
Now, people have to fill in CAPTCHAs anyway, right? So instead of having them type random computer-generated text, why not show them words from scanned documents that the OCR algorithm couldn't recognize easily? Luis von Ahn is the man behind this. Hats off, ol' dude.
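reCAPTCHA shows each unreadable word alongside a control word whose answer is already known; passing the control word is what lets the system trust the guess for the unknown one. The Python sketch below illustrates that idea under simplifying assumptions: the function and variable names are hypothetical, answers are merely lower-cased before comparison, and the `VOTES_NEEDED` threshold of 3 is an arbitrary choice, not the project's real parameter.

```python
from collections import Counter, defaultdict

# Hypothetical threshold: how many matching human answers are needed before
# an unknown word is accepted (an assumption, not reCAPTCHA's real value).
VOTES_NEEDED = 3

votes = defaultdict(Counter)  # unknown-word id -> Counter of submitted guesses
transcribed = {}              # unknown-word id -> accepted transcription


def submit(control_answer, control_typed, unknown_id, unknown_typed):
    """Grade the control word; if it passes, count the guess for the unknown word."""
    if control_typed.strip().lower() != control_answer.strip().lower():
        return False  # control word wrong: reject, and ignore the unknown-word guess
    guess = unknown_typed.strip().lower()
    votes[unknown_id][guess] += 1
    if votes[unknown_id][guess] >= VOTES_NEEDED:
        transcribed[unknown_id] = guess
    return True
```

Because the control word filters out bots and careless typists, agreeing human answers pile up for each unknown word until it can be accepted as transcribed.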
In his own words, "Instead of requiring visitors to retype random numbers and letters, they would retype text that otherwise is difficult for the optical character recognition systems to decipher when being used to digitize books and other printed materials. The translated text would then go toward the digitization of the printed material on behalf of the Internet Archive project." Now this is really interesting: I might actually be involved in a project transcribing Milton, Shakespeare, or even Gandhi[ji] for that matter. In my own small way [two words at a time], I'm part of a community that devotes nearly 150,000 man-hours a day [60 million CAPTCHAs per day, you do the math] to digitizing age-old archives. Even if it doesn't create new content, at least it can be used to verify existing OCR'd text and correct its mistakes.
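Doing the math the post invites, and assuming both figures are daily totals, the effort works out to roughly nine seconds per CAPTCHA:

```latex
\frac{150{,}000\ \text{h} \times 3600\ \text{s/h}}{60{,}000{,}000\ \text{CAPTCHAs}}
  = \frac{5.4 \times 10^{8}\ \text{s}}{6 \times 10^{7}\ \text{CAPTCHAs}}
  = 9\ \text{s per CAPTCHA}
```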
This project is called reCAPTCHA [how creative?]. You can be a part of it. Some of the immediate applications, even if you are not Yahoo, Google [not again], Intel, etc., would be:

1. E-mail address hiding. This prevents automatic web crawlers from harvesting e-mail IDs. Sure, some of you wizards might type dumbfool[at]dumbdomain[dot]com, but seriously, man, that's two string replacements away from decoding.
Here's an example: my e-mail address, seemingly hidden in plain sight. Click on the link and see what happens.
abhi...@gmail.com
[If you've actually managed to read this far, please drop me a mail after decoding my address.]
2. Web-address spoofing [not in the phishing sense], just an innovative alias to keep bots and pesky servers away.
3. Any means of verification, and some fun too [I might get a chuckle from fellow cryptography and Turing-machine enthusiasts here].
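As for item 1, here is roughly why the [at]/[dot] disguise offers so little protection: a minimal Python sketch of what an address harvester could do. Both function names are hypothetical; the second version just shows that tolerating spaces and capitalisation costs only a regular expression more.

```python
import re


def decode_simple(text):
    # The "two string replacements" needed to undo the usual disguise.
    return text.replace("[at]", "@").replace("[dot]", ".")


def decode_variants(text):
    # Also tolerates spacing and case, e.g. "name [ AT ] host [dot] com".
    text = re.sub(r"\s*\[\s*at\s*\]\s*", "@", text, flags=re.IGNORECASE)
    return re.sub(r"\s*\[\s*dot\s*\]\s*", ".", text, flags=re.IGNORECASE)
```

A CAPTCHA-protected link avoids this, because the real address is only revealed after a human solves the test.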

By the way, all you WordPress people: there are plugins already. Grab them.
There is also an audio version coming up for blind users. [So if you are blind and you're not reading this, you'll have to wait.]

Queries can be sent to the e-mail address mentioned above. I know some of you might have an interesting, obvious doubt. I choose to answer only people who have reCAPTCHA'd my test.

P.S. [The 'B' will be leaving our college next month :( ]:
reCAPTCHA is free, so you might want to put it up on your site. I don't have a friggin' clue what you'll verify, but just for kicks, try it...


Saturday, May 05, 2007

Poetry in (loose) Motion

Life@NITK, apart from other things, has added a new weapon to my armour, a new arrow to my quiver, so to speak. I'm talking about poetry, of course.
My alpha-testing of extempore humorous poetry started off as a cheap way to mimic the awesomely talented "Whose Line Is It Anyway?" guys and their superb hoedowns.
But then, owing to our esteemed faculty, especially the non-department ones, I've had a chance to hone my skills. With that good-for-nothing phone in my hand, right in the middle of class (OK, the front bench, to be specific), I've succeeded in creating ad-lib limericks. And guess what, most of them were actually good. A few samples some time later.
But now to the essence of this post: of late I've been trying my hand at other things (OK folks, don't get perverted ideas now). I'm referring to a more serious variety of verse, what people call "Poetry".
With two poems already, I'm growing strong. With esteemed critics supporting me {Takaal, Vk, and most importantly an experienced poet by the name of Rajaram Ramachandran [this chap is 76 years old]}, I'm hoping the creative juices will continue to flow and create masterpieces that generations will cherish forever. OK, let's not get our hopes too high.
Look out William Wordsworth. Here I come.
Let's see if my Words are Worthy enough. [ Pun definitely intended ]

P.S.: Check out my poems at PoemHunter:
http://www.poemhunter.com/abhishek-upadhya/
Any comments, honours, awards, and/or insults, "constructive criticisms" and kicks in the groin are most welcome.
