...well there's more to them than meets the eye.
The most common variety is 'captcha' (as in capture). They have two words and look like this:
The puzzle checks to see that you are a 'real' person before allowing you to access the site. People are quite good at deciphering them; computers aren't. And there are other people 'out there' who write programmes which log into millions of sites automatically to steal email addresses, leave links, sell Viagra, break into your bank account, steal your children and generally make life difficult. But so far computers generally find this sort of puzzle too difficult and cannot enter the sites protected by these garbled guardians.
In an intriguing bit of double-think, Google has come up with a practical use for this process.
You may be aware that Google are in the process of digitising a shed load of printed material including millions of books, magazines and newspapers.
All the books in the world in Google's shed...
Books damaged by a deluge of 1988 Chardonnay following an accident in the basement of a restaurant next to a rare book dealer in the Charing Cross Road
Every time we 'solve' one of these puzzles our answer is sent back to Google, who use it to suggest a meaning for a word that their character recognition programme could not understand.
This is a nifty implementation of distributed computing (where a massive task is divided up between a load of people who all do a little bit each) but it does give rise to an interesting scenario and one big question.
The scenario concerns what happens if the answers we supply are fed back into the character recognition programme so that it learns to decipher garbled words. This sort of feedback loop must be an irresistible temptation to the programmers. It is elegant in the extreme and will help speed up the process - but if the programme escapes and the bad guys get hold of it, they will destroy the 'is this a real person' test.
I expect the answer is 'We wouldn't do that' or alternatively 'It won't happen'. Hmmm, we shall see...
The big and more obvious question is: If they do not know what the word is, how do they know we have typed in the correct letters?
Of course they have thought of this. As noted above, there are always two words: one is known and the other is an unknown word from the scanning project. They work on the basis that if you get the known one right, the unknown one will be either right or nearly right. They will collect a load of suggestions for the unknown word and (probably) choose the most popular. An exercise in democracy in translation (discuss).
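For the curious, here is roughly how the two-word trick might look in code. This is only a sketch of the idea as described above, not Google's actual programme: the names (check_answer, best_guess, votes), the three-vote threshold and the decision to ignore capital letters are all my own assumptions.

```python
from collections import Counter, defaultdict

# Hypothetical store: id of an unknown scanned word -> tally of guesses
# from visitors who typed the known word correctly.
votes = defaultdict(Counter)


def normalise(text):
    """Compare answers ignoring case and surrounding spaces (an assumption)."""
    return text.strip().lower()


def check_answer(known_answer, typed_known, unknown_id, typed_unknown):
    """Return True if the visitor passes the 'real person' test.

    Only the known word decides the result. The guess for the unknown
    word is recorded as a vote, and only if the known word was right.
    """
    passed = normalise(typed_known) == normalise(known_answer)
    if passed:
        votes[unknown_id][normalise(typed_unknown)] += 1
    return passed


def best_guess(unknown_id, min_votes=3):
    """Return the most popular guess once enough votes are in, else None."""
    tally = votes.get(unknown_id, Counter())
    if not tally or sum(tally.values()) < min_votes:
        return None
    word, _count = tally.most_common(1)[0]
    return word
```

Each solved puzzle calls check_answer once; best_guess is asked later, when enough people have had a go at the same unknown word. Notice that whatever is typed for the unknown word never affects whether you get in.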
But as far as the 'is this a real person' test is concerned, only the known word counts. So next time you are having trouble with one of the words it is 50/50 at worst that you need to get it right at all. In fact, the easier it is to read, the more likely it is to be the one that counts. If it is a number, has accents or punctuation or is truly indecipherable, there is a good chance that you can type in just about anything and still pass the test.
Among the issues yet to be addressed is: Do they try to prevent rude and offensive words appearing? If so how?