Google Applies Machine Learning to Personal Photo Search

By Wesley Fenlon

Forget tagging photos--let Google's computer vision algorithms apply their own tags as they scan through your photos.

Google's image search is smart, but it's not smart smart. When you search for an image, Google weighs a lot of different signals to bring back results--the image's filename, its age and filetype, the text of the website hosting it. Searching personal photos, Google explains, is far harder, since there's rarely a useful filename or other identifying data. That's why Google is especially pleased with its new photo search technology, which it just rolled out for Google Plus at photos.google.com.

"This is powered by computer vision and machine learning technology, which uses the visual content of an image to generate searchable tags for photos combined with other sources like text tags and EXIF metadata to enable search across thousands of concepts like a flower, food, car, jet ski, or turtle," writes the Google Research blog. That sounds smart smart. And impressively, Google built the thing in six months.

As the blog explains, computers, even highly advanced ones running complex algorithms, have trouble identifying what's going on in a photograph--toddlers are better at it. But a new technique demonstrated last October at a computer vision competition caught Google's eye. Google prototyped a similar system, found it twice as accurate as anything it had tried before, acquired the startup's technology, and turned it into a working image recognition system within six months.

The rest of the post offers some interesting insight into how Google trained its photo recognition system, which can identify 2000 visual classes. Each class was trained on 5000 sample images, which is why a class like "car" covers both exterior and interior shots. More impressive is when the system generalizes, like recognizing some kind of food and applying the broader "meal" class.
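To see why many samples per class help, here is a deliberately tiny stand-in for the real system: a nearest-centroid classifier over made-up 2-D feature vectors (Google's system is a deep neural network, and these features, labels, and numbers are invented for illustration). Varied shots of the same class--say, car interiors and exteriors--all pull on one class centroid, so both kinds of photo map to "car":

```python
# Toy nearest-centroid classifier (a stand-in, not Google's deep network)
# showing per-class training: each class like "car" is learned from many
# labeled feature vectors, so varied shots still land in the same class.
import math
from collections import defaultdict

def train(samples):
    """samples: list of (feature_vector, class_label) pairs.
    Returns one mean feature vector (centroid) per class."""
    sums = {}
    counts = defaultdict(int)
    for vec, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + x for s, x in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]]
            for label in sums}

def classify(centroids, vec):
    # Pick the class whose centroid is nearest (Euclidean distance).
    return min(centroids,
               key=lambda label: math.dist(centroids[label], vec))

# Hypothetical features: two different "car" shots and two "flower" shots.
samples = [([0.9, 0.1], "car"), ([0.7, 0.3], "car"),
           ([0.1, 0.9], "flower"), ([0.2, 0.8], "flower")]
centroids = train(samples)
print(classify(centroids, [0.8, 0.2]))  # → car
```

At Google's scale the same logic plays out with 2000 classes and 5000 training images each, with a learned network in place of the hand-made vectors.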

Even the image recognition system's mistakes are pretty smart--when it messes up, it makes mistakes a human might make, like thinking a banana slug is a snake. Those things are weird.