Google’s machine-learning-based image software has come a long way, but nature still has it baffled
Google has just released the latest version of its open source image recognition software, based on machine learning algorithms the company first started developing back in 2014, and it’s now 93.9 percent accurate. This is a giant leap forward in capability for the software, which for the past two years has been learning how to classify and caption a vast array of different images – and when you realise that every year people search through trillions of images using Google’s image search, that’s a big deal.
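If you want a feel for how this kind of classifier behaves, here’s a minimal sketch in Python using a pretrained Inception network via the open source Keras library. To be clear, this illustrates the general approach rather than Google’s production system, and the image file name is just a placeholder.

```python
import numpy as np
from tensorflow.keras.applications.inception_v3 import (
    InceptionV3, decode_predictions, preprocess_input)
from tensorflow.keras.preprocessing import image

# Load a pretrained Inception classifier (ImageNet weights).
model = InceptionV3(weights="imagenet")

# "dog_on_beach.jpg" is a placeholder file name for illustration.
img = image.load_img("dog_on_beach.jpg", target_size=(299, 299))
x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))

# Print the top three predicted labels with their confidence scores.
for _, label, score in decode_predictions(model.predict(x), top=3)[0]:
    print(f"{label}: {score:.3f}")
```

Run that against a holiday snap and you’ll get back labels like “seashore” or “Labrador_retriever” with confidence scores – the same kind of judgement, at a vastly smaller scale, that Google is making trillions of times over.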
In fact the new algorithms are getting so good that they’ve spoilt all of the examples I was going to give you. As something of an image and photography enthusiast I’m always hunting for images of one thing or another, and in the past when I’d search for, say, “Amazing landscapes”, Google would show me some great photos but I’d invariably have to sift through hundreds of junk images of trees, goats, pop groups and a whole bunch of other things. Taylor Swift is not an amazing landscape – although I’m sure some of you will disagree.
With 93.9 percent accuracy Google’s system is now so good that it’s beginning to rival human image recognition capabilities. Search for “Rocky outcrop”, for example, and except for a couple of images of jewellery and a bright orange Guianan Cock of the Rock, almost every image hits the mark. Search for “Guianan Cock of the Rock” itself, though, and you can still tell that Google’s new software has real trouble figuring out wildlife.
Now that the system is getting so good, the team behind this newest iteration, which uses something called an Inception architecture, are wondering whether their algorithm can go one step further and not just describe what’s in an image but interpret it. For example, the system today might return a description such as “Dog on a beach”, but human intellect can tell that the real description should be “Dog running on a beach towards the sea”. This couldn’t just be a “regurgitation” of data, Google’s developers say – the algorithm had to be able to naturally develop an understanding of the objects in the image and their uses.
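To make the “describe” step concrete, here’s a rough sketch of the encoder–decoder idea behind neural captioning: a CNN such as Inception turns the image into a feature vector, and a recurrent network turns that vector into words. The layer sizes and vocabulary below are my own illustrative assumptions, and this “merge” style of wiring is a common simplification rather than Google’s exact model.

```python
from tensorflow.keras.applications import InceptionV3
from tensorflow.keras.layers import LSTM, Dense, Embedding, Input, add
from tensorflow.keras.models import Model

VOCAB_SIZE, MAX_LEN, EMBED_DIM = 10000, 20, 256  # assumed, not Google's values

# Encoder: a pretrained Inception network with its classifier head
# removed; the pooled image features are projected into the same
# space as the word embeddings.
cnn = InceptionV3(weights="imagenet", include_top=False, pooling="avg")
image_in = Input(shape=(299, 299, 3))
image_vec = Dense(EMBED_DIM, activation="relu")(cnn(image_in))

# Decoder: an LSTM reads the caption generated so far; its output is
# merged with the image features to predict the next word.
caption_in = Input(shape=(MAX_LEN,))
embedded = Embedding(VOCAB_SIZE, EMBED_DIM, mask_zero=True)(caption_in)
caption_vec = LSTM(EMBED_DIM)(embedded)

next_word = Dense(VOCAB_SIZE, activation="softmax")(add([image_vec, caption_vec]))
captioner = Model(inputs=[image_in, caption_in], outputs=next_word)
captioner.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
```

At inference time you feed the image in once and generate the caption one word at a time, feeding each predicted word back into the decoder – which is how a phrase like “Dog on a beach” gets built up.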
“Excitingly, our model does indeed develop the ability to generate accurate new captions when presented with completely new scenes, indicating a deeper understanding of the objects and context in the images,” said the team, “and, just as important, it learns how to express that knowledge in natural-sounding English phrases despite receiving no additional language training other than reading the human captions.”
Machine learning algorithms are proving to have a greater understanding of still images than of video, at least for the moment, but video won’t be far behind. And once these technologies mature, which shouldn’t take too long now, we’ll not only be able to search images and videos accurately – it will also have titanic consequences for machine vision applications.