This is the first of a two-part blog post that will provide a glimpse inside a research lab at Xerox where scientists and imaging specialists are developing the latest imaging technology. You are invited to play with technology currently being developed and provide valuable feedback directly to the research team.
Guest blogger – Craig Saunders (Craig manages computer vision research at Xerox’s European research lab in Grenoble, France. Craig’s primary research interests lie in Machine Learning and the development and application of algorithms to a wide variety of application domains including digital photos. He has a PhD in Machine Learning).
The first area of our exploration is image classification and retrieval.
If you are like me – you take a lot of photos and have large collections of images. While you may be super organized and store them in sets of folders (perhaps by date, or theme, or category) it is unlikely you have tagged all of your images with all possible keywords. So, as Murphy’s law dictates, when you are looking for that great picture of a giraffe you took, you might not have tagged it as such, or remember where you took it. This is the time you could use an image categorizer. You would simply type in giraffe and it would go through all of your photos and give you back the ones that contain a giraffe. Wouldn’t that be easy?
What about image retrieval? Well, it uses the same underlying techniques to do something very different. Perhaps you have a recent shot you took of a sunset and you want to see all of the similar shots you have taken in the past (to compare contrast, check camera settings, etc.). This would be tricky even with a well-organized collection in folders and a decent set of tags. But imagine how easy it would be if you could simply do the query using an image and get similar ones back instantly.
There are teams of researchers around the globe employing advanced algorithms and mathematics to help solve these problems. Let me introduce you to a few technologies we are currently working on and hopefully I can demystify how they work.
You are welcome to check out the imaging demos in our virtual lab – Open Xerox.
Image Classification and Image Retrieval
Image classification tries to determine the content of an image (e.g. is the photo a picture of a car, a horse, a beach, and so on). There are many different ways to tackle this problem – but most follow a two-step process: (1) find a good way to represent an image (which is called an image signature) and (2) find a good mathematical model to predict the category the image belongs to.
As you know, images are represented by pixels, which typically give a value for the color at each point in the image. This is the way images are stored on your computer.
If we want the computer to automatically answer a simple categorisation question such as ‘is it a car?’ or ‘is it a horse?’, we need to store the image in a way that enables the computer to answer it. For example, a really simple representation would be to count the number of pixels of each color in an image (i.e. draw a histogram which plots the number of blue pixels, red pixels, green pixels, and so on). You could then use this information (e.g. there are more red pixels with some black than there are brown pixels in the image) to determine that the picture is more likely a car on a road than a horse in a field.
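To make the idea concrete, here is a toy sketch of such a color histogram in Python. It is not the representation our system actually uses – just the simple counting idea from the paragraph above, with each RGB channel quantized into a few coarse bins:

```python
from collections import Counter

def color_histogram(pixels, bins_per_channel=4):
    """Coarse color histogram: quantize each RGB channel (0-255)
    into a few bins and count how many pixels fall in each bin."""
    step = 256 // bins_per_channel
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    # Normalize so images of different sizes are comparable.
    total = sum(counts.values())
    return {bin_: n / total for bin_, n in counts.items()}

# A toy "image": mostly red pixels with some black.
pixels = [(250, 10, 10)] * 6 + [(0, 0, 0)] * 2
hist = color_histogram(pixels)  # 75% in the "red" bin, 25% in the "black" bin
```

The normalized histogram is one (very crude) example of an image signature: a fixed-length vector that a mathematical model can compare and learn from.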
There are a number of methods used to represent and thus classify images, including a widespread technique called ‘the bag of visual words’. Small elements or patches of an image are extracted and clustered together to form ‘words’. These clusters are often hard to interpret, but you can imagine that if, say, you had a cluster for wheels, one for faces, one for windows and so on, then representing an image in this way would be useful for determining its category.
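The encoding step of bag of visual words can be sketched in a few lines. This toy version assumes the codebook of visual words has already been learned (typically by clustering patch descriptors, e.g. with k-means), and uses made-up 2-d descriptors so it stays self-contained:

```python
import math

def nearest_word(patch, codebook):
    """Index of the closest visual word (Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda i: math.dist(patch, codebook[i]))

def bag_of_words(patches, codebook):
    """Image signature: how many patches map to each visual word."""
    hist = [0] * len(codebook)
    for p in patches:
        hist[nearest_word(p, codebook)] += 1
    return hist

# Toy codebook of three "words" and four patch descriptors from one image.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
patches = [(0.1, 0.1), (0.9, 0.2), (0.8, 0.1), (0.2, 0.9)]
hist = bag_of_words(patches, codebook)  # [1, 2, 1]
```

Real patch descriptors have many more dimensions and real codebooks hundreds or thousands of words, but the principle – count which ‘words’ appear in the image – is the same.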
Image Categorizer, available for a test drive on open.xerox.com, automatically classifies images according to their content without the use of tagging.
Once we have several million images that have been labelled or tagged with ‘horse’, ‘car’, ‘mushroom’ for example – we can build mathematical models which learn from the image data and can accurately predict whether a new image added to the system is indeed a car or a horse. Naturally, the more categories you have, the harder the problem is. For example, consider distinguishing between 100 different specific types of mushroom from a photo alone – a task that would be hard even for a human.
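One of the simplest models you could learn from labelled signatures is a nearest-centroid classifier: average the signatures of each category, then assign a new image to the category whose average is closest. This is far simpler than the models we actually use, but it shows the learn-then-predict loop:

```python
import math
from collections import defaultdict

def train(signatures, labels):
    """Average the signatures of each labelled category."""
    grouped = defaultdict(list)
    for sig, lab in zip(signatures, labels):
        grouped[lab].append(sig)
    return {lab: [sum(col) / len(sigs) for col in zip(*sigs)]
            for lab, sigs in grouped.items()}

def predict(centroids, sig):
    """Category whose average signature is nearest (Euclidean)."""
    return min(centroids, key=lambda lab: math.dist(sig, centroids[lab]))

# Toy 2-d signatures: horses are "green-heavy", cars are "gray-heavy".
signatures = [(0.1, 0.8), (0.2, 0.7), (0.7, 0.1), (0.8, 0.2)]
labels = ["horse", "horse", "car", "car"]
centroids = train(signatures, labels)
guess = predict(centroids, (0.2, 0.6))  # -> "horse"
```

With millions of labelled examples and richer signatures, better models (and many more categories) become feasible – which is exactly where the difficulty of telling 100 mushrooms apart comes in.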
Try it out!
Visit our virtual lab to see some of these technologies in our current demos. The first, ‘similar image search’, covers mainly the retrieval technology. There is a collection of several million images, and you can either search the collection or upload your own image and see if there is anything similar in the database. If you decide to log in for the advanced features, you can even create your own database and use the demo on your own image collection. This demo also contains a categorizer with a huge number of categories. The second demo on the web page contains a custom-built categorizer. It has a smaller number of concepts (700 – which is still not trivial!), but is perhaps more accurate and can also be used on your own images.
Please provide us your feedback once you’ve tried the demos! Stay tuned for my part-two blog post, where I’ll describe a technology that is in alpha and very experimental. It promises to help you not only find an image but determine how good it is!
(P.S. You can check out this post on these and other imaging tools available to test drive from Xerox.)