NOTE: This is the second of a two-part blog post that provides a glimpse inside a research lab at Xerox where scientists and imaging specialists are developing the latest technology. You are invited to play with technology currently being developed and provide valuable feedback directly to the research team. The first post can be found here.
Guest blogger Craig Saunders manages computer vision research at Xerox’s European research lab in Grenoble, France. Craig’s primary research interests lie in Machine Learning and the development and application of algorithms to a wide variety of application domains, including digital photos. He has a PhD in Machine Learning.
In the last blog entry I covered some of the technology around image classification and retrieval. These are very mature technologies, investigated by many research groups. In this post we’ll discuss a newer, emerging line of research: Aesthetic Image Assessment.
As we discussed before, categorization technology is mature: very good image representations and mathematical models have been found to tackle the problem (if you extract an image patch that looks like a wheel, the image is more likely to contain a car than a horse, for example). But what about aesthetic judgment? What makes an image good or bad, pretty or ugly? These are hard questions to answer because they are subjective and depend on the content: a city-scene shot might call for pin-sharp focus and great depth of field, whereas a portrait or macro shot might want the subject in focus and the background blurred.
So how do you go about creating an algorithm that can even attempt this problem? Essentially the two key elements are the same as in the categorization case: first you need to find a good representation for the images, and secondly you need to learn a mathematical model from data. You can use the same representation as before to find ‘visual words’ or ‘parts of things’ – but content only tells part of the story. So you might also consider other features of the image such as exposure, amount of blur, color and so on. Once you have a combination of these types of attributes you can apply similar mathematical techniques to the ones used for categorization, only this time, instead of training on examples of ‘car’, ‘horse’ or ‘beach’, we use examples where people have judged a photo as ‘good’ or ‘bad’ or have given it some kind of score.
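To make the two-step recipe concrete, here is a deliberately tiny sketch of the idea, not the actual Xerox system: it computes two toy aesthetic features (mean brightness as an exposure proxy, mean neighbour difference as a sharpness proxy) on synthetic “images”, then learns a good/bad model from labeled examples with a simple logistic regression. All names, features, and the synthetic data are illustrative assumptions; a real system would use far richer representations.

```python
import numpy as np

rng = np.random.default_rng(0)

def extract_features(img):
    """Toy aesthetic features (illustrative only):
    mean brightness (exposure proxy) and mean absolute
    neighbour difference (sharpness proxy)."""
    exposure = img.mean()
    sharpness = (np.abs(np.diff(img, axis=0)).mean()
                 + np.abs(np.diff(img, axis=1)).mean())
    return np.array([exposure, sharpness])

def make_image(sharp):
    """Synthetic stand-in for a photo: noisy = 'sharp',
    neighbour-averaged = 'blurry'."""
    base = rng.random((32, 32))
    if sharp:
        return base
    return (base + np.roll(base, 1, 0) + np.roll(base, 1, 1)) / 3.0

# Labeled training set: 1 = judged 'good' (sharp), 0 = 'bad' (blurry)
X = np.array([extract_features(make_image(sharp=bool(i % 2))) for i in range(40)])
y = np.array([i % 2 for i in range(40)], dtype=float)

# Standardize features, then fit logistic regression by gradient descent
mu, sigma = X.mean(axis=0), X.std(axis=0)
Xs = (X - mu) / sigma
w, b = np.zeros(2), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(Xs @ w + b)))  # predicted P(good)
    w -= 0.5 * (Xs.T @ (p - y)) / len(y)
    b -= 0.5 * (p - y).mean()

preds = (1.0 / (1.0 + np.exp(-(Xs @ w + b))) > 0.5).astype(float)
print("training accuracy:", (preds == y).mean())
```

The same structure scales up: swap in real visual-word histograms and richer exposure/blur/color features, and replace the hand-rolled classifier with any standard learning method.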
Is it even possible?
Note that aesthetic classification is a much harder problem to solve than categorization, and one can think of many different ways to tackle it (Do you try to learn good/bad separately for each category? Do you learn a general model? What happens to photos where human judgments disagree?). There are many open questions, and it is unlikely to be a ‘solved problem’ any time soon, if it ever can be. But imagine how much easier it would be to select the best photo if software existed to help you remove the ‘obviously bad’ images and pick the best one for the task at hand.
Take a look
Try out our first attempt at doing this by checking out the aesthetic image search demo on Open Xerox. While you can’t upload your own images yet, this demo will give you a taste of what the technology can do. Using a set of images in the demo collection, the technology first tries to classify the content (beach, flower, boat, etc.) and then tries to rank the images from good to bad. It’s not perfect, but it provides a glimpse of where our research is headed.