Dr. Cees Snoek: What Objects Tell Us About Actions

Photo by Anne Helmond.

Here at Blippar we are very interested in objects. Objects are all around us, and we can see hundreds, if not thousands, of them in front of us at any time. Objects provide context for what's happening around us, so what can objects tell us about actions?

This is the principal question asked by Dr. Cees Snoek, director of QUVA, the joint research lab of the University of Amsterdam and Qualcomm on deep learning and computer vision. Recently, Dr. Snoek came by the Blippar London office to share some of his research with our team. Here are some of the highlights.

Computer vision is learning like humans
As humans learn to speak, we typically pick up nouns, the names of objects, first. We learn to identify the objects we interact with before we learn to describe what we do with them.


The same can be said for how computers learn. Computers that are first taught to recognize objects will be better suited to recognize actions. Right now the Blippar app can recognize the objects near you, like a ball and a bat. But it can't recognize "swinging" the bat to hit the ball, for example. At least not yet.

"Blippar leverages computer vision and deep learning to bring visual search in the hands of millions of users. A very exciting prospect," says Dr. Snoek.

Dr. Snoek has begun to teach computers how to recognize and identify actions by determining which actions are associated with which objects. This helps narrow down a list of potential actions.
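To make the idea concrete, here is a minimal sketch of how detected objects could narrow down a list of candidate actions. This is not Dr. Snoek's actual method. The affinity table, its scores, and the `rank_actions` helper are all hypothetical and chosen purely for illustration.

```python
# Hypothetical object-to-action affinity scores (made-up values for illustration).
OBJECT_ACTION_AFFINITY = {
    "baseball bat": {"swinging a bat": 0.9, "running": 0.2},
    "ball": {"swinging a bat": 0.6, "running": 0.3},
    "violin": {"playing violin": 0.95},
}


def rank_actions(detected_objects, top_k=3):
    """Rank candidate actions for a set of detected objects.

    detected_objects maps an object label to a detection confidence in [0, 1].
    Each action's score is the sum of (confidence * affinity) over all
    detected objects that are associated with that action.
    """
    scores = {}
    for obj, confidence in detected_objects.items():
        # Objects missing from the table contribute nothing.
        for action, affinity in OBJECT_ACTION_AFFINITY.get(obj, {}).items():
            scores[action] = scores.get(action, 0.0) + confidence * affinity
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


# Assumed detector output: a bat and a ball are visible in the frame.
detections = {"baseball bat": 0.8, "ball": 0.7}
ranked = rank_actions(detections)
print(ranked)
```

With a bat and a ball in view, "swinging a bat" ranks first and "playing violin" never enters the list, which is the narrowing effect described above. A real system would learn these associations from data rather than use a hand-written table.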

Video recognition is catching up to images
The state of the art in image recognition is very strong. So strong, in fact, that computers now make fewer mistakes recognizing objects in images than humans do.

But when it comes to recognizing actions from video, humans are still much more accurate than computers. With some of the research Dr. Snoek is doing, that gap is beginning to close. It seems inevitable that one day computers will also surpass humans in action recognition.

Objects aren't always useful for actions
Recognizing objects, when they are present, is important when attempting to recognize actions. For example, recognizing a baseball bat or a violin can go a long way toward narrowing down the potential actions at play.

However, not all actions involve objects, and those actions are more difficult to narrow down in this way. For example, push-ups, running and jumping jacks are all done without objects, but they do have distinct motions that computers can learn to recognize on their own.

Cees Snoek Computer Vision Emojis

Further illustrating the power of object recognition and the state of the art, Dr. Snoek showed off a research tool he helped develop called Image2Emoji (shown above). As the name suggests, it returns a set of emoji pictograms that reflect the content of a given image, or even a video!

To learn more about Dr. Snoek's research and how Blippar is bringing object recognition to the world through the Blippar app, you can see him alongside Blippar's President of Marketing Omaid Hiwaizi at Cannes Lions 2016 this summer. Dr. Snoek will be speaking about "From Faces to Objects: Computer Vision" at 14:30 on Wednesday, June 22nd.