Engaging with Computer Vision

Creating digital experiences that respond to the movements of people’s bodies, faces and hands can be a compelling way to engage your customers.

Computer Vision

Since my work revolves around leveraging technology to engage people and tell compelling stories, I read a lot about technology. I write about technology sometimes. And I often get asked about technology. Those conversations tend to be about headline-grabbing tech like AR and VR. But there’s another interesting set of technologies that has advanced greatly in recent years and has many applications for customer engagement, yet gets far less attention: computer vision.

Computer vision is a broad topic, including various ways that computers gain understanding from digital images or videos. In this way it aspires to stand in for all that humans can do with sight, which obviously covers a lot of different capabilities. It has widespread military and industrial applications such as missile guidance and optical sorting. It’s crucial to ongoing development of autonomous vehicles, which have the potential to revolutionize the way we live and reduce the number of car crashes. It can help keep drivers safer via applications like drowsiness detection. And medical applications like reading CT scans to detect lung cancer also promise to save lives.

But the underlying technology also has many uses that are relevant for brands and marketers. People interact with computers all the time (including the fairly powerful smartphone computers that we carry around with us). We usually interact with these systems via keyboards or touchscreens. But adding some type of computer vision into this interaction model opens up some interesting possibilities, allowing new ways for systems to respond to their users. Of course, nobody should approach engagement design by starting from the technology. We need to think about what our strategic objectives are first, and then identify tactics that achieve those ends. But I think it’s still informative to look at these related capabilities together.

Relevant Categories

I’ll categorize these into subsets by looking at three specific ways computers can evaluate and react to a person: body tracking, face tracking and hand tracking. These are of course related, and the underlying technologies are largely the same: capturing video of the user and processing it to extract information. Sometimes special cameras or additional types of sensors like Lidar are used to improve the result, but we can still consider this the “visual processing” that underlies computer vision techniques. Advances in AI have accelerated the improvement of these techniques as well.

One relevant and timely side effect of these computer vision applications is that many of their incarnations allow for hands-free interaction. That’s helpful for creating interactive experiences in pandemic times. We’re likely on the upswing out of the depths of required Covid precautions now. And from what we know, air quality is likely a more important concern than surface contaminants. But we’ll still likely need to accommodate consumer reluctance for a period of time. Last year 77% of consumers said they expected to avoid physical interactions going forward and increase their use of touchless technology. At least some proportion of those people will likely continue to feel uneasy about things like public touchscreens.

Body Tracking

Body tracking has been used by artists and marketers for a long time now. Thanks to some increasingly sophisticated sensors it has gotten both more capable and easier to implement. A decade ago at CRI, we created an interactive billboard in Times Square for Forever 21 that superimposed “giant” models onto a camera feed of the assembled crowd. The model would then pick up and carry off someone from the crowd. This was enormously hard to accomplish at the time. These days we could create a similar effect with only a smartphone.

A significant driver of body tracking into the mainstream was the release of the PC version of Microsoft’s Kinect in 2012. The Kinect is purpose-built to detect the human form and provides skeletal tracking. It did much of the heavy lifting of body tracking, allowing creators to concentrate on ways to use that information. It’s been an invaluable tool for digital art and experiential marketing, and I’ve created many experiences over the years that utilized it.
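To make the idea of "using that information" concrete, here is a minimal sketch of reacting to skeletal data. It assumes a tracker has already produced named joints as (x, y) image coordinates with y growing downward. The joint names, the dictionary format and the `hands_raised` function are all hypothetical and do not come from the Kinect SDK or any other real library.

```python
from typing import Dict, Tuple

# Hypothetical joint format: joint name -> (x, y) in image coordinates.
# Assumption: y grows downward, as in most camera frames.
Joint = Tuple[float, float]


def hands_raised(skeleton: Dict[str, Joint], margin: float = 0.0) -> bool:
    """Return True when both wrists are above the head joint by at least `margin`."""
    head_y = skeleton["head"][1]
    return all(
        skeleton[name][1] < head_y - margin
        for name in ("left_wrist", "right_wrist")
    )


# Illustrative sample poses, not real sensor output.
cheering = {"head": (320.0, 120.0), "left_wrist": (250.0, 60.0), "right_wrist": (390.0, 70.0)}
resting = {"head": (320.0, 120.0), "left_wrist": (260.0, 300.0), "right_wrist": (380.0, 310.0)}

print(hands_raised(cheering))
print(hands_raised(resting))
```

An experience could use a check like this to trigger an animation when a participant throws both arms in the air, while the sensor handles the harder problem of finding the joints in the first place.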

Body tracking is well suited to artistic expressions, as seen in everything from Rolls Royce’s Spirit of Ecstasy installation to IBM’s AI Highlights interactive at the US Open. It can allow an interactive engagement to change participants into different characters such as in Nissan’s Titan and Rogue digital mirrors. It can be used to create a responsive experience that encourages deeper interaction from passersby, as in this Innovation Wall. And it can be used as an input mechanism for gamified experiences like this Badge Catchers game at Dreamforce.

Face Tracking

Computer vision can also be used to track specific parts of the body, such as faces. Using face tracking, interactive systems can locate faces in video input in order to respond to them in some way. This makes it distinct from facial recognition, which attempts to pinpoint the identity of a specific person within a photo or video feed. The latter is much more difficult and error-prone, and is a privacy concern for consumers, both of which make it ethically dubious, especially for marketing purposes.

Face tracking, on the other hand, is ethically neutral and can allow for some fun interaction models. Getting people to move their faces (especially if they smile) can create a far more memorable experience than asking them to simply stare at something immobile. Magnum Streets interactive kiosks could only be engaged when a user smiled, and the interaction was completely driven by facial expressions.

An even more extreme version of this approach was taken by Nike with their Free Face interactive in Japan. It allowed users to bend and twist their facial expressions to control a digital Nike Free shoe. The more they contorted their faces, the more the shoe moved.

Similar techniques were used for a more serious purpose in Facebook’s Mute the Mouth game, which used facial gesture recognition technology to detect and measure teen expressions for different types of “bad influences,” allowing them to shut down bad ideas with their facial movements.

Face tracking can also be used to create a parallax effect by changing the scene in response to a user’s head movements and keeping virtual elements in a scene aligned to their point of view. For a physical installation that only works for one person at a time, but we’ve used it successfully for small-scale activations like this configurator.
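The parallax idea can be sketched in a few lines. This is a simplified illustration, not how any particular installation was built: it assumes a face tracker reports the center of the viewer's face as normalized coordinates (0.5 means centered, already mirrored to match the viewer's perspective), and that the scene is split into layers with a depth between 0 (at the screen) and 1 (farthest back). The function name and parameters are hypothetical.

```python
from typing import List, Tuple


def parallax_offsets(
    face_x: float,
    face_y: float,
    layer_depths: List[float],
    max_shift_px: float = 40.0,
) -> List[Tuple[float, float]]:
    """Return an (x, y) pixel offset for each scene layer.

    Deeper layers shift further in the same direction as the head,
    which roughly mimics looking at a scene through a window.
    """
    # Map the normalized face position from 0..1 to -1..1.
    nx = (face_x - 0.5) * 2.0
    ny = (face_y - 0.5) * 2.0
    return [(nx * max_shift_px * depth, ny * max_shift_px * depth) for depth in layer_depths]


# A layer at the screen plane, a mid layer, and a far background layer.
layers = [0.0, 0.5, 1.0]
print(parallax_offsets(0.5, 0.5, layers))   # viewer centered
print(parallax_offsets(0.75, 0.5, layers))  # viewer leaning to one side
```

A linear shift like this is a shortcut. A more faithful version would account for the viewer's distance from the screen, which a plain webcam can only estimate.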

It can be even more effectively applied to smartphone-based experiences. In fact, people are most familiar with the use of face tracking on their phones. It’s used on social media all the time to create fun photo and video filters, superimposing specific imagery or applying some type of distortion to detected faces.

Hand Tracking

Hands are another part of the body that we’ve worked hard to develop good tracking for. As a species, the fact that we have opposable thumbs has been critical to our development and probably underlies all our technological progress. We’re accustomed to using our hands to accomplish tasks and to carry out our intent. So if we are to communicate with our computers we are likely to do it with our hands. We most often use touchscreens for that now, and before that we used specialized input devices like mice and keyboards.

But we’ve long dreamed of interacting with our computer systems in simpler and more intuitive ways. Using computer vision to track our hand movements is one way to accomplish this. And one needn’t change the entire paradigm of human-computer interaction to gestural interfaces in order to implement it.

The ability to respond to simple hand gestures in a digital experience can be as simple as incorporating a consumer peripheral like a Leap Motion controller. These sensors use specialized IR cameras to identify hands and map their movement. We’ve used them to allow users to “drive” through a storyline in regular kiosks like IBM’s Cognitive Mobility installation as well as for immersive VR experiences like Mazda’s Jinba Itai experience.
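As a rough sketch of what responding to a simple hand gesture can look like, the snippet below turns a short history of palm positions into a "next" or "previous" command for stepping through a storyline. It assumes the sensor's output has already been reduced to a list of normalized horizontal palm positions (0 to 1, oldest first). The `detect_swipe` function and its threshold are hypothetical and are not part of the Leap Motion SDK.

```python
from typing import List, Optional


def detect_swipe(palm_xs: List[float], min_travel: float = 0.3) -> Optional[str]:
    """Classify recent palm movement as a swipe.

    palm_xs: recent horizontal palm positions, normalized 0..1, oldest first.
    Returns "next", "previous", or None when the hand has not traveled far enough.
    """
    if len(palm_xs) < 2:
        return None
    travel = palm_xs[-1] - palm_xs[0]
    if travel >= min_travel:
        return "next"
    if travel <= -min_travel:
        return "previous"
    return None


print(detect_swipe([0.2, 0.35, 0.5, 0.7]))
print(detect_swipe([0.8, 0.6, 0.4]))
print(detect_swipe([0.5, 0.52, 0.49]))
```

In practice the threshold would need tuning against real users, since too low a value fires on idle hand jitter and too high a value makes people exaggerate their gestures.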

In fact, the need for good hand tracking when using VR headsets is driving forward lots of research into the area. Ultraleap recently previewed their newer Gemini tracking software, which promises improved tracking over previous generations. And Facebook Research has published some work showing incredibly detailed hand tracking.

Conclusion

As I’ve said, we can’t design experiences by starting from the technology – that’s an answer looking for a question. But it is important to understand what capabilities exist so that we can take advantage of every opportunity as we strategize over the best ways to engage people. Using the human body to interact with systems in new and interesting ways can be one way to break through. And implementing in-person touchless experiences can be a way to get customers up and waving, dancing, or simply marveling at what we’re able to put before them.

Featured Image: Nissan Innovation Wall