Why the 'Minority Report' Interface is Far from Practical

By Will Greenwald

You can't trust a movie to predict the future of computing.

It's hard to believe that Minority Report was released 8 years ago. The Steven Spielberg film, while a perfectly serviceable science fiction thriller and adaptation of Philip K. Dick's work, is still best remembered in the technology community for introducing gesture-based computing to the masses. The scene in which Tom Cruise's character controls a computer and manipulates video playback by waving his hands in front of a screen stands out as an eye-opening moment in cinematic-technology history. It's far and away the most referenced film scene in modern tech journalism--bloggers can't seem to help but allude to it whenever any new touch or gesture-based interface innovation debuts.

The interface seen in the film was designed by John Underkoffler, who later founded Oblong Industries. And over the past decade, Underkoffler has been committed to bringing gesture interfaces--which he calls g-speak--to consumers as a new standard for computer interaction.
Underkoffler gave a presentation about g-speak and the future of the user interface at the TED2010 conference this past February. He demonstrated gesture-based interfaces and discussed their advantages and potential for growth. He showed off the non-movie version of g-speak to a packed auditorium, complete with image browsing and video editing.
G-speak is almost exactly as it's portrayed in Minority Report. The user wears gloves that a computer (in this case, using a camera) tracks in 3D space. Using various hand gestures, the user can move files and objects between different displays, opening and closing aspects of them as needed. In his demonstration, Underkoffler pulled the image of a girl out of a video clip and moved it onto a secondary, table-based display, and "steered" the perspective of the main display through a 3D space of cascading photographs.  

While they seem functional on stage, hand gestures aren't quite as easy to capture at a desk. If you have a webcam, consider how much of your body is on display when you sit comfortably at your desk, and how much you have to move to get your hands consistently into frame. For hand gesture interfaces, the device that records the gestures has to reliably and consistently monitor the user's hands at comfortable positions. Holding up "jazz hands" for minutes at a time just to browse the web is not an improvement to the computer interface.  
This leads to a more important ergonomic issue: arm motion. The mouse and keyboard interface works well because it requires very little arm or wrist movement. The user can cover the entire display and enter a full range of text commands by moving only a few inches. Compare that interface design with g-speak both in Minority Report and in Underkoffler's demonstration; while waving your arms like you're guiding planes on an aircraft carrier might feel a bit more natural than tapping away with your fingers, it's not the sort of activity most office workers can sustain for hours at a time. It's a great interface for a 15-minute presentation, but for an 8-hour stint in a cubicle it's a recipe for cramps and soreness.
Finally, gesture-based interfaces, and other so-called natural user interfaces, may not be as intuitive as researchers believe. We've written about this topic before, and you can read our analysis here. The upshot is that while these kinds of interfaces may be relatively more intuitive than a keyboard or remote control, there is no universal or innate standard for computing gestures--users are still going to have to learn a whole new lexicon of actions.
G-speak isn't the only gesture-based interface in development. Toshiba recently showed off its "AirSwing" interface, a gesture-based system for electronic advertising.  While it's designed more for digital billboards than computers, it gives the viewer the same sort of gesture control as g-speak. And from the video demo of the interface (below), you can tell it looks clumsy and slow in its current implementation.  
Microsoft and Sony are also working on their own gesture and motion-based control systems for the Xbox 360 and PlayStation 3. Microsoft's Project Natal uses a multi-camera accessory to track user motion, translating it into input for video games. PlayStation Move uses a glowing wand controller tracked by a camera to offer gesture-based controls similar to those of the Nintendo Wii.
Advertising and gaming-oriented gesture controls potentially overcome many of the issues computer-oriented gesture-based interfaces face. In larger, more casual environments, gestures are much easier to make, and since billboards and gaming systems generally use much larger displays than workstations, gestures are more easily translated to scale. It's one of the reasons the Wii, with its remote-based gesture controls, has become so popular. It's easier to wave your arms from the couch or the middle of your living room than it is at a desk (though Wii fatigue is not unheard of). 
Underkoffler predicts that every computer will have g-speak or a similar gesture-based interface in the next five years, but that seems overly optimistic. Gesture-based controls have been in development for over a decade, and they've yet to replace the mouse and keyboard as the main input method for computer systems. Even if Underkoffler can make g-speak affordable and accessible, he can't change the fact that his system requires more room and movement than the user interfaces currently available. That said, while it will take much more effort to make gesture interfaces popular in computers, they may pick up steam in the home entertainment sector, taking advantage of ever-growing HDTVs and the generous room most home theaters offer.
We're cautiously optimistic about the promise of gesture-based interfaces, but they have yet to prove their worth. The future predicted by Minority Report is on the right track, but its gesture-based vision is hardly the holy grail of computer interfaces, and shouldn't be treated as such.
Image credits: Dreamworks, TED2010