How GesturePak Works


Microsoft has lowered the bar with Kinect for Windows so that anyone willing and able to spend $250 can interact with their PC using their bodies. The Kinect for Windows API gives the programmer a stream of raw data coming from the sensor, but making sense of that data is a challenge.

GesturePak uses the skeleton recognition system in the Kinect for Windows SDK to help the user turn complex gestures into data, which can then be matched against real-time movement. Gestures are broken down into a series of poses. Pose matching happens first. A gesture is considered recognized when the poses in that defined gesture are matched in series, within the allowed amount of time.


GesturePak is two things. The GesturePak Recorder is a Windows app that lets you record, edit, and otherwise define gestures. The GesturePak API lets a .NET developer recognize those gestures with no real programming.

A gesture is defined by a few constraints. You can select which axes to track, which joints to track, how much margin of error to allow, and how much time is required and allowed between poses. With these simple constraints, an unlimited number of simple and complex gestures can be defined and recognized.
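To make those constraints concrete, here is a minimal sketch of how a gesture definition might be modeled as data. It is written in Python purely for illustration; the class names, field names, joint names, and default values are all assumptions and do not reflect the actual GesturePak file format or API.

```python
from dataclasses import dataclass


@dataclass
class Pose:
    # Joint name -> (x, y, z) position. Joint names here are illustrative only.
    joints: dict
    # Hypothetical timing constraints relative to the previous pose, in seconds.
    min_seconds: float = 0.0
    max_seconds: float = 2.0


@dataclass
class Gesture:
    name: str
    poses: list
    # Which axes and joints take part in matching (assumed representation).
    tracked_axes: frozenset = frozenset({"x", "y"})
    tracked_joints: frozenset = frozenset({"hand_right"})
    # Allowed margin of error per tracked axis (assumed units).
    tolerance: float = 0.1


wave = Gesture(
    name="wave",
    poses=[
        Pose(joints={"hand_right": (0.2, 0.5, 1.0)}),
        Pose(joints={"hand_right": (0.6, 0.5, 1.0)}, max_seconds=1.0),
    ],
)
```

Keeping the whole definition as plain data is what lets a recorder tool write it out and a separate matcher read it back in.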

How does it work?

Gestures are defined and created with the GesturePak Recorder app. The user breaks down a gesture into a series of poses, each pose representing a body position. A stick figure of the user is shown in real-time, and each recorded pose is also shown as a stick figure.

Poses can be added easily with the voice command "Snapshot," or with a quick rocking back and forth of the head.

Poses can be deleted and inserted in the series that makes up a gesture. The user can also watch an animation that shows the stick figure going through the poses.

Each axis (X, Y, Z) can be optionally tracked. Each of the 20 body joints can optionally be tracked as well. All of this is saved as data in an XML gesture file.

Gestures can be recognized in any .NET application with the use of the GesturePak Matcher, part of the GesturePak API, which is easily added to any project as an assembly reference.

The GesturePak Matcher performs an analysis on each frame in real-time. It walks through each pose in each gesture, checking to see if the user is matching any of the poses. A certain definable amount of slop is taken into account when matching poses. More accuracy means less slop allowed. If a pose is matched, the application is notified via a callback.
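The per-frame pose check described above can be sketched roughly as follows. This is an illustration only, not GesturePak's actual algorithm: it assumes a simple per-axis tolerance, and the function name, parameter names, and data layout are hypothetical.

```python
def pose_matches(frame, pose, axes=("x", "y", "z"), joints=None, tolerance=0.1):
    """Return True if every tracked joint is within tolerance on every tracked axis.

    frame and pose map joint name -> {"x": ..., "y": ..., "z": ...}.
    A smaller tolerance means less slop, so the match is stricter.
    """
    names = joints if joints is not None else pose.keys()
    for name in names:
        for axis in axes:
            if abs(frame[name][axis] - pose[name][axis]) > tolerance:
                return False
    return True


recorded = {"hand_right": {"x": 0.50, "y": 0.20, "z": 1.00}}
live = {"hand_right": {"x": 0.55, "y": 0.18, "z": 1.60}}

# Ignoring depth (Z), the live frame is close enough to the recorded pose.
matched_xy = pose_matches(live, recorded, axes=("x", "y"))
# Tracking all three axes, the difference in Z is too large.
matched_xyz = pose_matches(live, recorded)
```

Leaving an axis untracked, as in the first call, is what allows the same recorded pose to match a user standing nearer to or farther from the sensor.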

Alternatively, the programmer can poll the Matcher to see if a gesture or pose was recognized. That way, GesturePak stays completely out of the way of the existing code.
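The difference between being notified and polling can be sketched as below. The class and method names are invented for this illustration and are not the GesturePak API; the point is only that the same match result can be pushed to a callback or held until the application asks for it.

```python
class MatcherSketch:
    """Hypothetical stand-in for a matcher that supports callbacks and polling."""

    def __init__(self, on_pose_matched=None):
        self.on_pose_matched = on_pose_matched  # optional callback
        self._last_pose = None                  # held for polling

    def report_match(self, pose_index):
        # Called by the (imagined) recognition loop when a pose matches.
        self._last_pose = pose_index
        if self.on_pose_matched is not None:
            self.on_pose_matched(pose_index)

    def poll(self):
        # Return the most recent match once, then clear it.
        pose, self._last_pose = self._last_pose, None
        return pose


# Callback style: the application is told as soon as a pose matches.
seen = []
pushed = MatcherSketch(on_pose_matched=seen.append)
pushed.report_match(3)

# Polling style: the application asks whenever it is convenient.
polled = MatcherSketch()
polled.report_match(5)
first = polled.poll()
second = polled.poll()
```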

Once pose matching is established, the Matcher checks each gesture to see if the poses were matched in the correct order in the required amount of time. When a gesture is matched, the application is notified via a callback, or GesturePak can be polled as stated earlier.
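A minimal sketch of that ordering-and-timing check is shown below. It simplifies the description above by assuming a single maximum gap between consecutive poses (no minimum), and the class and its names are hypothetical rather than taken from GesturePak.

```python
class GestureTracker:
    """Tracks progress through one gesture's poses, in order, within a time limit."""

    def __init__(self, pose_count, max_gap_seconds):
        self.pose_count = pose_count
        self.max_gap = max_gap_seconds
        self.next_pose = 0      # index of the pose we expect next
        self.last_time = None   # when the previous pose was matched

    def on_pose(self, pose_index, timestamp):
        """Feed one matched pose; return True when the whole gesture completes."""
        # Too much time since the previous pose: progress is discarded.
        if self.next_pose > 0 and timestamp - self.last_time > self.max_gap:
            self.next_pose = 0
        if pose_index == self.next_pose:
            self.next_pose += 1
            self.last_time = timestamp
            if self.next_pose == self.pose_count:
                self.next_pose = 0
                return True
        elif pose_index == 0:
            # The user went back to the first pose: restart from there.
            self.next_pose = 1
            self.last_time = timestamp
        return False


tracker = GestureTracker(pose_count=3, max_gap_seconds=1.0)
results = [tracker.on_pose(0, 0.0), tracker.on_pose(1, 0.5), tracker.on_pose(2, 0.9)]
```

Here `results` ends with `True` because all three poses arrived in order with less than one second between each pair.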

Multiple gestures can be loaded at the same time, and GesturePak discerns each gesture individually. Gestures can be sequenced. For example: gesture A could be movement of the right hand from left to right, and then gesture B could be movement of the right hand from right to left. Since your code is alerted when each of these gestures is performed, your application could then determine that both of them were performed one after the other within a given amount of time, and that sequence could be considered a macro-gesture.
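Application code for such a macro-gesture might look something like the sketch below. It assumes the application receives a gesture name and a timestamp each time a gesture is recognized; the class, the gesture names, and the two-second window are made up for the example.

```python
class MacroDetector:
    """Detects a fixed sequence of recognized gestures within a time window."""

    def __init__(self, sequence, window_seconds):
        self.sequence = list(sequence)
        self.window = window_seconds
        self.history = []  # most recent (name, timestamp) pairs

    def on_gesture(self, name, timestamp):
        """Record a recognized gesture; return True if the macro just completed."""
        self.history.append((name, timestamp))
        # Keep only as many entries as the sequence is long.
        self.history = self.history[-len(self.sequence):]
        names = [n for n, _ in self.history]
        if names != self.sequence:
            return False
        elapsed = self.history[-1][1] - self.history[0][1]
        return elapsed <= self.window


# Gesture A: right hand left-to-right. Gesture B: right hand right-to-left.
detector = MacroDetector(["swipe_right", "swipe_left"], window_seconds=2.0)
after_a = detector.on_gesture("swipe_right", 0.0)
after_b = detector.on_gesture("swipe_left", 1.5)
```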


Gesture-controlled applications are most practical in a situation where the user is not physically able to sit down and operate a computer. Here are a few scenarios:

  • Industrial Control

    In an industrial scenario, a machine operator often can't move away from their post in order to operate a computer. However, they could turn to the side, extend a hand and make movements to set tolerances, start or stop processes, or do anything else that can be controlled via computer.

  • Operating Room Control

    An obvious application of gesture control is in the OR. A surgeon cannot touch a screen, and voice command glitches could lead to unfortunate accidents. Gestures can be used to flip between X-rays, zoom in and out, and otherwise control assistive medical technology.

  • Handicapped Access

    For those who cannot type or speak clearly, a gesture-based interface makes a lot of sense. Naturally, one would want to customize the gestures that control such an application, given the wide range of abilities among users with disabilities. Since the GesturePak API includes the ability to record new gestures, this functionality can be built right into an app.

  • Real-Time Control

    Pose matching can be used in lieu of real Kinect programming to determine the relative position of the hands. For example: you can create a gesture in which Pose 1 has your hands close together, Pose 2 has them a little farther apart, and so on for about 12 poses until the last pose has them as far apart as they can reach. Your app will then be notified when your user hits each of those poses. You can use the pose number (1-12) to calculate a zoom value for a picture. As your user moves their hands, the photo zooms in and out in real-time, all with very little programming!

  • Fun for Kids!

    Since no real programming is required to recognize gestures with GesturePak, what better way to get kids interested in STEM education (Science, Technology, Engineering and Math)? What could be cooler than letting kids come up with their own gestures and then mapping them to things that programs can do?
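For the Real-Time Control scenario, the step from pose number to zoom value could be as simple as a linear mapping. The sketch below assumes 12 poses and a zoom range of 1x to 4x; the function name and the zoom limits are illustrative choices, not part of GesturePak.

```python
def zoom_for_pose(pose_number, pose_count=12, min_zoom=1.0, max_zoom=4.0):
    """Map a 1-based pose number linearly onto a zoom factor."""
    if not 1 <= pose_number <= pose_count:
        raise ValueError("pose_number must be between 1 and pose_count")
    if pose_count == 1:
        return min_zoom
    # 0.0 for the first pose (hands together), 1.0 for the last (hands apart).
    fraction = (pose_number - 1) / (pose_count - 1)
    return min_zoom + fraction * (max_zoom - min_zoom)


# Hands together gives the minimum zoom; hands fully apart gives the maximum.
closest = zoom_for_pose(1)
widest = zoom_for_pose(12)
```

An app would call a function like this from whatever handler it runs when a pose is matched, then apply the result to the picture.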


The Kinect SDK gives us the raw data. Programming gestures in code doesn't make sense. Instead, turn the gestures into data and run them through a recognition engine. GesturePak is that tool. Very little programming is required and you can create a complex gesture in less than 3 minutes. Best of all, GesturePak is only $99 US.








   Copyright © 2006 All rights reserved.
