How GesturePak Works
Abstract
Microsoft has lowered the barrier to entry with Kinect for Windows:
anyone willing and able to spend $250 can interact with their PC
using their body. The Kinect for Windows API gives the programmer
a stream of raw data coming from the sensor, but making sense of
that data is a challenge.
GesturePak uses the skeleton recognition
system in the Kinect for Windows SDK to help the user turn complex
gestures into data, which can then be matched against real-time
movement. Gestures are broken down into a series of poses. Pose matching
happens first. A gesture is considered recognized when the poses in
that defined gesture are matched in series, within the allowed amount of time.
Introduction
GesturePak is two things. The GesturePak Recorder is a Windows app that lets
you record, edit, and otherwise define gestures. The GesturePak API lets
a .NET developer recognize those gestures with very little programming.
A gesture has a few constraints. You can select which axes to track, which
joints to track, how much margin of error to allow, and how much time is
required and allowed between consecutive poses. With these simple constraints,
an unlimited number of simple and complex poses can be defined and recognized.
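To make these constraints concrete, here is a minimal sketch of how a gesture definition could be modeled as data. It is written in Python purely for illustration. Every name in it (Pose, Gesture, tracked_axes, tolerance, and so on) is hypothetical, and it does not reflect the actual GesturePak file format or API.

```python
from dataclasses import dataclass, field

# Hypothetical model; not the GesturePak file format or API.
Position = tuple[float, float, float]  # (x, y, z)


@dataclass
class Pose:
    joints: dict[str, Position]   # joint name -> recorded position
    min_seconds: float = 0.0      # time required since the previous pose
    max_seconds: float = 2.0      # time allowed since the previous pose


@dataclass
class Gesture:
    name: str
    poses: list[Pose]
    tracked_axes: set[str] = field(default_factory=lambda: {"x", "y"})
    tracked_joints: set[str] = field(default_factory=lambda: {"hand_right"})
    tolerance: float = 0.1        # margin of error, same units as positions


# A two-pose gesture: right hand moves from left to right.
swipe_right = Gesture(
    name="swipe_right",
    poses=[
        Pose({"hand_right": (-0.3, 0.2, 1.5)}),
        Pose({"hand_right": (0.3, 0.2, 1.5)}),
    ],
)
```

The point of the sketch is only that a gesture reduces to a small, declarative record: an ordered list of poses plus a handful of settings.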
How does it work?
Gestures are defined and created with the GesturePak Recorder app.
The user breaks down a gesture into a series of poses, each pose
representing a body position. A stick figure of the user is shown in
real time, and each recorded pose is also represented as a stick figure.
Poses can be added easily with the voice command "Snapshot," or with
a quick rocking back and forth of the head.
Poses can be deleted and inserted in the series that makes up a gesture.
The user can also watch an animation that shows the stick figure
going through the poses.
Each axis (X, Y, Z) can optionally be tracked, as can each of the 20 body
joints. All of this is saved as data in an XML gesture file.
Gestures can be recognized in any .NET application with the use of the
GesturePak Matcher, part of the GesturePak API, which is easily added
to any project as an assembly reference.
The GesturePak Matcher analyzes each frame in real time.
It walks through each pose in each gesture, checking to see if the user
is matching any of the poses. A definable amount of slop (tolerance) is
taken into account when matching poses: the more accuracy you require,
the less slop is allowed. If a pose is matched, the application is
notified via a callback. Alternatively, the programmer can poll the
Matcher to see if a gesture or pose was recognized. That way, GesturePak
stays completely out of the way of the existing code.
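As an illustration of the idea, the following Python sketch compares a live frame against recorded poses. It assumes that "slop" means a maximum per-axis difference on each tracked joint. That is an assumption for the sake of the example, not a description of GesturePak's internal comparison, and the names pose_matches and PoseWatcher are hypothetical.

```python
AXIS_INDEX = {"x": 0, "y": 1, "z": 2}


def pose_matches(recorded, live, joints, axes, tolerance):
    """True if every tracked joint is within tolerance on every tracked axis."""
    for joint in joints:
        for axis in axes:
            i = AXIS_INDEX[axis]
            if abs(recorded[joint][i] - live[joint][i]) > tolerance:
                return False
    return True


class PoseWatcher:
    """Hypothetical helper: checks each frame against a list of recorded poses."""

    def __init__(self, poses, joints, axes, tolerance, on_match=None):
        self.poses = poses
        self.joints = joints
        self.axes = axes
        self.tolerance = tolerance
        self.on_match = on_match   # optional callback taking the pose index
        self.last_matched = None   # most recent match, available for polling

    def process_frame(self, live):
        """Return the index of the first pose the live frame matches, else None."""
        for index, recorded in enumerate(self.poses):
            if pose_matches(recorded, live, self.joints, self.axes, self.tolerance):
                self.last_matched = index
                if self.on_match is not None:
                    self.on_match(index)
                return index
        return None


poses = [{"hand_right": (0.0, 0.0, 1.0)}, {"hand_right": (0.5, 0.0, 1.0)}]
matched = []
watcher = PoseWatcher(poses, ["hand_right"], ["x", "y"], 0.1, on_match=matched.append)
watcher.process_frame({"hand_right": (0.52, 0.03, 2.0)})  # Z is not tracked here
```

The caller can either pass an on_match callback or ignore it and read last_matched whenever convenient, which mirrors the callback-or-poll choice described above.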
Once poses have been matched, the Matcher checks each gesture to see
if its poses were matched in the correct order and within the allowed
amount of time. When a gesture is matched, the application is notified
via a callback, or GesturePak can be polled as stated earlier.
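The ordering and timing check can be sketched as a small state machine. The Python below is an illustration under simplifying assumptions: one minimum and one maximum delay apply to every pair of consecutive poses, and time is measured from the moment the previous pose was first matched. The GestureTracker class and its parameters are hypothetical and are not part of the GesturePak API.

```python
class GestureTracker:
    """Hypothetical tracker: poses must arrive in order, within a time window."""

    def __init__(self, pose_count, min_seconds=0.0, max_seconds=2.0, on_gesture=None):
        self.pose_count = pose_count
        self.min_seconds = min_seconds
        self.max_seconds = max_seconds
        self.on_gesture = on_gesture   # optional callback with no arguments
        self.next_pose = 0             # index of the pose we are waiting for
        self.last_time = None          # when the previous pose was matched

    def pose_matched(self, pose_index, timestamp):
        """Feed one matched pose; return True when the whole gesture completes."""
        if self.next_pose > 0 and timestamp - self.last_time > self.max_seconds:
            self.next_pose = 0         # too slow: start over from the first pose
        if pose_index != self.next_pose:
            return False               # not the pose we are waiting for
        if self.next_pose > 0 and timestamp - self.last_time < self.min_seconds:
            return False               # too soon: keep waiting
        self.last_time = timestamp
        self.next_pose += 1
        if self.next_pose == self.pose_count:
            self.next_pose = 0         # ready to recognize the gesture again
            if self.on_gesture is not None:
                self.on_gesture()
            return True
        return False


tracker = GestureTracker(pose_count=3, max_seconds=1.0)
results = [
    tracker.pose_matched(0, 0.0),
    tracker.pose_matched(1, 0.5),
    tracker.pose_matched(2, 0.9),
]
```

Here the third call completes the gesture because each pose follows the previous one in order and within one second.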
Multiple gestures can be loaded at the same time, and GesturePak discerns
each gesture individually. Gestures can be sequenced.
For example: gesture A could be movement of the right hand from left to right,
and then gesture B could be movement of the right hand from right to left.
Since your code is alerted when each of these gestures is performed, your
application could then determine that both of them were performed one after
the other within a given amount of time, and that sequence could be considered
a macro-gesture.
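A macro-gesture like this lives in your own application code, not in the recognition engine. One possible way to detect it is sketched below in Python. The MacroGesture class, the gesture names, and the three-second window are all made up for the example.

```python
class MacroGesture:
    """Hypothetical helper: fires when named gestures occur in order in time."""

    def __init__(self, sequence, window_seconds):
        self.sequence = list(sequence)
        self.window_seconds = window_seconds
        self.history = []  # most recent (name, timestamp) pairs

    def gesture_recognized(self, name, timestamp):
        """Call this from your gesture callback; returns True on a full match."""
        self.history.append((name, timestamp))
        # Keep only as many entries as the sequence is long.
        self.history = self.history[-len(self.sequence):]
        names = [entry[0] for entry in self.history]
        if names != self.sequence:
            return False
        first_timestamp = self.history[0][1]
        return timestamp - first_timestamp <= self.window_seconds


macro = MacroGesture(["swipe_right", "swipe_left"], window_seconds=3.0)
first = macro.gesture_recognized("swipe_right", 10.0)
second = macro.gesture_recognized("swipe_left", 11.5)
```

In this run, first is False and second is True, because gesture B follows gesture A within the three-second window.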
Applications
Gesture-controlled applications are most practical in situations where the
user is not physically able to sit down and operate a computer. Here are a
few scenarios:
-
Industrial Control
In an industrial scenario, often a machine operator can't move away from
their post in order to operate a computer. However, they could turn to the
side, extend a hand and make movements to set tolerances, start or stop
processes, or anything else that can be controlled via computer.
-
Operating Room Control
An obvious application of gesture control is in the OR. A surgeon cannot
touch a screen, and voice command glitches could lead to unfortunate
accidents. Gestures can be used to flip between X-rays, zoom in and out,
and otherwise control assistive medical technology.
-
Handicapped Access
For those who cannot type or speak clearly, a gesture-based interface
makes a lot of sense. Naturally, one would want to customize the gestures
that control such an application due to the wide range of abilities among the
disabled. Since the GesturePak API includes the ability to record new
gestures, this functionality can be built right into an app.
-
Real-Time Control
Pose matching can be used in lieu of real Kinect programming to determine
the relative position of the hands. For example: you can create a gesture
in which Pose 1 has your hands close together, Pose 2 has them a little
further apart, and so on for about 12 poses until the last pose
has them as far apart as they can reach. Your app will then be notified
whenever your user hits one of those poses, and you can use the pose number
(1-12) to calculate a zoom value for a picture. As your user moves their
hands, the photo zooms in and out in real time, all with very little
programming!
-
Fun for Kids!
Since no real programming is required to recognize gestures with GesturePak,
what better way to get kids interested in STEM education (Science, Technology,
Engineering and Math)? What could be cooler
than letting kids come up with their own gestures and then mapping them to
things that programs can do?
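To illustrate the Real-Time Control scenario above, the step from pose number to zoom value can be as simple as a linear mapping. The Python sketch below assumes 12 poses and a zoom range of 1x to 4x. Both the range and the function name zoom_for_pose are arbitrary choices for the example.

```python
def zoom_for_pose(pose_number, pose_count=12, min_zoom=1.0, max_zoom=4.0):
    """Map a 1-based pose number linearly onto a zoom factor."""
    if not 1 <= pose_number <= pose_count:
        raise ValueError("pose_number must be between 1 and pose_count")
    # 0.0 for the first pose (hands together), 1.0 for the last (hands apart).
    fraction = (pose_number - 1) / (pose_count - 1)
    return min_zoom + fraction * (max_zoom - min_zoom)


closest = zoom_for_pose(1)    # hands close together
widest = zoom_for_pose(12)    # hands as far apart as they can reach
```

Your pose callback would call a function like this with the matched pose number and apply the result to the picture.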
Summary
The Kinect SDK gives us the raw data. Programming gestures in code doesn't
make sense. Instead, turn the gestures into data and run them through a
recognition engine. GesturePak is that tool. Very little programming is
required and you can create a complex gesture in less than 3 minutes.
Best of all, GesturePak is only $99 US.