
I know… This is what you’ve been expecting the most. The reason you purchased an Azure Kinect camera in the first place. Throughout my Kinect Masterclass articles, I have been covering the various aspects of the Kinect device, helping you understand how this magnificent sensor works. Body tracking is what made Kinect popular back in 2010. Without further ado, I am going to show you how to track a human body in 3D.

The video below shows exactly what we’ll develop: a body-tracking application that allows you to view the skeleton from multiple angles in three-dimensional space.


To run the demos, you need a computer with the following specifications:

  • 7th Gen Intel® Core™ i5 Processor (Quad Core 2.4 GHz or faster)
  • 4 GB Memory
  • NVIDIA GeForce GTX 1070 or better
  • Dedicated USB3 port
  • Windows 10

To write and execute code, you need to install the following software:

  • Unity3D
  • Visual Studio
  • The Azure Kinect SDK for Unity3D

Did you know?…

After leaving my job at Microsoft, I have been helping Fortune-500 companies and innovative startups create amazing body-tracking applications and games. If you are looking to get your business to the next level, get in touch with me.

How Body Tracking works

Before diving into the code, it’s worth understanding how exactly body-tracking works and what kind of skeleton data Kinect can provide.

What is Body Tracking?

Body tracking is the ability to detect skeleton joints using depth or color image data. The Kinect technology can identify the coordinates of the points which belong to a specific person and output their positions in 3D. That kind of information can be used in a variety of fields. In healthcare and fitness, developers can measure the range of motion and provide smart rehabilitation. In manufacturing, Kinect systems can analyze worker behavior, performance, and safety. When used in Robotics, autonomous systems can map their surroundings and imitate human movement.

Artificial Intelligence and Machine Learning at your service!

The original Kinect for XBOX 360 had an exceptionally memorable pitch to describe its functionality: “you are the controller.” Microsoft envisioned a future beyond keyboards and mice. It was a future of natural interaction with computers. Even though that vision came true via the HoloLens device, Kinect set the path to natural user interaction due to its remarkable skeleton tracking functionality.

The software relies heavily on Machine Learning and starts with a 2D approach.

  • First, the Azure Kinect SDK acquires the depth and infrared images.
  • Then, it feeds the infrared image to a Neural Network and extracts the 2D joint coordinates and the silhouettes of the users.
  • Each 2D pixel is assigned the corresponding depth value from the depth frame, thus giving its position in 3D space.
  • The results are post-processed to produce accurate human body skeletons.

You can check the official Microsoft presentation here.
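As a rough numeric illustration of the third step, back-projecting a 2D pixel to a 3D point follows the pinhole camera model. The focal lengths and principal point below are made-up values for illustration, not the Azure Kinect’s actual calibration.

```csharp
using System;

public static class BackProjection
{
    // Converts a 2D pixel (u, v) with a depth value d (in meters) to a
    // 3D point, using a simple pinhole camera model. fx and fy are the
    // focal lengths in pixels; (cx, cy) is the principal point.
    public static (double X, double Y, double Z) PixelTo3D(
        double u, double v, double d,
        double fx, double fy, double cx, double cy)
    {
        double x = (u - cx) * d / fx;
        double y = (v - cy) * d / fy;
        return (x, y, d);
    }

    public static void Main()
    {
        // A pixel at the principal point maps straight ahead of the camera.
        var p = PixelTo3D(320, 240, 2.0, 500, 500, 320, 240);
        Console.WriteLine(p); // (0, 0, 2)
    }
}
```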

Thankfully, all of the heavy lifting is done internally by the Azure Kinect Body Tracking SDK. There is no need to mess with the internals of the AI algorithms. All we need to do is call the proper SDK methods and, boom, we have access to the data!

Structure of a human body

So, what kind of data do we have available? Well, each body instance has a unique identifier (ID) and a collection of joints. The ID of the body is simply a numeric value that distinguishes one body from another. The joint collection holds a list of joint structures with their corresponding properties. Let’s explore the members of the joint structure further.

Joint Type (ID)

The ID is the unique name or type of each joint. In C#, the IDs of the joints are exposed in the JointType enumeration. The image below illustrates all of the tracked joints.

Azure Kinect Body Joints

Tracking State (Confidence)

People may stand in front of the camera in a lot of different ways. There are inconvenient cases where not every single joint is visible. Some joints may be outside the field of view or even behind physical objects. Other joints may move too quickly. In either case, developers need to know whether a joint is tracked reliably before using it. That’s why the Joint structure includes a property named TrackingState. The TrackingState allows us to know just how well Kinect is monitoring each joint. There are four levels of confidence:

  • High – Kinect is tracking this joint reliably.
  • Medium – Kinect is tracking the joint with average confidence.
  • Low – The joint is probably occluded, so Kinect is predicting its position. A joint with low confidence is not directly visible; instead, the SDK internally estimates its coordinates based on the neighboring joints.
  • None – The joint is completely outside the field of view.

As a software developer, you need to take the confidence levels seriously. Imagine you are working on a healthcare application, and you are trying to measure the range of motion of the spine. If the spine joints have a confidence level of Low or None, the measurements will be unreliable. Before accessing vital information, always check the confidence level of the joints that matter!
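As a minimal sketch of that guard, the snippet below gates a measurement on the joint’s confidence. The TrackingState enumeration here is a self-contained stand-in with the four levels listed above; in a real Unity project you would use the SDK’s own type instead.

```csharp
using System;

// Stand-in for the SDK's TrackingState enumeration, so the gating
// logic can run on its own.
public enum TrackingState { None, Low, Medium, High }

public static class ConfidenceGate
{
    // Returns true only when a joint is tracked reliably enough to
    // base a measurement on it (High or Medium confidence). Low means
    // the position is estimated; None means the joint is missing.
    public static bool IsReliable(TrackingState state) =>
        state == TrackingState.High || state == TrackingState.Medium;

    public static void Main()
    {
        Console.WriteLine(IsReliable(TrackingState.High)); // True
        Console.WriteLine(IsReliable(TrackingState.Low));  // False
    }
}
```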


Position

The Azure Kinect SDK provides the coordinates of the joints in 3D space. What exactly are those coordinates? The position of a joint is a set of three values: X, Y, and Z, measured relative to the 3D Cartesian coordinate system. More specifically:

  • X – The horizontal coordinate
  • Y – The vertical coordinate
  • Z – The depth coordinate

If you don’t remember the Cartesian System from your high school Math class, don’t worry. The Unity3D Editor Scene view is built around the Cartesian System!


Orientation

Lastly, the joint structure includes the Orientation property, which describes the rotation of the joint in 3D space, expressed as a quaternion.
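To turn an orientation into something tangible, you can rotate a reference direction by it; in Unity that is simply `rotation * Vector3.up`. The snippet below is a self-contained sketch of the underlying quaternion–vector rotation, using plain doubles so it runs outside Unity.

```csharp
using System;

public static class OrientationDemo
{
    // Rotates the vector (vx, vy, vz) by the unit quaternion (w, x, y, z).
    // This is the standard q * v * q⁻¹ rotation, expanded with cross
    // products to avoid a full quaternion multiplication.
    public static (double X, double Y, double Z) Rotate(
        double w, double x, double y, double z,
        double vx, double vy, double vz)
    {
        // t = 2 * cross(q.xyz, v)
        double tx = 2 * (y * vz - z * vy);
        double ty = 2 * (z * vx - x * vz);
        double tz = 2 * (x * vy - y * vx);

        // v' = v + w * t + cross(q.xyz, t)
        return (vx + w * tx + (y * tz - z * ty),
                vy + w * ty + (z * tx - x * tz),
                vz + w * tz + (x * ty - y * tx));
    }

    public static void Main()
    {
        // A 90° rotation around the Z axis turns the X axis into the Y axis.
        double s = Math.Sqrt(0.5);
        var v = Rotate(s, 0, 0, s, 1, 0, 0);
        Console.WriteLine($"{v.X:F3}, {v.Y:F3}, {v.Z:F3}"); // approximately 0, 1, 0
    }
}
```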

Working with Body data

It’s time to launch Unity3D and Visual Studio and start writing a few lines of code. As we’ve seen in all of my Masterclass articles, we first need to instantiate a KinectSensor object. Here’s how to open and close the device in Unity’s Start() and OnDestroy() methods, respectively.

private KinectSensor _sensor;

private void Start()
{
    _sensor = KinectSensor.GetDefault();
    _sensor?.Open(); // Open the device and start streaming.
}

private void OnDestroy()
{
    _sensor?.Close(); // Close the device and release its resources.
}
Now, in Unity’s Update() method, we are going to grab the latest Kinect frame and extract its data. The code below shows you how to:

  • Acquire a Kinect frame.
  • Get the skeleton data.
  • Loop through the available skeleton objects.
  • Display the position of the head joint.

private void Update()
{
    Frame frame = _sensor.Update();

    if (frame == null) return;

    List<Body> bodies = frame.BodyFrameSource?.Bodies;

    if (bodies == null) return;

    foreach (Body body in bodies)
    {
        Joint head = body.Joints[JointType.Head];
        TrackingState confidence = head.TrackingState;
        Vector3 position = head.Position;
        Quaternion orientation = head.Orientation;

        Debug.Log($"Head joint: " +
                  $"Confidence: {confidence}, " +
                  $"Position: {position}, " +
                  $"Orientation: {orientation}.");
    }
}

Piece of cake, huh?

Displaying Body data in Unity3D

Unity3D is a very handy engine when it comes to visualizing data in the 3D world. Unity’s coordinate system matches Kinect’s, and the physical units are measured in meters. You can place objects in your 3D scene, safely assuming it’s the real world. There is one caveat: Kinect does not have negative depth (Z) values. A negative value would mean that an object is located behind the sensor. Something like that is not possible since Kinect cannot “see” behind itself.

The Azure Kinect SDK for Unity3D comes with a handy visualization element called Stickman. The Stickman prefab is nothing but a set of spheres and lines that connect them. Each sphere represents one particular joint, while each line represents the connections between them (you can think of them as bones). Here is the simple structure of the Stickman prefab:

Azure Kinect stick figure in Unity3D

To use the Stickman in our code, all we need to do is declare a StickmanManager element and call its Load() method, providing the list of skeletons as a parameter. Internally, the Stickman Manager will assign a Stickman to each tracked body and position the joints accordingly.

[SerializeField] private StickmanManager _stickmanManager;

private void Update()
{
    Frame frame = _sensor.Update();

    if (frame == null) return;

    List<Body> bodies = frame.BodyFrameSource?.Bodies;

    // Assigns a Stickman to each tracked body and positions its joints.
    _stickmanManager.Load(bodies);
}


Then, all you need to do is hit the Run button and stand in front of your Kinect camera!

Azure Kinect Body Tracking Unity3D - 3D Skeletons

Amazingly easy, right? Viewing the skeleton in 3D is particularly useful in applications that analyze human motion. For example, a client of ours wants to evaluate the posture of patients with kinesiology issues. As a result, it’s important to see the body from multiple angles without having the patient move. How can we do that? Simple: we’ll rotate the camera around the skeleton! Here’s the C# code that allows us to achieve the results demonstrated in the video.

[SerializeField] private float _speed = 1.0f; // Zoom speed; the value is indicative.

private void LateUpdate()
{
    Vector3 cameraPosition = Camera.main.transform.localPosition;
    Vector3 originPosition = Vector3.zero; // Assumes the skeleton stands at the world origin.
    float angle = 50.0f * Time.deltaTime;

    if (Input.GetKey(KeyCode.RightArrow))
        Camera.main.transform.RotateAround(originPosition, Vector3.up, angle);

    if (Input.GetKey(KeyCode.LeftArrow))
        Camera.main.transform.RotateAround(originPosition, Vector3.down, angle);

    if (Input.GetKey(KeyCode.UpArrow))
        Camera.main.transform.RotateAround(originPosition, Vector3.right, angle);

    if (Input.GetKey(KeyCode.DownArrow))
        Camera.main.transform.RotateAround(originPosition, Vector3.left, angle);

    if (Input.mouseScrollDelta != Vector2.zero)
    {
        Camera.main.transform.localPosition = new Vector3(
            cameraPosition.x,
            cameraPosition.y,
            cameraPosition.z + Input.mouseScrollDelta.y * _speed);
    }
}

Try it out yourself. Here’s what each key does:

  • Up arrow – Rotates the view upwards (overhead).
  • Down arrow – Rotates the view downwards.
  • Left arrow – Rotates the view sideways to the left.
  • Right arrow – Rotates the view sideways to the right.
  • Mouse wheel – Zooms in or out.
Azure Kinect Body Tracking Unity3D – 3D Skeletons


In this Masterclass, you’ve learned how to acquire and visualize the 3D positions of the human body joints using the Azure Kinect Body Tracking SDK and Unity3D.

Source code

You’ve made it to this point? Awesome! Here is the source code for your convenience.

Get the Azure Kinect SDK for Unity3D

One more thing…

After leaving my job at Microsoft, I have been helping Fortune-500 companies and innovative startups create amazing body-tracking applications and games. If you are looking to get your business to the next level, get in touch with me.

Sharing is caring!

If you liked this article, remember to share it on social media, so you can help other developers, too! Also, let me know your thoughts in the comments below. ‘Til the next time… keep coding!

Vangos Pterneas


Vangos Pterneas is a professional software engineer and an award-winning Microsoft Most Valuable Professional (2014-2019). Since 2012, Vangos has been helping Fortune-500 companies, and ambitious startups create demanding motion-tracking applications. He’s obsessed with analyzing and modeling every aspect of human motion using Computer Vision and Mathematics. Kinect programming started as a hobby and quickly evolved into a full-time business. Vangos shares his passion by regularly publishing articles and open-source projects that help fellow developers understand the fascinating Kinect technology.


  • Hi Vangos! Thank you for all of your good work! I am wondering if there is an option to measure more detailed hand movements (finger movements). I just want to track an arm, hand and fingers

    • Hi Kurt and thanks for your comment. Kinect recognizes the hand, thumb, and fingertip joints. However, these are far from perfect for complex movements.

      If you need 3D finger tracking, you’d be better off with a dedicated device, such as Leap Motion.
      If you need 2D finger tracking, you would need to train a Neural Network for that purpose.
