
Body Tracking for desktop & mobile?

Kinect is an exceptional depth sensor to work with. However, Microsoft has repeatedly discontinued its Kinect devices. If you want a super-accurate Body Tracking SDK that works with all cameras, desktop computers, and mobile devices, check out LightBuzz AI. LightBuzz is the result of five years of R&D, and I’m sure you’ll enjoy it very much.

Hello boys (and girls, of course). I am back!

Today, I am going to give you a detailed guide to get started with the new Azure Kinect camera programming.

What is Azure Kinect?

Azure Kinect is Microsoft’s latest depth sensing camera and the natural successor of the older Microsoft Kinect One sensor.

In terms of hardware, Azure Kinect is actually a “bundle” of 4 devices:

  • A 4K RGB camera (Color data)
  • A wide-angle depth sensor (Depth data)
  • An inertial measurement unit (Accelerometer – Gyroscope)
  • A microphone array

The Microsoft Camera SDK allows developers to access and combine the raw data of the aforementioned devices.

On top of that, Microsoft has also provided a new Skeleton Tracking kit, which detects the 3D coordinates of 32 human body joints.

This is a part of what you’ll be able to do upon completing this tutorial:

Prerequisites

To run the Azure Kinect demos, you need a computer with the following specifications:

  • 7th Gen Intel® Core™ i5 Processor (Quad Core 2.4 GHz or faster)
  • 4 GB Memory
  • NVIDIA GeForce GTX 1070 or better
  • Dedicated USB3 port
  • Windows 10

To write and execute code, you also need the appropriate development tools installed.

You can code your Azure Kinect applications in C++ or C#. In this tutorial, I am going to cover both languages.

Lastly, download the SDK, based on the programming language of your choice:

Azure Kinect for C++

Download the Azure Kinect SDK for the C++ programming language.

Download for C++

Azure Kinect for Unity3D

Download the Azure Kinect SDK for Unity3D (C# programming language).

Download for Unity3D

Kinect Camera Configurations

No matter which programming language you are using, the process is exactly the same. Before accessing the color, depth, and skeleton streams, you need to get some information about the connected camera. Code-wise, you need to create a device object and specify the desired parameters.

For our demo, we need to support the following parameters:

Device index

As you may know, you can plug multiple Azure Kinect devices into the same computer. Each device has a unique index. By specifying the device index, you tell the SDK which camera to access. In the most common scenario, only one camera is connected, so the index is zero.
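If you are unsure how many devices are plugged in, the native SDK can count them for you. Here is a minimal C++ sketch using the official k4a API (error handling omitted for brevity):

#include <k4a/k4a.h>

// Count the connected Azure Kinect devices and open the first one.
uint32_t device_count = k4a_device_get_installed_count();
if (device_count > 0)
{
	k4a_device_t device = nullptr;
	if (k4a_device_open(0, &device) == K4A_RESULT_SUCCEEDED)
	{
		// ... configure and start the camera here ...
		k4a_device_close(device);
	}
}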

Color resolution

The RGB color camera supports 6 color resolutions:

| RGB Camera Resolution (H×V) | Aspect Ratio | Format Options | Frame Rates (FPS) | Nominal FOV (H×V), post-processed |
|---|---|---|---|---|
| 3840×2160 | 16:9 | MJPEG | 0, 5, 15, 30 | 90°×59° |
| 2560×1440 | 16:9 | MJPEG | 0, 5, 15, 30 | 90°×59° |
| 1920×1080 | 16:9 | MJPEG | 0, 5, 15, 30 | 90°×59° |
| 1280×720 | 16:9 | MJPEG/YUY2/NV12 | 0, 5, 15, 30 | 90°×59° |
| 4096×3072 | 4:3 | MJPEG | 0, 5, 15 | 90°×74.3° |
| 2048×1536 | 4:3 | MJPEG | 0, 5, 15, 30 | 90°×74.3° |

Be careful when choosing your preferred resolution: high resolutions may cause significant delays in your application! Moreover, the 4K resolution runs at a maximum of 15 frames per second!

Color format

The color format specifies how the color frames will be encoded. For the sake of simplicity, let’s choose the BGRA format: that is, every pixel will be encoded as a set of Blue, Green, Red, and Alpha values.

Depth mode

Similarly to the color resolution, the depth mode affects the frame rate, resolution, and field of view.

| Mode | Resolution | FoV | FPS | Operating range | Exposure time |
|---|---|---|---|---|---|
| NFOV unbinned | 640×576 | 75°×65° | 0, 5, 15, 30 | 0.5 – 3.86 m | 12.8 ms |
| NFOV 2×2 binned (SW) | 320×288 | 75°×65° | 0, 5, 15, 30 | 0.5 – 5.46 m | 12.8 ms |
| WFOV 2×2 binned | 512×512 | 120°×120° | 0, 5, 15, 30 | 0.25 – 2.88 m | 12.8 ms |
| WFOV unbinned | 1024×1024 | 120°×120° | 0, 5, 15 | 0.25 – 2.21 m | 20.3 ms |
| Passive IR | 1024×1024 | N/A | 0, 5, 15, 30 | N/A | 1.6 ms |

Frame rate (FPS)

The frame rate configuration setting specifies the desired number of frames per second, as long as the aforementioned parameters allow it. For example, if you select the “WFOV unbinned” depth mode, you won’t be able to set it to 30 FPS, no matter how much you want to.
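You can encode this constraint in a tiny helper. The function below is my own illustration (not part of the SDK); it caps the frame rate for the two modes that cannot reach 30 FPS, as the tables above show:

// Hypothetical helper: WFOV unbinned depth and 4096×3072 color top out at 15 FPS.
k4a_fps_t MaxFps(k4a_depth_mode_t depth_mode, k4a_color_resolution_t color_resolution)
{
	if (depth_mode == K4A_DEPTH_MODE_WFOV_UNBINNED ||
	    color_resolution == K4A_COLOR_RESOLUTION_3072P)
	{
		return K4A_FRAMES_PER_SECOND_15;
	}
	return K4A_FRAMES_PER_SECOND_30;
}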

Starting and stopping the camera

Enough said! Let’s bring it all together and write some code to specify the desired configuration and open the camera!

C#

sensor = KinectSensor.Create(new Configuration
{
    DeviceIndex = 0,
    ColorResolution = ColorResolution.ColorResolution_1080P,
    ColorFormat = ColorFormat.BGRA32,
    DepthMode = DepthMode.NFOV_Unbinned,
    FPS = FramesPerSecond.FPS_30
});
sensor?.Open();

In case you want to have the default configuration, simply use:

sensor = KinectSensor.GetDefault();

Stopping the camera is straightforward, too:

sensor.Stop();

In C++, you need to explicitly create the camera and the body tracker:

// Open the device and start the cameras with the desired configuration.
k4a_device_t device = nullptr;
int device_index = 0;
k4a_device_open(device_index, &device);

k4a_device_configuration_t camera_configuration = K4A_DEVICE_CONFIG_INIT_DISABLE_ALL;
camera_configuration.color_resolution = K4A_COLOR_RESOLUTION_1080P;
camera_configuration.color_format = K4A_IMAGE_FORMAT_COLOR_BGRA32;
camera_configuration.depth_mode = K4A_DEPTH_MODE_NFOV_UNBINNED;
camera_configuration.camera_fps = K4A_FRAMES_PER_SECOND_30;
k4a_device_start_cameras(device, &camera_configuration);

// Retrieve the sensor calibration; the body tracker needs it.
k4a_calibration_t calibration;
k4a_device_get_calibration(device, camera_configuration.depth_mode, camera_configuration.color_resolution, &calibration);

// Create the body tracker with the default configuration.
k4abt_tracker_t tracker = NULL;
k4abt_tracker_configuration_t tracker_configuration = K4ABT_TRACKER_CONFIG_DEFAULT;
k4abt_tracker_create(&calibration, tracker_configuration, &tracker);

To stop the camera, call:

k4abt_tracker_shutdown(tracker);
k4abt_tracker_destroy(tracker);
k4a_device_stop_cameras(device);
k4a_device_close(device);

Note: Even though I respect the power of C++, the syntax simplicity is the main reason I’m in love with C# ❤.

Azure Kinect Frames

The color, depth, and skeleton data are bundled into frames. Each frame is a set of raw color, depth, and skeleton data. A new frame is available 30 times per second (or 15, or 5, depending on your configuration). Here is how to access the latest frame:

C#

Frame frame = sensor.Update();

C++

k4a_capture_t capture;
k4a_wait_result_t get_capture_result = k4a_device_get_capture(device, &capture, K4A_WAIT_INFINITE);

Azure Kinect: Color Data

The Microsoft SDK allows us to access the raw color data in the form of a single-dimension byte array. Since we selected the BGRA32 color format, the byte array is encoded like this:

BGRABGRABGRABGRABGRABGRABGRABGRABGRABGRABGRABGRA...

Here’s how to access the data:

C#

if (frame != null)
{
    if (frame.ColorFrameSource != null)
    {
        byte[] colorData = frame.ColorFrameSource.Data;
        int colorWidth = frame.ColorFrameSource.Width;
        int colorHeight = frame.ColorFrameSource.Height;
    }
}

C++

if (get_capture_result == K4A_WAIT_RESULT_SUCCEEDED)
{
	k4a_image_t colorImage = k4a_capture_get_color_image(capture);
	if (colorImage != nullptr)
	{
		unsigned char* colorData = k4a_image_get_buffer(colorImage);
		int colorWidth = k4a_image_get_width_pixels(colorImage);
		int colorHeight = k4a_image_get_height_pixels(colorImage);
		k4a_image_release(colorImage);
	}
}
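To make the layout concrete, here is how you could read a single pixel out of that buffer. This is a minimal sketch that continues the C++ snippet above (it must run before k4a_image_release, while colorData is still valid); BGRA32 stores 4 bytes per pixel, row by row:

// Look up the central pixel of the BGRA buffer (4 bytes per pixel).
int x = colorWidth / 2;
int y = colorHeight / 2;
int index = (y * colorWidth + x) * 4;

unsigned char blue  = colorData[index + 0];
unsigned char green = colorData[index + 1];
unsigned char red   = colorData[index + 2];
unsigned char alpha = colorData[index + 3];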

Azure Kinect: Depth Data

The depth data is encoded as a single-dimension array of unsigned short integer values. Each value represents the distance of the corresponding point in millimeters.

Similarly to the color data, we can access the depth data as follows:

C#

if (frame != null)
{
    if (frame.DepthFrameSource != null)
    {
        ushort[] depthData = frame.DepthFrameSource.Data;
        int depthWidth = frame.DepthFrameSource.Width;
        int depthHeight = frame.DepthFrameSource.Height;
    }
}

C++

if (get_capture_result == K4A_WAIT_RESULT_SUCCEEDED)
{
	k4a_image_t depthImage = k4a_capture_get_depth_image(capture);
	if (depthImage != nullptr)
	{
		unsigned short* depthData = reinterpret_cast<unsigned short*>(k4a_image_get_buffer(depthImage));
		int depthWidth = k4a_image_get_width_pixels(depthImage);
		int depthHeight = k4a_image_get_height_pixels(depthImage);
		k4a_image_release(depthImage);
	}
}
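Since every element is a distance in millimeters, sampling a point is a plain array lookup. Again, a small sketch continuing the C++ snippet above (before k4a_image_release is called):

// Distance of the central pixel; a value of 0 means no valid measurement.
int x = depthWidth / 2;
int y = depthHeight / 2;
unsigned short distanceInMillimeters = depthData[y * depthWidth + x];
float distanceInMeters = distanceInMillimeters / 1000.0f;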

In C++, you also need to release the frame explicitly:

k4a_capture_release(capture);

Azure Kinect: Body Tracking

Lastly, let’s get to the cool part: body tracking. The new Azure Kinect Body Tracking SDK was developed using advanced machine learning algorithms. It combines the raw color and depth data to accurately estimate the pose of a person.

In theory, there is no limit to the number of people the SDK can track. In practice, I would recommend avoiding overcrowded environments. First, having more than 10 people within the field of view would leave most of their joints occluded from the sensor (and thus tracked less accurately). Second, we are living in the Coronavirus era, and we know it’s not safe.

Joking aside, the SDK supports the following joints:

Azure Kinect - Human body joints

Each joint has the following properties:

  • Type – the name of the joint (e.g. “Head”).
  • Tracking State – the tracking confidence of the joint.
  • Position – the coordinates of the joint (XYZ) in the 3D world space.
  • Orientation – the orientation of the joint relative to its parent joint.

Here’s how to access the tracked bodies and capture the position of the head joint. Obviously, you can access all of the other joints in the same way.

C#

if (frame.BodyFrameSource != null)
{
    List<Body> bodies = frame.BodyFrameSource.Bodies;
    foreach (Body body in bodies)
    {
        Joint head = body.Joints[JointType.Head];
        Vector3 position = head.Position;
    }
}

C++ is more verbose and requires a little bit more coding:

k4a_wait_result_t queue_capture_result = k4abt_tracker_enqueue_capture(tracker, capture, K4A_WAIT_INFINITE);
if (queue_capture_result == K4A_WAIT_RESULT_SUCCEEDED)
{
	k4abt_frame_t body_frame;
	k4a_wait_result_t pop_frame_result = k4abt_tracker_pop_result(tracker, &body_frame, K4A_WAIT_INFINITE);
	if (pop_frame_result == K4A_WAIT_RESULT_SUCCEEDED)
	{
		int bodyCount = static_cast<int>(k4abt_frame_get_num_bodies(body_frame));
		for (int i = 0; i < bodyCount; i++)
		{
			k4abt_skeleton_t skeleton;
			k4a_result_t bodyResult = k4abt_frame_get_body_skeleton(body_frame, i, &skeleton);
			if (bodyResult == K4A_RESULT_SUCCEEDED)
			{
				k4abt_joint_t head = skeleton.joints[K4ABT_JOINT_HEAD];
				k4a_float3_t position = head.position;
			}
		}
		k4abt_frame_release(body_frame);
	}
}
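Before trusting a joint’s position, it’s good practice to check its confidence level. Inside the loop above, that would look like this (a minimal sketch using the confidence_level field of the k4abt joint structure):

k4abt_joint_t head = skeleton.joints[K4ABT_JOINT_HEAD];

// Skip joints the tracker is not confident about.
if (head.confidence_level >= K4ABT_JOINT_CONFIDENCE_MEDIUM)
{
	k4a_float3_t position = head.position;           // Millimeters, in 3D depth-camera space.
	k4a_quaternion_t orientation = head.orientation;
}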

Depending on the graphics engine you are using (Unity3D, OpenCV, OpenGL, etc.), you can display the byte arrays and visualize the skeleton accordingly.
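For example, here is how you could map a joint onto the color frame with the official k4a_calibration_3d_to_2d function. A minimal sketch, assuming the calibration object from the startup code and the 3D head position we captured above:

// Project a 3D point (millimeters, depth-camera space) onto the 2D color image.
k4a_float2_t pixel;
int valid = 0;
k4a_calibration_3d_to_2d(&calibration, &position,
                         K4A_CALIBRATION_TYPE_DEPTH,  // Source: 3D depth-camera space.
                         K4A_CALIBRATION_TYPE_COLOR,  // Target: 2D color image.
                         &pixel, &valid);
if (valid != 0)
{
	// pixel.xy.x and pixel.xy.y are the joint's coordinates on the color frame.
}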

The SDKs include samples that do the visualization for you. The Unity3D SDK also includes a more advanced Avateering demo.


Summary

In this tutorial, you’ve learnt the following:

  • How to configure, open, and stop an Azure Kinect device.
  • How to access the raw color and depth data.
  • How to access the skeleton data.

So, what else do you want me to cover about the new Azure Kinect sensor? Let me know in the comments below!

‘Til the next time, keep Kinecting!

Cover image credits: microsoft.com

Vangos Pterneas

Vangos Pterneas is a software engineer, book author, and award-winning Microsoft Most Valuable Professional (2014-2019). Since 2012, Vangos has been helping Fortune-500 companies and ambitious startups create demanding motion-tracking applications. He's obsessed with analyzing and modeling every aspect of human motion using AI and Maths. Vangos shares his passion by regularly publishing articles and open-source projects to help and inspire fellow developers.
