Body Tracking for desktop & mobile?
Kinect is an exceptional depth sensor to work with. However, Microsoft has repeatedly discontinued its Kinect devices. If you want a super-accurate Body Tracking SDK that works with all cameras, desktop computers, and mobile devices, check out LightBuzz AI. LightBuzz is the result of five years of R&D, and I’m sure you’ll enjoy it very much.
Hello boys (and girls, of course). I am back!
Today, I am going to give you a detailed guide to get started with the new Azure Kinect camera programming.
What is Azure Kinect?
Azure Kinect is Microsoft’s latest depth sensing camera and the natural successor of the older Microsoft Kinect One sensor.
In terms of hardware, Azure Kinect is actually a “bundle” of 4 devices:
- A 4K RGB camera (Color data)
- A wide-angle depth sensor (Depth data)
- An inertial measurement unit (Accelerometer – Gyroscope)
- A microphone array
The Microsoft Camera SDK allows developers to access and combine the raw data of the aforementioned devices.
On top of that, Microsoft has also provided a new Skeleton Tracking kit which detects the 3D coordinates of 32 human body joints.
This is a part of what you’ll be able to do upon completing this tutorial:
Prerequisites
To run the Azure Kinect demos, you need a computer with the following specifications:
- 7th Gen Intel® Core™ i5 processor (quad-core, 2.4 GHz or faster)
- 4 GB Memory
- NVIDIA GeForce GTX 1070 or better
- Dedicated USB3 port
- Windows 10
To write and execute code, you need to install the following software:
- Visual Studio 2019 Community (required for C++ and C# development)
- Unity3D (required for C# development)
You can code your Azure Kinect applications in C++ or C#. In this tutorial, I am going to cover both languages.
Lastly, download the SDK, based on the programming language of your choice:
Azure Kinect for Unity3D
Download the Azure Kinect SDK for Unity3D (C# programming language).
Kinect Camera Configurations
No matter which programming language you are using, the process is exactly the same. Before accessing the color, depth, and skeleton streams, you need to get some information about the connected camera. Code-wise, you need to create a device object and specify the desired parameters.
For our demo, we need to support the following parameters:
Device index
As you may know, you can plug multiple Azure Kinect devices into the same computer. Each device has a unique index. By specifying the device index, the SDK knows which camera to access. In the most common scenario, only one camera is connected, so the index is zero.
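If you are not sure how many devices are connected, the C++ Sensor SDK we use later in this tutorial can enumerate them for you. Here is a minimal sketch (error handling trimmed; the list_devices helper name is my own) that prints the serial number of every connected camera:

#include <k4a/k4a.h>
#include <cstdio>
#include <vector>

// Print the index and serial number of every connected Azure Kinect device.
void list_devices()
{
    uint32_t count = k4a_device_get_installed_count();
    printf("Connected devices: %u\n", count);

    for (uint32_t index = 0; index < count; index++)
    {
        k4a_device_t device = nullptr;
        if (k4a_device_open(index, &device) != K4A_RESULT_SUCCEEDED)
            continue;

        // The first call (with a null buffer) reports the required buffer size.
        size_t serial_size = 0;
        k4a_device_get_serialnum(device, nullptr, &serial_size);

        std::vector<char> serial(serial_size);
        k4a_device_get_serialnum(device, serial.data(), &serial_size);
        printf("Device %u: %s\n", index, serial.data());

        k4a_device_close(device);
    }
}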
Color resolution
The RGB color camera supports 6 color resolutions:
| RGB Camera Resolution (H×V) | Aspect Ratio | Format Options | Frame Rates (FPS) | Nominal FOV (H×V), post-processed |
| --- | --- | --- | --- | --- |
| 3840×2160 | 16:9 | MJPEG | 0, 5, 15, 30 | 90°×59° |
| 2560×1440 | 16:9 | MJPEG | 0, 5, 15, 30 | 90°×59° |
| 1920×1080 | 16:9 | MJPEG | 0, 5, 15, 30 | 90°×59° |
| 1280×720 | 16:9 | MJPEG/YUY2/NV12 | 0, 5, 15, 30 | 90°×59° |
| 4096×3072 | 4:3 | MJPEG | 0, 5, 15 | 90°×74.3° |
| 2048×1536 | 4:3 | MJPEG | 0, 5, 15, 30 | 90°×74.3° |
Be careful when choosing your preferred resolution: high resolutions may cause significant delays in your application! Moreover, the 4096×3072 resolution only runs at 15 frames per second maximum!
Color format
The color format specifies how the color frames will be encoded. For the sake of simplicity, let’s choose the BGRA format: that is, every pixel will be encoded as a set of Blue, Green, Red, and Alpha values (one byte each).
Depth mode
Similarly to the color resolution, the depth mode affects the frame rate, resolution, and field of view.
| Mode | Resolution | FoI (H×V) | FPS | Operating range | Exposure time |
| --- | --- | --- | --- | --- | --- |
| NFOV unbinned | 640×576 | 75°×65° | 0, 5, 15, 30 | 0.5 – 3.86 m | 12.8 ms |
| NFOV 2×2 binned (SW) | 320×288 | 75°×65° | 0, 5, 15, 30 | 0.5 – 5.46 m | 12.8 ms |
| WFOV 2×2 binned | 512×512 | 120°×120° | 0, 5, 15, 30 | 0.25 – 2.88 m | 12.8 ms |
| WFOV unbinned | 1024×1024 | 120°×120° | 0, 5, 15 | 0.25 – 2.21 m | 20.3 ms |
| Passive IR | 1024×1024 | N/A | 0, 5, 15, 30 | N/A | 1.6 ms |
Frame rate (FPS)
The frame rate configuration setting specifies the desired number of frames per second, as long as the aforementioned parameters allow it. For example, if you select the “WFOV unbinned” depth mode, you won’t be able to set it to 30 FPS, no matter how much you want to.
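There is no dedicated API call for listing the valid combinations, but in the C++ Sensor SDK used below, k4a_device_start_cameras rejects an unsupported configuration. Here is a minimal sketch of the idea (assuming device is an already opened k4a_device_t):

// WFOV unbinned supports 15 FPS at most, so this configuration should be rejected.
k4a_device_configuration_t config = K4A_DEVICE_CONFIG_INIT_DISABLE_ALL;
config.depth_mode = K4A_DEPTH_MODE_WFOV_UNBINNED;
config.camera_fps = K4A_FRAMES_PER_SECOND_30;

if (k4a_device_start_cameras(device, &config) != K4A_RESULT_SUCCEEDED)
{
    // The depth mode / frame rate combination is invalid; fall back to a supported rate.
    config.camera_fps = K4A_FRAMES_PER_SECOND_15;
    k4a_device_start_cameras(device, &config);
}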
Starting and stopping the camera
Enough said! Let’s bring it all together and write some code to specify the desired configuration and open the camera!
C#
sensor = KinectSensor.Create(new Configuration
{
    DeviceIndex = 0,
    ColorResolution = ColorResolution.ColorResolution_1080P,
    ColorFormat = ColorFormat.BGRA32,
    DepthMode = DepthMode.NFOV_Unbinned,
    FPS = FramesPerSecond.FPS_30
});
sensor?.Open();
In case you want to have the default configuration, simply use:
sensor = KinectSensor.GetDefault();
Stopping the camera is straightforward, too:
sensor.Stop();
In C++, you need to explicitly create the camera and the body tracker:
// Open the device and start the cameras with the desired configuration.
k4a_device_t device = nullptr;
int device_index = 0;
k4a_device_open(device_index, &device);

k4a_device_configuration_t camera_configuration = K4A_DEVICE_CONFIG_INIT_DISABLE_ALL;
camera_configuration.color_resolution = K4A_COLOR_RESOLUTION_1080P;
camera_configuration.color_format = K4A_IMAGE_FORMAT_COLOR_BGRA32;
camera_configuration.depth_mode = K4A_DEPTH_MODE_NFOV_UNBINNED;
camera_configuration.camera_fps = K4A_FRAMES_PER_SECOND_30;
k4a_device_start_cameras(device, &camera_configuration);

// Retrieve the sensor calibration; the body tracker needs it.
k4a_calibration_t calibration;
k4a_device_get_calibration(device, camera_configuration.depth_mode, camera_configuration.color_resolution, &calibration);

// Create the body tracker.
k4abt_tracker_t tracker = NULL;
k4abt_tracker_configuration_t tracker_configuration = K4ABT_TRACKER_CONFIG_DEFAULT;
k4abt_tracker_create(&calibration, tracker_configuration, &tracker);
To stop the camera, call:
k4abt_tracker_shutdown(tracker);
k4abt_tracker_destroy(tracker);
k4a_device_stop_cameras(device);
k4a_device_close(device);
Note: Even though I respect the power of C++, the syntax simplicity is the main reason I’m in love with C# ❤.
Azure Kinect Frames
The color, depth, and skeleton data are bundled into frames. Each frame is a set of raw color, depth, and skeleton data. A new frame is available 30 times per second (or 15 or 5, depending on your configuration). Here is how to access the latest frame:
C#
Frame frame = sensor.Update();
C++
k4a_capture_t capture;
k4a_wait_result_t get_capture_result = k4a_device_get_capture(device, &capture, K4A_WAIT_INFINITE);
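In the C++ snippet above, K4A_WAIT_INFINITE blocks until a frame arrives. If you prefer not to block forever, you can pass a timeout in milliseconds and handle the wait result explicitly. A small sketch:

k4a_capture_t capture = nullptr;
switch (k4a_device_get_capture(device, &capture, 1000 /* ms */))
{
    case K4A_WAIT_RESULT_SUCCEEDED:
        // Use the capture, then release it.
        k4a_capture_release(capture);
        break;
    case K4A_WAIT_RESULT_TIMEOUT:
        // No frame arrived within one second; try again later.
        break;
    case K4A_WAIT_RESULT_FAILED:
        // The device is in an error state; stop and reopen the camera.
        break;
}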
Azure Kinect: Color Data
The Microsoft SDK allows us to access the raw color data in the form of a single-dimension byte array. Since we chose the BGRA32 color format, the byte array is encoded like this:
BGRABGRABGRABGRABGRABGRABGRABGRABGRABGRABGRABGRA...
Here’s how to access the data:
C#
if (frame != null)
{
    if (frame.ColorFrameSource != null)
    {
        byte[] colorData = frame.ColorFrameSource.Data;
        int colorWidth = frame.ColorFrameSource.Width;
        int colorHeight = frame.ColorFrameSource.Height;
    }
}
C++
if (get_capture_result == K4A_WAIT_RESULT_SUCCEEDED)
{
    k4a_image_t colorImage = k4a_capture_get_color_image(capture);

    if (colorImage != nullptr)
    {
        unsigned char* colorData = k4a_image_get_buffer(colorImage);
        int colorWidth = k4a_image_get_width_pixels(colorImage);
        int colorHeight = k4a_image_get_height_pixels(colorImage);

        k4a_image_release(colorImage);
    }
}
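Each pixel occupies four consecutive bytes (Blue, Green, Red, Alpha), because we selected the BGRA32 format. So, given the buffer, width, and height above, reading an individual pixel is a simple index calculation. A minimal sketch (x and y are arbitrary pixel coordinates; in the C++ version, read the buffer before calling k4a_image_release):

// Read the color of the pixel at (x, y) from the BGRA buffer.
int x = 100;
int y = 200;
int index = (y * colorWidth + x) * 4; // 4 bytes per pixel (Blue, Green, Red, Alpha)

unsigned char blue  = colorData[index + 0];
unsigned char green = colorData[index + 1];
unsigned char red   = colorData[index + 2];
unsigned char alpha = colorData[index + 3];

// For BGRA32, the row stride equals width * 4 bytes; if you want to be strict,
// use k4a_image_get_stride_bytes instead of width * 4.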
Azure Kinect: Depth Data
The depth data is encoded as a single-dimension array of unsigned short integer values. Each value represents the distance of the corresponding point in millimeters.
Similarly to the color data, we can access the depth data as follows:
C#
if (frame != null)
{
    if (frame.DepthFrameSource != null)
    {
        ushort[] depthData = frame.DepthFrameSource.Data;
        int depthWidth = frame.DepthFrameSource.Width;
        int depthHeight = frame.DepthFrameSource.Height;
    }
}
C++
if (get_capture_result == K4A_WAIT_RESULT_SUCCEEDED)
{
    k4a_image_t depthImage = k4a_capture_get_depth_image(capture);

    if (depthImage != nullptr)
    {
        // k4a_image_get_buffer returns a byte pointer, so cast it to 16-bit values.
        unsigned short* depthData = (unsigned short*)k4a_image_get_buffer(depthImage);
        int depthWidth = k4a_image_get_width_pixels(depthImage);
        int depthHeight = k4a_image_get_height_pixels(depthImage);

        k4a_image_release(depthImage);
    }
}
In C++, you also need to release the frame explicitly:
k4a_capture_release(capture);
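Since each value is a distance in millimeters, measuring how far a specific pixel is from the camera is just an array lookup. A minimal sketch (again, read the buffer before releasing the image; x and y are arbitrary pixel coordinates):

// Distance of the pixel at (x, y), in millimeters (a value of 0 means no valid measurement).
int x = 320;
int y = 288;
unsigned short distanceInMillimeters = depthData[y * depthWidth + x];
float distanceInMeters = distanceInMillimeters / 1000.0f;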
Azure Kinect: Body Tracking
Lastly, let’s get to the cool part: body tracking. The new Azure Kinect Body Tracking SDK was developed using advanced machine learning algorithms. It combines the raw depth and infrared data to accurately estimate the pose of a person.
In theory, there is no limit to the number of people the SDK can track. In practice, I would recommend avoiding overcrowded environments. First, having more than 10 people within the field of view would make most of their joints invisible to the sensor (thus, less accurate). Second, we are living in the Coronavirus era and we know it’s not safe.
Joking aside, the SDK supports the following joints:
Each joint has the following properties:
- Type – the name of the joint (e.g. “Head”).
- Tracking State – the tracking confidence of the joint.
- Position – the coordinates of the joint (XYZ) in the 3D world space.
- Orientation – the orientation of the joint relative to its parent joint.
Here’s how to access the tracked bodies and capture the position of the head joint. Obviously, you can access all of the other joints in the same way.
C#
if (frame.BodyFrameSource != null)
{
    List<Body> bodies = frame.BodyFrameSource.Bodies;

    foreach (Body body in bodies)
    {
        Joint head = body.Joints[JointType.Head];
        Vector3 position = head.Position;
    }
}
C++ is more verbose and requires a little bit more coding:
k4a_wait_result_t queue_capture_result = k4abt_tracker_enqueue_capture(tracker, capture, K4A_WAIT_INFINITE);

if (queue_capture_result == K4A_WAIT_RESULT_SUCCEEDED)
{
    k4abt_frame_t body_frame;
    k4a_wait_result_t pop_frame_result = k4abt_tracker_pop_result(tracker, &body_frame, K4A_WAIT_INFINITE);

    if (pop_frame_result == K4A_WAIT_RESULT_SUCCEEDED)
    {
        int bodyCount = static_cast<int>(k4abt_frame_get_num_bodies(body_frame));

        for (int i = 0; i < bodyCount; i++)
        {
            k4abt_skeleton_t skeleton;
            k4a_result_t bodyResult = k4abt_frame_get_body_skeleton(body_frame, i, &skeleton);

            if (bodyResult == K4A_RESULT_SUCCEEDED)
            {
                k4abt_joint_t head = skeleton.joints[K4ABT_JOINT_HEAD];
                k4a_float3_t position = head.position;
            }
        }

        k4abt_frame_release(body_frame);
    }
}
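One more tip: to draw a joint on top of the color frame, you need to project its 3D position (millimeters, in the depth camera space) onto the 2D color image. In C++, you can do that with k4a_calibration_3d_to_2d and the calibration object we obtained when starting the camera. Here is a sketch of the idea; it would live right after retrieving the head joint inside the loop above, and the confidence check is optional but recommended:

if (head.confidence_level >= K4ABT_JOINT_CONFIDENCE_MEDIUM)
{
    // Project the 3D joint position (depth camera space, millimeters)
    // onto the 2D color image.
    k4a_float2_t pixel;
    int valid = 0;
    k4a_calibration_3d_to_2d(
        &calibration,
        &head.position,
        K4A_CALIBRATION_TYPE_DEPTH,
        K4A_CALIBRATION_TYPE_COLOR,
        &pixel,
        &valid);

    if (valid)
    {
        float x = pixel.xy.x;
        float y = pixel.xy.y;
        // Draw the joint at (x, y) using your graphics engine of choice.
    }
}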
Depending on the graphics engine you are using (Unity3D, OpenCV, OpenGL, etc…), you can display the byte arrays and visualize the skeleton accordingly.
The SDKs include samples that do the visualization for you. The Unity3D SDK also includes a more advanced Avateering demo.
Azure Kinect for Unity3D
Download the Azure Kinect SDK for Unity3D (C# programming language).
Summary
In this tutorial, you’ve learnt the following:
- How to configure, open, and stop an Azure Kinect device.
- How to access the raw color and depth data.
- How to access the skeleton data.
So, what else do you want me to cover about the new Azure Kinect sensor? Let me know in the comments below!
‘Til the next time, keep Kinecting!
Cover image credits: microsoft.com
This article is part of my Azure Kinect Masterclass. Check the rest of the Masterclass articles below.
Hi, this is Wonkyo.
I’m a Master’s student majoring in Mechanical Engineering.
In this page; ‘https://pterneas.com/2020/03/19/azure-kinect/’,
if I click ‘Azure Kinect for C++’, it leads to Unity 3D.
‘Azure Kinect for Unity 3D’ also leads to Unity 3D.
How can I get ‘Azure Kinect for C++’?
Besides, could I get any demo version of Body Tracking tool for C++?
(To test functions of the tool before purchase)
Thank you.
Best regards
Wonkyo J.
Thanks for your comment. The C++ link has been updated 🙂
Hi. I am Mohan, a 3D generalist. I am planning to get an Intel RealSense D435 or a LiDAR-based camera. Is there any software supporting full-body mocap and face tracking?
Hello. RealSense does not have a dedicated SDK, however, we can provide you with our LightBuzz Body Tracking SDK for RealSense. You can see a comparison between Kinect and RealSense here. Feel free to contact me for details.
Hi! Great tutorial!
Is there a way to view the frame that was captured? If not, would you know how to capture a picture with the azure Kinect? And then view that picture?
Thank you, Christian. Yes, you can display the captured frame. Check this tutorial to learn how to do that!
Although the tutorial uses Windows 10, are there platform limitations for the Azure Kinect device or the SDKs? I’m running Unity on Linux (Ubuntu), and was wondering what to be aware of before ordering this.
Thanks.
Hey Tac. For now, the Unity SDKs only run on Windows. We are planning for a Linux update later.
Hey,
I am trying to run the kinfu sample code on Ubuntu 18.04 LTS but I see that it crashes at the k4a_device_get_capture line (Segmentation fault).
I have not made any code changes from: https://github.com/microsoft/Azure-Kinect-Samples/tree/master/opencv-kinfu-samples
k4aviewer works, so I’m not sure why this is not working.
Hi Shravya. I’m not familiar with kinfu. You’d better open a ticket in the Microsoft GitHub repository.
hello!
https://youtu.be/VzzCoCJFtXg?t=44
Can I control the ‘Keystone’ function using Azure Kinect, like in the link above?