In my initial Azure Kinect Masterclass guide, we explored the Color, Depth, and Body data streams. Today, I am going to explain how the RGB color data streaming works and how you can easily configure the Azure Kinect color camera.

Prerequisites

To run the demos, you need a computer with the following specifications:

  • 7th Gen Intel® Core™ i5 Processor (Quad-Core 2.4 GHz or faster)
  • 4 GB Memory
  • NVIDIA GeForce GTX 1070 or better
  • Dedicated USB3 port
  • Windows 10

To write and execute code, you need to install the following software:

Did you know?…

After leaving my job at Microsoft, I have been helping Fortune-500 companies and innovative startups create amazing body-tracking applications and games. If you are looking to get your business to the next level, get in touch with me.

Azure Kinect RGB Color data

Kinect is equipped with a high-resolution video camera. The purpose of the RGB video camera is to see the world in 2D.

At first glance, it looks like an ordinary webcam, just like the one on your laptop. Of course, Microsoft has done a tremendous job in bringing us the highest color quality possible. Unlike a typical webcam, the Azure Kinect camera supports up to 4K resolution!

Opening the Azure Kinect Viewer tool, you should see the RGB color feed:

Azure Kinect Viewer

Color Configuration

Before diving deeper into the Azure Kinect color data types, it’s important to understand what an RGB color feed is.

You can think of the RGB feed as a two-dimensional frame of color (RGB) values. The 2D image has the following properties:

  • Width
  • Height
  • Channels

The width and height of the image specify how big or small the frame is (e.g. 1920×1080). The number of channels specifies how many bytes are needed to represent a single color value (e.g. 3).

Here is an HD image I captured from the Azure Kinect device. We are measuring the dimensions of the frame starting from the top-left corner.

Azure Kinect Color Frame

If you take a closer look at my face, you’ll notice the rectangular pixels of various colors. Computer scientists represent a color pixel as a set of 3 values:

  • Red
  • Green
  • Blue

The allowed values are integer numbers between 0 and 255.

If you mix Red, Green, and Blue in different proportions, you’ll get a particular hue. For example, the pixel between my eyebrows has a value of:

  • 168
  • 98
  • 85

… giving this shade of beige:

Azure Kinect Color Pixel

Each of the RGB values is stored in one byte of computer memory. To store the whole frame, we would need an array of size equal to:

Width x Height x Channels

So, for an HD frame we would need:

1920 x 1080 x 3 = 6,220,800 bytes

For a 4K frame we would need:

4096 x 3072 x 3 = 37,748,736 bytes

Considering the huge size of the 4K image, it’s no wonder Kinect limits the frame rate to just 15 FPS.

Azure Kinect Color Frame Pixels

Now, if you align the Red, Green, and Blue values of every single pixel of the image, you’ll get a one-dimensional array structure:

The 1st element of the array is the Red value of the 1st pixel. The 2nd element of the array is the Green value of the 1st pixel. The 3rd element of the array is the Blue value of the 1st pixel. And so on for those millions of bytes 🙂
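To make this layout concrete, here is a minimal, Unity-free sketch of the arithmetic described above. The PixelIndex helper is hypothetical (not part of any SDK); it simply applies the Width × Height × Channels layout:

```csharp
using System;

class ColorFrameLayout
{
    // Hypothetical helper: index of the Red byte of pixel (x, y)
    // in the flattened array, assuming 3 channels per pixel.
    public static int PixelIndex(int x, int y, int width, int channels = 3)
        => (y * width + x) * channels;

    static void Main()
    {
        int width = 1920, height = 1080, channels = 3;

        // Total bytes for an HD frame: Width x Height x Channels.
        Console.WriteLine(width * height * channels); // 6220800

        // The 2nd pixel's Red value starts right after the 1st pixel's 3 bytes.
        Console.WriteLine(PixelIndex(1, 0, width));   // 3

        // Its Green and Blue values follow at the next two indices (4 and 5).
    }
}
```

The same formula works in reverse: dividing a byte index by the channel count gives the pixel number, from which the row and column can be recovered.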

Available Color Configurations

The following configurations are available from the SDK:

Resolution    Aspect Ratio   Format Options    Frame Rates    Field of View
3840×2160     16:9           MJPEG             0, 5, 15, 30   90°×59°
2560×1440     16:9           MJPEG             0, 5, 15, 30   90°×59°
1920×1080     16:9           MJPEG             0, 5, 15, 30   90°×59°
1280×720      16:9           MJPEG/YUY2/NV12   0, 5, 15, 30   90°×59°
4096×3072     4:3            MJPEG             0, 5, 15       90°×74.3°
2048×1536     4:3            MJPEG             0, 5, 15, 30   90°×74.3°

Why so many options?

If you are a beginner in Kinect development, you may wonder why there are so many configuration options. That's a reasonable question. Well, Kinect is flexible enough to cover as many application scenarios as possible.

  • 16:9 resolutions better accommodate landscape applications tracking many people.
  • 4:3 resolutions would provide a portrait-like field of view, thus being ideal for applications such as virtual coaches.

Of course, high resolution comes with a performance caveat: the higher the resolution, the harder the computer has to work to process the information. So, if you are incorporating resource-intensive business logic and speed matters, you may want to use a lower resolution. For example, point cloud generation is done much faster at 720p than at 1440p.

It’s your job, as a responsible software developer, to select the appropriate configuration and make your customers happy!

Color Data in C#

Using C# with the Azure Kinect SDK for Unity3D, it’s pretty easy to configure the Color camera and acquire the Color data.

Configuring the Color Settings

If the above table seems confusing, don’t worry. From the developer’s point of view, all you need to do is specify the desired Resolution and Frame Rate. The SDK will automatically expose the corresponding width and height for you. The configuration is set when you create your C# Kinect device object:

private void Start()
{
    sensor = KinectSensor.Create(new Configuration
    {
        ColorResolution = ColorResolution.ColorResolution_1080P,
        FPS = FramesPerSecond.FPS_30
    });
    sensor?.Open();
}
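The device should also be released when the application closes. A minimal sketch, assuming the sensor object exposes a Close() counterpart to Open() (the method name is an assumption, not confirmed by the snippet above):

```csharp
private void OnDestroy()
{
    // Assumption: Close() releases the device, mirroring the Open() call above.
    sensor?.Close();
}
```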

Retrieving the Color Data

Acquiring the Color stream information (width, height, and byte array) is done on a per-frame basis. All you need to do is get the latest Frame object and grab the data from its ColorFrameSource property.

private void Update()
{
    Frame frame = sensor.Update();

    if (frame?.ColorFrameSource != null)
    {
        int width = frame.ColorFrameSource.Width;
        int height = frame.ColorFrameSource.Height;

        byte[] data = frame.ColorFrameSource.Data;
    }
}

Summary

This is it, folks! In this Masterclass, we explored the Azure Kinect Color stream and elaborated on its internal structure. We’ve also seen how easy it is to access the Color data using a few lines of C# code.

The Azure Kinect SDK includes built-in samples for transforming the color byte array into a Unity Texture2D so you can display the data on the screen.
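For reference, a hedged sketch of such a conversion, assuming the byte array is laid out as 3 bytes per pixel matching Unity's TextureFormat.RGB24 (the helper name is hypothetical, not taken from the SDK samples):

```csharp
// Hypothetical helper: wraps the raw color bytes in a Unity texture.
// Assumes the array layout matches TextureFormat.RGB24 (3 bytes per pixel);
// if the SDK delivers e.g. BGRA data, the format flag would differ.
private Texture2D CreateColorTexture(byte[] data, int width, int height)
{
    Texture2D texture = new Texture2D(width, height, TextureFormat.RGB24, false);
    texture.LoadRawTextureData(data);
    texture.Apply();

    return texture;
}
```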


Sharing is caring!

If you liked this article, remember to share it on social media, so you can help other developers, too! Also, let me know your thoughts in the comments below. ‘Til the next time… keep Kinecting!

Vangos Pterneas

Vangos Pterneas is a professional software engineer and an award-winning Microsoft Most Valuable Professional (2014-2019). Since 2012, Vangos has been helping Fortune-500 companies and ambitious startups create demanding motion-tracking applications. He's obsessive about analyzing and modeling every aspect of human motion using Computer Vision and Mathematics. Kinect programming started as a hobby and quickly evolved into a full-time business (lightbuzz.com). Vangos shares his passion by regularly publishing articles and open-source projects that help fellow developers understand the fascinating Kinect technology.
