Body Tracking for desktop & mobile?

Kinect is an exceptional depth sensor to work with. However, Microsoft has repeatedly discontinued the devices. If you want a super-accurate Body Tracking SDK that works with all cameras, desktop computers, and mobile devices, check out LightBuzz AI. LightBuzz is the result of five years of R&D, and I’m sure you’ll enjoy it very much.

A month ago, I was happy to receive a brand-new Kinect for Windows version 2 Developer Preview sensor. You can read my previous blog post about the capabilities of the new device. Kinect v2 now includes 5 main types of input streams:

  • Color
  • Depth
  • Infrared
  • Body
  • Audio

Today I will show you how you can acquire and display each bitmap input in a Windows application. In the next blog post, we’ll talk about body tracking. Here is a quick video I made which demonstrates the different streams provided by the new sensor.

You can download the source code here. Read on if you want to implement this functionality yourself.

Requirements

  • A Kinect for Windows v2 sensor (Developer Preview)
  • The Kinect for Windows SDK v2 Developer Preview
  • Visual Studio, with a project targeting .NET Framework 4.5

Creating a new project

Come on, launch Visual Studio and create a new WPF application. Make it target .NET framework 4.5 and give it a proper name. I named it KinectStreams. In the XAML file, add an <Image> element and three <Button> controls. The image’s source will be updated every time we have a new Kinect frame. The buttons simply switch between color, depth and infrared frame types.


<Grid>
    <Image Name="camera" />
    <Grid>
        <StackPanel>
            <Button Content="Color" Click="Color_Click" />
            <Button Content="Depth" Click="Depth_Click" />
            <Button Content="Infrared" Click="Infrared_Click" />
        </StackPanel>
    </Grid>
</Grid>

Initializing the sensor

Let’s now dive into the C# code! The primary namespace which provides us with the Kinect-related functionality remains the same:


using Microsoft.Kinect;

Do not forget to add a reference to the corresponding dll file (you’ll normally find it under C:\Program Files\Microsoft SDKs\Kinect\v2.0-DevPreview1311\Assemblies\Microsoft.Kinect.dll).

You now need two classes for connecting to the sensor and reading its camera streams. The first one is the KinectSensor class. The second one, new to Kinect v2, is the MultiSourceFrameReader class:


KinectSensor _sensor;
MultiSourceFrameReader _reader;

Compared to the first SDK, the Kinect SDK v2 Developer Preview provides an easier way of accessing the active sensor. Here’s how you connect to the sensor and start it up:


_sensor = KinectSensor.GetDefault();
if (_sensor != null)
{
    _sensor.Open();
}

Reading the streams

That’s it! We have now connected to the sensor. Let’s specify which streams we need:


_reader = _sensor.OpenMultiSourceFrameReader(FrameSourceTypes.Color |
                                             FrameSourceTypes.Depth |
                                             FrameSourceTypes.Infrared);
_reader.MultiSourceFrameArrived += Reader_MultiSourceFrameArrived;

The MultiSourceFrameArrived event fires when a frame of the specified types is available. Here’s how we can open the color, depth and infrared frames in a single method.


void Reader_MultiSourceFrameArrived(object sender, MultiSourceFrameArrivedEventArgs e)
{
    // Get a reference to the multi-frame
    var reference = e.FrameReference.AcquireFrame();
    // Open color frame
    using (var frame = reference.ColorFrameReference.AcquireFrame())
    {
        if (frame != null)
        {
            // Do something with the frame...
        }
    }
    // Open depth frame
    using (var frame = reference.DepthFrameReference.AcquireFrame())
    {
        if (frame != null)
        {
            // Do something with the frame...
        }
    }
    // Open infrared frame
    using (var frame = reference.InfraredFrameReference.AcquireFrame())
    {
        if (frame != null)
        {
            // Do something with the frame...
        }
    }
}

We’ll now see how to convert each frame reference into a WPF bitmap and display it in our <Image> element.

Color stream

The resolution of the raw color images has been increased to 1920×1080. Kinect only gives us an array of raw pixel values, so we need to convert this array into a proper WPF bitmap. I will not get into more detail about WPF bitmap processing – you can simply copy and paste the following method:


private ImageSource ToBitmap(ColorFrame frame)
{
    int width = frame.FrameDescription.Width;
    int height = frame.FrameDescription.Height;
    PixelFormat format = PixelFormats.Bgr32;
    byte[] pixels = new byte[width * height * ((format.BitsPerPixel + 7) / 8)];
    if (frame.RawColorImageFormat == ColorImageFormat.Bgra)
    {
        frame.CopyRawFrameDataToArray(pixels);
    }
    else
    {
        frame.CopyConvertedFrameDataToArray(pixels, ColorImageFormat.Bgra);
    }
    int stride = width * format.BitsPerPixel / 8;
    return BitmapSource.Create(width, height, 96, 96, format, null, pixels, stride);
}

Depth stream

The depth stream provides us with the depth value of every point in the visible area. The depth value is the distance of each point from the sensor. We usually represent the furthest distances with black and the nearest distances with white. However, it’s up to you to pick different colors when visualizing the depth data. For this conversion, we need two arrays: one for storing the raw depth values and another one for storing the corresponding color (pixel) values.


private ImageSource ToBitmap(DepthFrame frame)
{
    int width = frame.FrameDescription.Width;
    int height = frame.FrameDescription.Height;
    PixelFormat format = PixelFormats.Bgr32;
    ushort minDepth = frame.DepthMinReliableDistance;
    ushort maxDepth = frame.DepthMaxReliableDistance;
    ushort[] depthData = new ushort[width * height];
    byte[] pixelData = new byte[width * height * ((format.BitsPerPixel + 7) / 8)];
    frame.CopyFrameDataToArray(depthData);
    int colorIndex = 0;
    for (int depthIndex = 0; depthIndex < depthData.Length; ++depthIndex)
    {
        ushort depth = depthData[depthIndex];
        // Values outside the reliable range become black; the cast keeps the
        // low-order byte of the depth (in millimeters) as a simple grayscale value.
        byte intensity = (byte)(depth >= minDepth && depth <= maxDepth ? depth : 0);
        pixelData[colorIndex++] = intensity; // Blue
        pixelData[colorIndex++] = intensity; // Green
        pixelData[colorIndex++] = intensity; // Red
        ++colorIndex;                        // Skip the alpha channel
    }
    int stride = width * format.BitsPerPixel / 8;
    return BitmapSource.Create(width, height, 96, 96, format, null, pixelData, stride);
}

Infrared stream

The infrared sensor gives us the ability to see clearly in the dark! Accessing the infrared stream is a new feature added to Kinect v2. Similarly to displaying depth data, we again need two arrays: one for the raw infrared values and one for the corresponding pixel values. Here you go:


private ImageSource ToBitmap(InfraredFrame frame)
{
    int width = frame.FrameDescription.Width;
    int height = frame.FrameDescription.Height;
    PixelFormat format = PixelFormats.Bgr32;
    ushort[] infraredData = new ushort[width * height];
    byte[] pixelData = new byte[width * height * ((format.BitsPerPixel + 7) / 8)];
    frame.CopyFrameDataToArray(infraredData);
    int colorIndex = 0;
    for (int infraredIndex = 0; infraredIndex < infraredData.Length; ++infraredIndex)
    {
        ushort ir = infraredData[infraredIndex];
        // Keep the most significant byte of the 16-bit infrared value.
        byte intensity = (byte)(ir >> 8);
        pixelData[colorIndex++] = intensity; // Blue
        pixelData[colorIndex++] = intensity; // Green
        pixelData[colorIndex++] = intensity; // Red
        ++colorIndex;                        // Skip the alpha channel
    }
    int stride = width * format.BitsPerPixel / 8;
    return BitmapSource.Create(width, height, 96, 96, format, null, pixelData, stride);
}

That’s it! You can now display every bitmap stream! The only thing left to do is call the corresponding method and display the frame. This is how to display the color frame, for example:


// Open color frame
using (var frame = reference.ColorFrameReference.AcquireFrame())
{
    if (frame != null)
    {
        camera.Source = ToBitmap(frame);
    }
}
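
The buttons we added in XAML simply set the active stream. Below is a minimal sketch of how that switching could look; the Color_Click handler in the downloadable project sets a _mode field exactly like this, while the Depth/Infrared handlers and the mode check are illustrative:


private enum Mode
{
    Color,
    Depth,
    Infrared
}

private Mode _mode = Mode.Color;

private void Color_Click(object sender, RoutedEventArgs e)
{
    _mode = Mode.Color;
}

private void Depth_Click(object sender, RoutedEventArgs e)
{
    _mode = Mode.Depth;
}

private void Infrared_Click(object sender, RoutedEventArgs e)
{
    _mode = Mode.Infrared;
}

// Inside Reader_MultiSourceFrameArrived, update the image only when the
// corresponding mode is active:
using (var frame = reference.DepthFrameReference.AcquireFrame())
{
    if (frame != null && _mode == Mode.Depth)
    {
        camera.Source = ToBitmap(frame);
    }
}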

Download the source code and test for yourself. Note that the demo contains additional code for body tracking. We’ll cover body tracking throughout the next blog post.

PS 1: Vitruvius

If you want to automate the above bitmap-conversion processes, consider downloading Vitruvius. Vitruvius is a free & open-source library I built, which provides many utilities for your Kinect applications, including gesture detection, voice recognition and drawing extensions. Give it a try, enjoy and even contribute yourself!

Here is how you can display a color frame using Vitruvius in a single line of C# code:


using (var frame = colorFrameReference.AcquireFrame())
{
    if (frame != null)
    {
        camera.Source = frame.ToBitmap();
    }
}

Similarly, you can display a depth or infrared frame with no pain!

PS 2: Vitruvius Premium

This project is part of the Premium version of Vitruvius (original cost: $299). Vitruvius is a set of powerful Kinect extensions that will help you build stunning Kinect apps in minutes. Vitruvius includes avateering, HD Face, background removal, angle calculations, and more. Check it now.

Want to hire my team for your next cutting-edge app? Drop me an email.

Vangos Pterneas

Vangos Pterneas is a software engineer, book author, and award-winning Microsoft Most Valuable Professional (2014-2019). Since 2012, Vangos has been helping Fortune-500 companies and ambitious startups create demanding motion-tracking applications. He's obsessed with analyzing and modeling every aspect of human motion using AI and Maths. Vangos shares his passion by regularly publishing articles and open-source projects to help and inspire fellow developers.

224 Comments

  • Thanks for this great Post!
    I remember doing Gesture Recognition with the Kinect I and now – after reading your article – I’m thinking about getting the Kinect II 😉
    Keep on the good work!

  • MattW says:

    Great Article! What kind of hardware are you using for the computer? Do you need a certain level of GPU? Thanks! I’m really excited about when this goes public this summer.

    • Hi Matt. Thanks for your comment. I run the samples on an iMac computer with an Intel Core i5 2.90GHz 64-bit processor, 8GB of RAM and an NVIDIA GeForce GTX 660M graphics card. However, the current Kinect for Windows v2 sensor is in beta, so Microsoft proposed not to rush to buy new hardware. The summer hardware requirements will be quite different.

      • From the preliminary results the people from freenect have it seems that it will be impossible to reconstruct the TOF data without the use of something like OpenCL and get acceptable latency. So it’s not gonna be something that can be utilized effectively on the embedded and mobile space for at least 2-3 years. This is a regression actually

        • What’s your point and what has to do with the second version of Kinect? Really don’t get it, sorry.

          • I’m talking about the second version of the kinect. The data comes as four frames of TOF data that have to be reconstructed on the host machine running the application. Right now, doing that in the CPU yields ~10fps on high end machines. It can be accelerated nicely though using a GPU, as the xbox does, but that implies that there is some kind of OpenCL support and good bandwidth, which mobile platforms just don’t have yet. This will not be usable in the embedded space for quite some time and I just think that people should know what’s the situation before they invest in it.

        • Yannis, I get what you mean now, but you are missing some points. First of all, Kinect v2 is currently in Developer Preview. SDKs, APIs and hardware requirements will change. Many things currently handled by the CPU will be moved to the GPU. Moreover, Kinect is an accessory for XBOX or desktops. Power and hardware are critical. There are libraries and tools that transmit Kinect data to mobile and tablet devices (I have implemented such tools, too) so there is a kind of interoperability.

          More and more people invest in Kinect because of the business opportunities it provides. Corporations invest in Kinect to accomplish really complex stuff, such as 3D body scanning. Software developers can now build natural user interfaces faster and more easily. Version 2 supports even more accurate and smoother body tracking.

          People develop software and earn money right now, so the time to invest in Kinect is right now 🙂

          • Things are already handled by the GPU in the SDK you’re using. If that wasn’t the case you wouldn’t get any sort of acceptable performance. Openkinect has already disassembled the shaders used in the SDK and they’re trivial, as expected, so this won’t get much better by software alone. I agree that the new kinect is something that will be usable only in the xbox and medium to high grade desktops.

            Now about the libraries and tools that retransmit data, I’ve written all kinds of them too. You can see my latest one in action here https://www.youtube.com/watch?v=hE3FRPn6AZ4 . The head and hand tracking in the video is done on an arm@1Ghz and transmitted via udp. The processing takes ~15% of the arm’s cpu and as you understand the latency is as low as it can get. I was actually surpised that this thing would work considering the requirements of vr. An embedded board like that costs 30$ and is smaller than a credit card. With the new kinect I’d need a 300$ PC that’s bulky and harder to install where I need, and I have very weird needs. So this is the definition of regression, especially for doing complex stuff.

            As far as getting easier to do NUI and stuff, the demo above was done with a 50 line lua script in a matter of hours, networking and all. I feel that the MS SDK and OpenNI approaches are dinosauric by now. The latter is also extict as of yesterday.

            I’ll also provide a tip. After finding a point of interest it’s useful to do an averaging of a small cube around that. As depth data tend to be very linear in nature, this cheap step will stabilize the position. In that sense, the resolution of the original kinect is underutilized as it is so the new one is not a breakthrough. Just more controlled by MS. They also have a long history of flopping perfectly good technologies which someone else adopts and makes mainstream so I’m pretty reluctant to bet on that horse. Maybe the new CEO who is an actual enginneer can change that, but probably not in this iteration of the hardware.

            As far as money is concerned, yeah we all make money out of it but it’s just services for now. The real money are in actual marketable products and the internet of things. The new kinect will certainly not facilitate these and not just cause of the corporate distrust. The big showstopper is technical.

            Sorry for being raw but I think that people should also hear the pragmatic perspective.

        • Yannis, I admire your projects, however this is not a pragmatic perspective. Microsoft SDK and OpenNI are “huge” because they provide body tracking algorithms, face tracking mechanisms and MUCH more!

          I can’t see how such things can be accomplished via LuaVision, for example. It’s a great project, but it is out of the scope of this series of blog posts. I am showing people how to access the various Kinect streams. In the upcoming blog post, I’ll demonstrate body tracking and facial expressions via the Body stream. Such stuff can only be achieved using the Microsoft SDK.

          No point to reinvent the wheel, right? 😉

          • OpenNI is dead, haven’t you heard? http://apple.slashdot.org/story/14/03/02/1530202/apple-closes-openni-the-open-source-kinect-framework. I’ve criticized stallman’s stance a lot but this time I feel like I owe him a beer or something. This is what happens when you trust corporations instead of the open community. Sooner or later, you’re gonna get burnt.

            The reinventing the wheel argument does not hold well here. I was recently involved in a EU project for medical rehabilitation. The requirements seemed simple at the beginning, just track the patient’s hands. Some university guys in the project said it’s ok we can handle it with the OpenNI and then the MS SDK and we don’t need to reinvent the wheel with custom tracking and stuff. But then it turned out that the user was to be seated and the kinect placed 1m ahead of him and we needed the palm orientation as well. It turned out that this was not part of the SDK but it had to be done and it was done.

            Still I understand the challenge thrown. I’ll see what I can do and get back to you

  • Erevodifwntas says:

    Yannis, I am not sure that I understand your arguments. First of all you are complaining that Kinect can not be used in an embedded/mobile platform. Yes this is the case today. My iPad mini has almost the same processing power as XBOX 360 (nvidia claims that SoC graphic performance will match the one in XBOX 360 sometime in 2014). If you also take into account that Xbox One doesn’t have a state of the art GPU and it only uses 10% of it for Kinect, then I do not really see your point. It is just a matter of time (and to be honest I do not think that it is a matter of GPU processing power at all). As I’ve already said, 10% of a mediocre GPU -like the one in Xbox One- should be available in the current gen of mobile devices. Now you can either complain why this is not feasible right now, or just envision and prepare for the world of tomorrow.

    Then, you hijack this post in order to advertise your work and compare apples with oranges. You are comparing kinect, a system that performs full body recognition/tracking (which is transparent for the user) with a system that does some kind of “hand” tracking (head tracking is performed with the traditional gyr/acc/mag combination -and I am not going to mention all those cables/devices that you have to “wear”). To be honest, I am not sure that this is hand tracking at all. What makes kinect special is that it recognizes all your body parts and understands that what you move is actual your hand. In your demo it isn’t clear to me if you recognize a hand, or just something moving. With kinect you can achieve some degree of “self-awareness”. It doesn’t simply recognizes some pixels moving around, it recognizes a hand, that it is part of a whole body that is performing an action. But of course, Kinect is not a panacea. If you want to do something that isn’t build for, then you have to do it yourself.

    (And I am not going to comment on your effort to make it a M$ vs flame)

    • Well, you can’t do positional tracking of the head using just the rift’s gyr/acc/mag combination. The cables are part of the rift which has to be connected to a machine doing the rendering. The tracking takes place on a separate arm board which has a kinect connected on it. This combo is on the right of the screen which is not visible because it’s 3-6m away from the user. The positional information of the head and hands are transmitted via udp to the box doing the rendering from which the cables run. We’re trying to make that mobile, with an android device worn on the user most probably. Min that the user’s body is not fully visible because it has tables between him and the camera. Now for the MS SDK this qualifies as “seated” mode and if that was used it would require 10 times more expensive hardware to run on and I doubt that the latency would be on par. The jitter could probably be improved with the tip I gave earlier. But unfortunately the range for the seated mode is up to 3m, so this setup is entirely not feasible with the body tracking feature of the MS SDK. This is something that needed actual creative coding but it came through in hours and 50 lines of code. If you’re not sure you are more than welcome to try it for yourself. Just email me and we can set it up.

      I assume that when you say “kinect” you mean the MS SDK for the kinect because the kinect by itself cannot understand anything, it just sees a soup of voxels. The MS SDK contains a machine learning model that actually labels joints based on statistic models. This works with the things it was trained to recognize. You can fool it very easily. I’m using a rule based approach with multiple blob tracking passes and empirically script the steps needed to find a body, find the extremities of the body and which are hands and which is head, which I assure you is quite robust for the things it tracks. It also has the advantage that if it encounters some pose that fools it, I just add more rules because I actually write the thing. You can’t feed the model of the MS SDK anything else besides what MS thinks you need. This approach is also way faster than the machine learning one and I mean way way faster. The usb transfer has more overhead than the tracking itself. This is the most crucial and understated issue with NUI. It is input, it has to be low latency and leave as much headroom as possible for the actual application to perform rendering and whatever else it does.

      Now, comparing the xboxone with the current or the next gen of mobiles is absurd. The XboxOne CPU has way more memory bandwidth and processing power than your average mobile will have for many years from now. The current mobile gen is on the level the average GPUs were in 2005. Wishful marketing is nice but hard numbers are what define what’s feasible eventually. Have you ever tried running OpenCL and OpenGL on a mobile? It’s a very tight situation

      About the M$ bashing, sure I’m biased. I present reasonable arguments though while your premises are completely off. You don’t know what you’re talking about and I’m not saying this as an insult. It’s just obvious

  • Erevodifwntas says:

    rule based + Empirical scripts vs Statistical models… Please send me your paper that you describe your approach and your results.
    Also for some reason you claim that 100% of the xbox one CPU/GPU is used for kinect (and yes, when I am speaking about kinect in a post that describes how to use the official kinect sdk -which is the M$ sdk- I mean the whole bundle: Kinect+Software). If you think that a modern mobile cpu/gpu isn’t able to provide the same horsepower as a fragment of the mediocre Xbox one CPU/GPU, please just visit nVidia site.
    Now, about the comment, feel free to say what ever you want. I had 3 arguments in my post:
    1.) that the mobile devices are not as week as you are presenting them (I am sorry but this is about numbers. Say what ever you want about me, but the numbers will prove who is wrong).
    2.) that you are hijacking this post in order to present your work (just read your posts that have nothing to do with the original post and are describing your work).
    3.) I’ve expressed my thoughts about the video that you posted and your comments. To be honest I didn’t understand that you used the actual kinect as an input for your own recognition algorithms. So, yes. You are right. I didn’t know what I was talking about:
    You were talking against MS SDK,
    you were saying that you didn’t like kinect 2 -not their sdk, you were talking about the new kinect- and
    you were claiming that you are using a different approach that recognizes hands easier.
    Also in the video there was no kinect.

    So I cannot believe how it was possible to miss that you were using kinect and not simply recognizing movement via a camera (just search oculus rift hand tracking and you will know what I mean). Sorry… you are absolutely right. This was my fault… It was obvious. You were using kinect…. (and by kinect I mean the actual device, not the sdk :p)

    But as I said before please show me your published papers with your results and then we can talk about it (it is very difficult to believe someone who uses an irrelevant blog post in order to present his work, and who claims that he is getting better results that the Cambridge research team). In your post you wrote “which I assure you is quite robust for the things it tracks.” It would be very interesting to know what are those things, but if you are comparing the whole MS SDK with a limited framework, once again you are comparing apples and oranges.

    • I already invited you to come and see the results. I will certainly not waste time writing a paper I’ve tried doing it for other stuff I’ve done and got them rejected because they were not “clear enough”. I will not post what these things are because you’ll charge me with “hijacking this post in order to present your work”. And I admit, I don’t like writing papers. This approach can be perfectly described as one though and you are most welcome to meet me so I can explain it so you can write a paper for it. I don’t mind sharing credit, I think that describing a process is important for spreading knowledge. This is a serious invitation

      “Also for some reason you claim that 100% of the xbox one CPU/GPU is used for kinect”
      Where did I say that, are you imagining things?

      “If you think that a modern mobile cpu/gpu isn’t able to provide the same horsepower as a fragment of the mediocre Xbox one CPU/GPU, please just visit nVidia site”

      Oh yeah now we’re talking numbers. Best on the mobile market right now Tegra4 with Fill Rate of 2.68 Gpix/s. Best on the market at 2004 GeForce 6800 Ultra Extreme Fill Rate 6.4 Gigapixel/s. I was mistaken, the gap is actually a year more than I thought. Xbox One has 12.8 Gpixels/sec Fill rate. This one is actually weaker than I thought it was, the PS4 has almost twice that. This is why it needs to prioritize the kinect processing which won’t be feasible in generic GPGPU environments and it’s still 5 times the best we have today and more than 10 times the avg. You don’t want to see what the current high nvidias fare. They will make your eyes bleed.

      “I am sorry but this is about numbers. Say what ever you want about me, but the numbers will prove who is wrong”

      Hmmm. No I will resist the temptation. I shall not troll thee

      “who claims that he is getting better results that the Cambridge research team”

      I don’t remember mentioning cambridge. If you mean my reference to the EU project, these university guys were from germany. I will not disclose anything more, it would be unprofessional on my part.

      ” In your post you wrote “which I assure you is quite robust for the things it tracks.” It would be very interesting to know what are those things, but if you are comparing the whole MS SDK with a limited framework, once again you are comparing apples and oranges.”

      I told you what this specific script tracks. Hands and head. These were the requirements for the vr project which was basically a one night hack in the athenian hackerspace. So far that framework, which I believe “toolkit” would describe it better, has worked fine for every single assignment I’ve been given. Go check it out for yourself it’s at https://github.com/LuaVision/lua-vision. Mind though that I’m currently refactoring it so it can be can be used from other languages besides Lua. Do you have a preference?

  • Erevodifwntas says:

    ““Also for some reason you claim that 100% of the xbox one CPU/GPU is used for kinect”
    Where did I say that, are you imagining things?”

    Perhaps I am imagining that you’ve just compared the full xbox one GPU (12Gpixels/sec fill rate) with Tegra (2.68 Gpix/s)… According to various articles, only 10% of the xbox one GPU is reserved for Kinect (video/voice). So I thing that roughly Tegra should be able to do the trick (also you should take a look at Tegra k1). This isn’t about horse power. It has to do about battery power 🙂

    I didn’t say that you mentioned Cambridge. The Cambridge team is the one that developed the recognition algorithms for Kinect (along with M$).

    “Mind though that I’m currently refactoring it so it can be can be used from other languages besides Lua. Do you have a preference?” Not really. I have developed my own algorithms 🙂 (and yes, I am using my own recognition algorithms when I have to do something that Kinect is not made for -both device/SDK. But this isn’t the place to write about).

    • “So I thing that roughly Tegra should be able to do the trick ”

      Yeah sure, if you completely saturate the GPU you MAY be able to get a decent and highly inconstant framerate. You will just have at least a full frame of latency (I hope you understand why this will happen). Then what? Won’t you have any drawing to perform? Oops. Performance will be decent only when the mobile gpu itself needs 10% of available resources to process the frames. This won’t happen with the K1 which has less than half the power of the xboxone, maybe the next gen from that will come close. So it’s at least 2-3 years before this capability will be generally available to mobiles. Is this not what I said in the beginning?

      “This isn’t about horse power. It has to do about battery power”

      Of course it has to do with power draw. How does this change the reality of the situation though? Mobiles are like that

      “I have developed my own algorithms”

      That’s great to hear. Maybe we can exchange notes. Is there somewhere I can take a peek?

      • Yannis, I really can’t understand your mentality. If you don’t support the Kinect SDK, then why do you post in a Kinect-related blog? If you have developed something better than the official Kinect SDK, then prove it, make benchmark tests or anything. It is quite unfair to blame a technology (used by millions of people) just because it is not yet available for mobile devices.

        Warning: The fact that this blog has hundreds of thousands of views doesn’t mean that I have to host flames and digital battles – for any reason. This post is a developer tutorial. You can use the comments section for asking questions or providing feedback. If you want to say something that is not related to the contents of my blog posts, please use my contact form or email me directly.

        Thank you.

        • Everything I’ve said is relevant to the kinect. Maybe you meant that this blog is about the MS SDK for the kinect. If that’s the case then ok I don’t have anything further to add. People are using the first kinect with mobiles and wont be able to use the new incarnation. They should be aware of that before engaging with the device. Other than that there’s not much more to say.

          I like that you pose challenges but it’s time to throw the ball back in the field. You had an article about implementing gestures the other day. I’ll benchmark that case and get back to you with results there

          • Let me make my case clear:

            – This blog is about whatever I like. It’s a personal website. I have posts about OpenNI with more than a million unique page views. Nowadays, my focus is on Kinect version 2 official SDK.
            – Kinect v2 is still in Developer Preview. Mobile will be supported similarly to version 1. Please do not spread false news to people on purpose.
            – Feel free to benchmark whatever you like.

            Once again, I do not like your attitude. It is too personal and I cannot understand the reason. If you have something to say to me, simply email me your complaints. There is no need to post irrelevant thoughts to my blog, just because you think your SDK is better than Microsoft’s.

            Thank you.

  • “Kinect v2 is still in Developer Preview. Mobile will be supported similarly to version 1. Please do not spread false news to people on purpose.”

    I presented reasonable arguments as to why this will not be the case, along with numbers as an engineer is obligated to do. Do you have any inside information from MS that their hardware will behave differently in the future? If you do then present them else people can only assume that you are spreading opinionated false news based on absolutely nothing besides your wishful thinking. You’re welcome

    • It is in a PRIVATE Developer Preview. This means “not even beta”. Please, stop being arrogant and offensive. I am not insulting you, so stop insulting me. Any more personal comments will be deleted. If you want to say something, just email me. Thank you.

  • Shrikrishna says:

    Hi Vangos Pterneas,
    Your tutorial is just awesome…It helped us a lot. I have one question…is it possible to enhance the depth stream of kinect V2??? through programming…if yess…then whats your suggestion..
    Thank you..

    • Hi Shrikrishna. Thanks a lot for your comment. Programmatically, it is not feasible to make the sensor more accurate. The hardware has its own specifications. However, if you need to add accuracy for different objects other than human bodies, you can use external libraries such as OpenCV and EmguCV. This way, you’ll be able to do a lot more things using the raw depth data.

  • Trio says:

    I LOVE the Xbox Kinect. And yeah, if you’re not careful you’ll hit a co-player. Be careful. I won’t tell you to have fun though. No need. You’ll do that without any help from me. &-=


  • Willian says:

    Hi Vangos,

    The color frame resolution is pretty higher than the depth. Is there a way to capture color frames at same resolution of depth?

    Thanks.


  • Niccolo says:

    Hi. 1stly I would like to thank you for providing your source code. It will be very usefull for me. I’m trying to edit your code to display just a little part of the color 1920×1080 image from the kinect (then i’ll try to center this smaller part on a face). I’m completely a newbye in programming, can u help me? I located the code region where to edit:

    public static ImageSource ToBitmap(this ColorFrame frame)
    {
        int width = frame.FrameDescription.Width;
        int height = frame.FrameDescription.Height;
        PixelFormat format = PixelFormats.Bgr32;

        byte[] pixels = new byte[width * height * ((format.BitsPerPixel + 7) / 8)];

        if (frame.RawColorImageFormat == ColorImageFormat.Bgra)
        {
            frame.CopyRawFrameDataToArray(pixels);
        }
        else
        {
            frame.CopyConvertedFrameDataToArray(pixels, ColorImageFormat.Bgra);
        }

        int stride = width * format.BitsPerPixel / 8;

        return BitmapSource.Create(width, height, 96, 96, format, null, pixels, stride);
    }

    How can I display, for example, just a 500×500 pixel frame in the center of the entire color image? Thank you again for your great example and work.

    • Hi Niccolo. You need to edit the image pixel by pixel.

      First, you need to create a new byte[] array for the new bitmap. The length of the new array is supposed to be the new desired width multiplied by the new desired height multiplied by 4 (bytes per pixel):

      Then, you loop through the pixels array and specify which pixel values you need. You insert the values to the corresponding position of the cropped array.

      Finally, you simply call BitmapSource.Create method to create the new bitmap using the cropped array. You can have a look at the CoordinateMapping example of the SDK to see how you can copy values between byte[] arrays.
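
      Here is a rough sketch of those three steps, cropping a centered 500×500 region out of the full 1920×1080 BGRA pixel array (the method name and the sizes are just examples):

      private ImageSource Crop(byte[] pixels, int width, int height, int cropWidth, int cropHeight)
      {
          const int bytesPerPixel = 4; // BGRA

          byte[] cropped = new byte[cropWidth * cropHeight * bytesPerPixel];

          int offsetX = (width - cropWidth) / 2;
          int offsetY = (height - cropHeight) / 2;

          for (int y = 0; y < cropHeight; y++)
          {
              // Copy one full row of the crop region at a time.
              int sourceIndex = ((offsetY + y) * width + offsetX) * bytesPerPixel;
              int targetIndex = y * cropWidth * bytesPerPixel;

              Array.Copy(pixels, sourceIndex, cropped, targetIndex, cropWidth * bytesPerPixel);
          }

          int stride = cropWidth * bytesPerPixel;

          return BitmapSource.Create(cropWidth, cropHeight, 96, 96, PixelFormats.Bgr32, null, cropped, stride);
      }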


  • VE says:

    Hi,

    what is the resolution of the depth camera? Is it the same (VGA 640×480) like the previous Kinect?
    And what is the accuracy of the depth camera?

    Thanks,

    VE

    • Hi. The new depth resolution is 512×424. You can check the differences between version 1 and version 2 here: https://pterneas.com/2014/02/08/kinect-for-windows-version-2-overview/

      The accuracy has been improved significantly. The sensor can now track 25 joints and you’ll experience less jittering, especially when the users are in front of solid surfaces (walls, etc.).

      Cheers,
      Vangos

      • VE says:

        Hi,

        thank you for the answer. But what is the resolution shown on the display? I mean, Kinect v1 can have a VGA (640*480) depth Image through the interpolation of native resolution (320*240). Thanks

        • Kinect has 3 cameras: an RGB, a depth and an infrared one.

          Each camera has a different resolution. Kinect version 1 features a 640×480 RGB camera and a 320×240 depth camera.

          Kinect version 2 increased the resolution of all of the cameras, providing a 1920×1080 RGB camera and a 512×424 depth camera.

          • VE says:

            No, there are only 2 cameras, the other one is only IR-emitter and not a camera.
            BTW, what is the C# command for Kinect 2 (in SDK2) for this:

            this.sensor.ColorStream.Enable(ColorImageFormat.RgbResolution640x480Fps30);
            this.sensor.DepthStream.Enable(DepthImageFormat.Resolution640x480Fps30);

            Could you please see that in SDK2 sample application?

      • VE says:

        Oh anway, what do you mean by:

        Infrared stream: None

        • Kinect version 1 does not let you access an infrared stream (similarly to the color or depth stream). Kinect version 2 uses the native resolution (1920×1080), so you can’t specify a different resolution or frame rate.

  • mohamad reza kharazi says:

    hi im a biomechanics student and I’m working on cp children , we can not always transport them to biomechanics laboratory so we decided to use kinect as a motion capture device , we want kinect to capture the gait and export the data in excel form , i don’t know a lot about programing , where should i begin with ?

    i have found ipisoft software and they have a biomechanic add-ons to but unfortunately this program don’t support kinect version 2 and its to expensive .

    • Hi Mohamad. You’ll need to know some Computer Vision aspects and familiarize yourself with Kinect v2. Classic motion detection algorithms can be applied to Kinect, too. I suggest you start experimenting with the pre-installed Kinect v2 samples and C#.

  • mohamad reza kharazi says:

    thank you , can you give me some of these samples?

  • Simply install the SDK (http://www.microsoft.com/en-us/kinectforwindows/) and run SDK Browser v2 from Start 🙂

  • mohamad reza kharazi says:

    ok thank you , you are very expert in this field and I’m like a monkey whose seeking around for solutions . 🙂

  • Kris says:

    Hello Vangos.

    I understand there isn’t a way to directly control parameters like brightness, hue..etc with this new version 2.0 SDK preview for the V2 kinect sensor. If any, what adjustments did you make to the color frame to at least darken the image? Thanks so much.

  • Var says:

    Hello Vangos,

    Thanks a lot for the source code. I have a query regarding the .xef file. I have recorded some data using the Kinect Studio v2 Developer Preview version and I now want to extract and save the RGB and Depth data from the .xef files which I have recorded.

    Do you know how this can be done? How can I actually read .xef files and then extract the RGB, depth and other data streams from the file?

    Thanks a lot and Cheers,
    Var

    • Hello. XEF files are binary files, so you cannot read them as easily as e.g. XML files. XEF files are used by Kinect Studio and XBOX ONE. You’ll need to store the data into your own formats, unless Microsoft provides us with a new way during the upcoming SDK update.

  • Kal says:

    Thanks for writing up this tutorial for Kinect v2.

    I have been capturing and dumping BodyFrames, BodyIndexFrames, DepthFrames, and InfraredFrames onto disk, from a MultiSourceFrameReader and a read loop in a thread. Now I want to capture and dump ColorFrames as well. Dumping the raw frame data seems to still yield acceptable frame rates. However, I have no clue how to convert from the raw data to RGBA afterwards. If I use the ColorFrame’s CopyConvertedFrameDataToArray method, performance suffers a lot and the frame rate becomes unacceptable.

    So it appears that the raw color image format is supposed to be Bgra? But why does it seem to only take 2 bytes per pixel, instead of 4 bytes? How do I convert that into 4-byte RGBA? Would you be able to enlighten me?

  • Richard Macwan says:

    I am trying to playback the .xef files using kinect studio and use them as the kinect source for my application which needs to read face expressions and body movements.
    I am unable to read in any face expressions this way. I have created a FaceFrameSource and FaceFrameReader just like the example in the Kinect SDK (of course, the example doesn’t work with recorded .xef files either). So I am a bit stumped. There is no documentation whatsoever on these classes. Have you by any chance come across such a problem? I can read infrared and body frames from the recorded .xef file, just not face (and also hdface).
    Any insight will be much appreciated.

    Regards,
    Richard Macwan

    • Hi Richard. I guess this is because Kinect Studio does not record color frames (only depth, infrared and body). I think that Face depends on the color stream, so this is why you can’t playback a .xef file properly. I’ll need to examine this further. Thanks for noting.

  • Kal says:

    Would you also happen to know what is the theoretical max value of any pixel in the InfraredFrame? And do you know what the value actually corresponds to? Is it expressed in any standard unit of luminosity (lumens, lux, candella, or whatnot)? It doesn’t seem to be documented at MSDN.

  • depth image says:

    Can you help me?
    How can I save depth image as pgm?
    Thanks

  • Ali says:

    Hi;

    Thank you again for those great tutorials.
    I am a newbie at C# and thanks to your help and the Microsoft SDK, I was able to display the streams! (lol I know very simple but all this is new to me).
    So now I want to save the those streams. The only thing that I have is the screenshot button from the SDK but unfortunately I have to click on it to keep saving the frames.
    How would you suggest I save those streams continuously?

    I tried a while loop and all and got stuck in that loop. So any tips on how to save the streams continually?
    So I just basically want a record button that I can click and it saves the frames one by one.

    Thanks!
    Ali

  • jon says:

    I need help i want to save data from kinect and use it to control a robot.
    my email is jonaep_lh@hotmail.com any help?


  • Krishna Prakash says:

    Hello, Vangos
    can you help me out about the following error i’m getting while running your code.
    Error 1 ‘KinectStreams.MainWindow’ does not contain a definition for ‘Color_Click’ and no extension method ‘Color_Click’ accepting a first argument of type ‘KinectStreams.MainWindow’ could be found (are you missing a using directive or an assembly reference?) c:\users\krishna_prakash\documents\visual studio 2013\Projects\KinectStreams\KinectStreams\MainWindow.xaml 9 108 KinectStreams

    • Hi Krishna,

      Seems that you have accidentally deleted the following method:

      private void Color_Click(object sender, RoutedEventArgs e)
      {
          _mode = Mode.Color;
      }

      This method is included in the source code – just checked!

  • Alexander says:

    Dear Vango,
    Thank you very much for all the information you are providing. I am very new to all this electronic world (have made project using the Arduino), so please forgive my limited understanding.
    I have an air BB gun fire a 6 millimeter diameter pellet, which goes through a chronometer (built with an Arduino and LCD display), displaying speed in meters/sec and feet/sec. Max speed about 240 feet/sec.
    Can the Kinect v2 record in Depth mode only (I’m interested only in x-y-z coordinates), and put this data into an array? The array will later (at a press of a button on the Arduino) be send to Processing for 3D graphing? I’m interested in only two seconds of data (x-y-z).
    Also, the Arduino is coming out soon with a new ARDUINO TRE with a 1-GHz Sitara AM335x processor. Can I connect the Kinect v2 to this new device and perhaps get a faster sampling? I’m interested in capturing as fast as possible the Depth data and store it into an array.
    I am presently studying Processing and reading articles on the new Kinect v2.

    Thank you very much for your time and effort.
    Sincerely,

    Alexander

  • Ahmed says:

    Hello Vangos,

    Thank you very much for the presenting such a nice tutorial and I hope it will be very helpful. I want to record a video stream color/depth/skeleton joints simultaneously using Kinect v2. I don’t know much about the WPF programming. So I started first by making the WPF project and make 3 buttons, color, depth, and joints. I am not exactly sure where I can add the code to initialize the sensor and the remaining part to record the video stream. Please can you help me to explain some more basic steps for the beginners?

    I am waiting for a kind and soon reply.
    Thank you

    • Hi Ahmed. Along with the SDK, Microsoft provides Kinect Studio, a handy utility that lets you record Kinect frames and save them as .xef files.

      If you need to implement this functionality by yourself, you’ll need to capture the bitmaps and pass them to a recording tool. AForge is a library with such a tool available.

      There is also a beta implementation for WinRT in my project Vitruvius.

      Hope this helps you.

      Alternatively, you can store the raw frames in binary files (which would be faster in terms of performance) and read those files later.

      • Ahmed says:

        Hello Vangos,
        Thank you very much for your reply. you mean I should capture the images and pass them to recording tool such as AForge to make the video using these images?
        and what is the difference between to record the images in bitmaps and binary files?

        • Correct. There are two options:
          1) You create a bitmap from each frame and then pass that bitmap to the recording tool.
          2) You save each frame as a binary file and create the video after you stop recording.

          A bitmap is generated by the byte array of the pixels and represents the corresponding image. A binary file is simply the array stored into a file. You can check C#’s BinaryWriter for more details.
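
          For option 2, a minimal sketch using C#’s BinaryWriter could look like this (the file naming is illustrative; the pixels array is whatever you copied out of the frame):

          // using System.IO;
          private int _frameCount = 0;

          private void SaveFrame(byte[] pixels)
          {
              // One binary file per frame, e.g. frame_00001.bin
              string path = string.Format("frame_{0:D5}.bin", _frameCount++);

              using (var stream = new FileStream(path, FileMode.Create))
              using (var writer = new BinaryWriter(stream))
              {
                  writer.Write(pixels.Length); // store the length so the file can be read back
                  writer.Write(pixels);        // the raw pixel data
              }
          }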

          • Ahmed says:

            OK.
            what I am thinking to do is, to record the color/depth images in bitmap format after recording I want to use MATLAB to make the video. Is there any way to record the depth silhouettes?

          • You can record the depth frame or the body index frame. The depth frame will give you the distances of every point from the camera. The body index frame specifies whether a given point belongs to a player.

            If you just need a bitmap visualization of the silhouettes, you can save the depth frame as a bitmap.
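
            If you prefer working with the raw data, here is a rough sketch of reading the body index frame (you would also need to include FrameSourceTypes.BodyIndex when opening the MultiSourceFrameReader). A value of 255 means the pixel does not belong to any tracked player:

            using (var frame = reference.BodyIndexFrameReference.AcquireFrame())
            {
                if (frame != null)
                {
                    int width = frame.FrameDescription.Width;
                    int height = frame.FrameDescription.Height;

                    byte[] bodyIndexData = new byte[width * height];
                    frame.CopyFrameDataToArray(bodyIndexData);

                    for (int i = 0; i < bodyIndexData.Length; i++)
                    {
                        bool belongsToPlayer = bodyIndexData[i] != 255;

                        // e.g. keep the corresponding depth value, or paint the pixel black
                    }
                }
            }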

          • Ahmed says:

            Hello Vangos, finally I can capture the frames and combine them to make a video. When I capture a frame, I have several black dots at the corner of the image. I tried to adjust the brightness to overcome that but its still same (Kinect_v2 image below is the link). What will you suggest me to do to remove that?
            Secondly, Is it possible I can remove the background when capturing depth image, same like Withoutbackground?

            Withoutbackground: https://www.dropbox.com/s/t9swjr1s7ut5iqp/Kinectv1.PNG?dl=0
            Kinect_v2:https://www.dropbox.com/s/wo35u6uan03j107/Kinectv2.PNG?dl=0

  • Priya says:

    Hi Vangos Pterneas,

    Every time I take skeletal point measurements of a person, they show variations. Even if the same person stands in front, the reading changes based on the distance from camera, the clothes that they wear etc. Based on your experience, do you know of any mechanisms where I can limit these variations and get consistent readings for same person under different condition? Are there any tools way to find out the invariant joints/points? I am using Kinect V2 with the latest SDK

    Thanks
    Priya

    • Hi Priya. You need to specify the relative positions of the joints. Do not save the X/Y/Z values as-is. For example, you could only save the angle between the joints. This way, you can check for the angle value instead of the exact position of every single joint. You can also check multiple angles.
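
      For example, here is a rough sketch of one such relative measurement: the angle (in degrees) formed at a middle joint, such as the elbow between the shoulder and the wrist. CameraSpacePoint and JointType come from the SDK; the helper itself is just illustrative:

      private double AngleBetween(CameraSpacePoint start, CameraSpacePoint center, CameraSpacePoint end)
      {
          // Vectors pointing from the center joint towards its two neighbors.
          double ax = start.X - center.X, ay = start.Y - center.Y, az = start.Z - center.Z;
          double bx = end.X - center.X, by = end.Y - center.Y, bz = end.Z - center.Z;

          double dot = ax * bx + ay * by + az * bz;
          double lengthA = Math.Sqrt(ax * ax + ay * ay + az * az);
          double lengthB = Math.Sqrt(bx * bx + by * by + bz * bz);

          return Math.Acos(dot / (lengthA * lengthB)) * (180.0 / Math.PI);
      }

      // Usage, assuming a tracked Body object:
      // double elbow = AngleBetween(body.Joints[JointType.ShoulderRight].Position,
      //                             body.Joints[JointType.ElbowRight].Position,
      //                             body.Joints[JointType.WristRight].Position);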

      Hopefully, that will solve your problem.

  • Aamna says:

    Hello Vangos,

    I have a code for recording the RGB, Depth, and IR video streams. I want to add one more function to save the skeleton joints in .txt file.
    Can you help me to point out where and how can that function in that code?
    and can I paste my code here?

  • Julian says:

    hi there,

    I would like to make a request. is it possible if you do the skeletal tracking in windows form application instead of WPF.
    p.s. currently working on a project with Kinect which requires only windows form application

  • Julian says:

    Thanks jeni,

    is there any different if I were to use sdk 2.0 compare to the link that u provided for me?

  • Julian says:

    thanks very much for the help Jeni!!

  • Julian says:

    Is there any SDK 2.0 examples that I can use as example for skeleton tracking in windows form application??

  • jeni says:

    hello vangos
    Do u have code using hand gestures as mouse? If yes please send this to my id. thanks and sorry for asking my requirements in this platform

  • Ali says:

    Hi Vangos;

    First off, thank you so much for all those useful materials about Kinect v2 sensor.

    So I went through the above code and everything works fine but now I would really like to save those “streams” as a series of images or video file.
    So the color image I would like to have it as a series of png files or an avi video file and for the depth images I would like to save all the depth information and save it. I am not sure what format it would be such that I can get all the depth data and not just an image.

    So can you please point me in the right direction so that I can record the “streams”

    Thanks
    Ali

  • Herry says:

    Hi! Thanks for your generous sharing.
    I really want to record depth frames to test my algorithm. However, I can only understand c++. Despite that I read your codes so hard, I still can’t understand how to use them. Is there any chance you can provide some example codes to record and play depth video?
    Many thanks!!! My thesis is counting on this…(cry)
    Herry

  • Nikita says:

    Hi Vangos,

    Thank you very much for your posts. They are all very helpful 🙂
    Although everything seems to be working fine, my application
    is extremely slow when using the various streams.
    Do you have any idea why this might be?
    Thank you in advance,
    Niki

    • Hi Nikita. Seems like a hardware issue to me. What type of processor/RAM do you use? You can also check Kinect Configuration Verifier (from the SDK Browser) to ensure that your system meets the hardware requirements.

      • Nikita says:

        Hi Vangos,
        after a lot of searching I found out that when you run one of the samples in the SDK
        at the same time with your application it can be much faster. So this is what I do, since my system
        meets all the requirements.
        Thanks for your reply 🙂

  • Chirag Agarwal says:

    Hi Vangos,

    Appreciate your sharing the code. However, I had a question not relating to the codes. I read that the Kinect uses 16 bit of information for representing the Depth values but what I see in the depth stream is that even if I take a curved object I am not able to find the contour of the surface using the given depth values. Is the actual depth value something else than what we see?

    Looking forward for your reply.

    • Hi Chirag. The raw depth values indicate the distance of a point from the sensor in millimeters. Is this what you need?

      • Chirag Agarwal says:

        Yes I know that. The thing is if I take the depth image of an object then I should get a range of depth values using which I can easily determine the curvature of the object.

  • Adnan says:

    Hello,
    I have recorded a depth video using Kinect v2. When I extracted the images using MATLAB, each image had 3 channels. Normally the depth images I have seen are just 1 channel. So is it correct to have a 3-channel depth image? And with a 3-channel depth image, how can I check the intensity?

    Thank you

    • Jean Michel Franco Lozada says:

      Hi! Can you tell us how you managed to record a depth video and then extract the frames into MATLAB?

      Also, I think that maybe each image has 3 channels, one each for the X, Y and Z coordinates?

      • Adnan says:

        Here is the code for recording the video:

        IplImage depth = new IplImage(512, 424, BitDepth.U16, 1);
        CvVideoWriter DepthWriter;

        DWidth = sensor.DepthFrameSource.FrameDescription.Width;
        DHeight = sensor.DepthFrameSource.FrameDescription.Height;
        WbDepth = new WriteableBitmap(DWidth, DHeight, 96, 96, PixelFormats.Gray16, null);
        int depthshft = (int)SliderDepth.Value;

        using (DepthFrame depthframe = frame.DepthFrameReference.AcquireFrame())
        {
            ushort* depthdata = (ushort*)depth.ImageData;

            if (depthframe != null)
            {
                Depthdata = new ushort[DWidth * DHeight];
                ushort[] Depthloc = new ushort[DWidth * DHeight];

                depthframe.CopyFrameDataToArray(Depthdata);

                for (int i = 0; i < DWidth * DHeight; i++)
                {
                    Depthloc[i] = 0x1000;
                }

                colorspacePoint = new ColorSpacePoint[DWidth * DHeight];
                depthspacePoint = new DepthSpacePoint[CWidth * CHeight];
                sensor.CoordinateMapper.MapDepthFrameToColorSpace(Depthloc, colorspacePoint);

                for (int y = 0; y < DHeight; y++)
                {
                    for (int x = 0; x < DWidth; x++)
                    {
                        if (depthshft != 0)
                        {
                            Depthdata[y * DWidth + x] = (ushort)((Depthdata[y * DWidth + x]) << depthshft);
                        }
                    }
                }

                depth.CopyPixelData(Depthdata);
            }
        }

        WbDepth.WritePixels(new Int32Rect(0, 0, DWidth, DHeight), Depthdata, strideDep, 0);
        ImageDepth.Source = WbDepth;

        if (depth != null && DepthWriter.FileName != null) Cv.WriteFrame(DepthWriter, depth);
        Cv.ReleaseVideoWriter(DepthWriter);

        if (CheckBox_saveD.IsChecked == true)
            DepthWriter = new CvVideoWriter(string.Format("{1}\\Scene{0}_DepthRecord.avi", scene, TextBlock_saveloca.Text.ToString()), FourCC.Default, 30.0f, new CvSize(512, 424));

        CheckBox_saveD.IsEnabled = false;
        if (CheckBox_saveD.IsChecked == true) Cv.ReleaseVideoWriter(DepthWriter);
        ____________________________________________________________________________________
        In MATLAB I am using
        X_D = VideoReader('DepthRecord.avi');
        nFrames =X_D.NumberOfFrames;
        S_D=read(X_D, FrameNumber);

        ________________________________________________________________________________________

        Why do you think the 3 channels are X, Y, and Z?

        • Jean Michel Franco Lozada says:

          No, looking at your MATLAB code, you probably have a 512x424x3 matrix, which means it has 3 color planes… Also, if you are recording to an .avi you are losing depth information: in an .avi container you can only have values up to 256, but the original depth data has values up to 4096.

          • Adnan says:

            Correct. So I tried to save the sequence of depth frames to get a correct depth image. But when I open these depth images in MATLAB, the background changes a lot. Here is the result: https://www.dropbox.com/s/9ej71k1q1op2eyo/Capture1.PNG?dl=0

          • Jean Michel Franco Lozada says:

            Those are two consecutive frames?? Then maybe you are doing something wrong in the acquisition… but that's not my territory; I don't know anything about the Kinect SDK, and that's why I'm here: I want to acquire the depth frames with precise timestamps…

            Here is an example that lets you acquire depth frames and export them directly to MATLAB, with the full depth range! Maybe this can work for you!
            http://www.codeproject.com/Tips/819613/Kinect-Version-Depth-Frame-to-mat-File-Exporter?msg=4923790#xx4923790xx

            But it's a resource eater; it doesn't finish an acquisition without force-closing the running program… Because of that, it can't keep a constant frame rate, and the program has to finish in order to report the timestamps… and since it needs to be force-closed, I can't get the acquisition times…

            Give it a try and let's share our results!

  • Vathsal says:

    Hi Vangos ,

    Really appreciate your exhaustive material on the Kinect.

    I have a small query on which you might be able to offer some insight.
    For an application we are working on, we need to write the raw depth data acquired from Kinect v2.0 to a text file. I have C# code that converts the live data acquired from the Kinect v2.0 sensor to a .txt file, but this process requires the Kinect to be connected all the time. So I would like to know if there is a way to convert the pre-recorded raw data (.xrf format) to .txt format.

    Thanks for your time.

  • Imran Butt says:

    Hi,
    Thanks for sharing valuable material.
    Can you please guide me on how to zoom in and out of the color stream in WPF? Similar to Skype on Xbox One.

    Regards

  • Aleks says:

    Hi

    Thanks for the good blog. I have a question about Kinect Studio v2. When I record all streams and play them back in my Kinect app, the FaceFrame is always null.
    FaceFrameResult result = frame.FaceFrameResult;
    if (result != null)
    {
    }

    Is the FaceFrame available from Kinect Studio?

    Thanks for help

  • Imran Butt says:

    Hi,
    How can I broadcast or send the color image to a network port, so that another application can receive it?

    Regards

  • icheee.com says:

    It’s hard to find knowledgeable people in this particular subject, but
    you seem like you know what you’re talking about! Thanks

  • Fay says:

    Hi PTERNEAS,

    I'm actually working on a facial recognition project (with the Face Basics avatar sample). I have to calculate the minimum and maximum distance between the Kinect v2 and the face for the experience to work (using the camera's depth parameters), but I don't know how to do that.
    Can you please help me?
    Thanks

    • Hi Fay. Please check this blog post about HD Face. You can use the code provided in that blog post and find the distance between e.g. the sensor and the forehead by adding the following line of code:

      double distance = vertices[HighDetailFacePoints.HighDetailFacePoints_ForeheadCenter].Z;

      • Fay says:

        Hi PTERNEAS

        Thank you for your help. I have another question: I want to map the point cloud onto the Face Basics WPF avatar. I have the two code samples for each scenario separately, and each one works, but when I combine them in the same project I get no result; it seems there is a conflict between the point cloud and the avatar, and sometimes I get only the Face Basics result, but it is very slow.

        So my question is: do you know how to map the point cloud onto the avatar?

        Thank you PTERNEAS

  • shilpa says:

    Hey, with this can we get the Depth Basics-WPF data? Also, is it possible to visualize depth and color simultaneously?

    • Hi Shilpa. Yes, you can visualize both. You can use two different Image controls. Keep in mind that rendering both streams into bitmaps will be quite resource-intensive.
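
      For instance, here is a minimal sketch of the frame handler, assuming two Image controls named cameraColor and cameraDepth and the ToBitmap helpers from this post:

      void Reader_MultiSourceFrameArrived(object sender, MultiSourceFrameArrivedEventArgs e)
      {
          var reference = e.FrameReference.AcquireFrame();

          // Update the first Image control with the color frame.
          using (var colorFrame = reference.ColorFrameReference.AcquireFrame())
          {
              if (colorFrame != null)
              {
                  cameraColor.Source = colorFrame.ToBitmap();
              }
          }

          // Update the second Image control with the depth frame.
          using (var depthFrame = reference.DepthFrameReference.AcquireFrame())
          {
              if (depthFrame != null)
              {
                  cameraDepth.Source = depthFrame.ToBitmap();
              }
          }
      }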

      • shilpa says:

        Thanks. I went through your code and it's very useful for beginners like us. I read many comments about visualizing RGB and depth simultaneously, but no one has tried to crack the code for that. Sir, is it available? If possible, please share it with us.

  • Michal says:

    int stride = width * format.BitsPerPixel / 8;
    format.BitsPerPixel – what is this? It's unrecognized in my VS 15. Should I add a library?

  • Aamna says:

    Hey Vangos!
    I just wanted to ask you a simple question: is your code compatible with Visual Studio 2010 or 2012, and does it produce any errors in case the installed SDK is version 1.5?

  • Nitesh Singh says:

    Hi Vangos!! First of all, your tutorials are really good and easy to understand, so a big thanks for that.
    Secondly, I have a question: how can we get depth data at 320×240 instead of 512×424? Is there any way to get it directly from the Kinect? If not, what method would you suggest?
    Thanks in advance!! 🙂

    • Hi Nitesh. Thanks for your comment. Kinect v2 only gives you access to a 512×424 depth stream. You have no control over the resolution of that stream. You can only get the full array and discard the values you do not need. The depth stream is simply an array of unsigned short values.
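
      For illustration, here is a rough sketch (the method name is mine) that extracts a centered 320×240 window from the 512×424 depth array, in case you only need that region:

      ushort[] Crop320x240(ushort[] depthData)
      {
          const int srcWidth = 512, srcHeight = 424;
          const int dstWidth = 320, dstHeight = 240;
          int offsetX = (srcWidth - dstWidth) / 2;   // 96 columns skipped on each side
          int offsetY = (srcHeight - dstHeight) / 2; // 92 rows skipped on each side

          ushort[] cropped = new ushort[dstWidth * dstHeight];

          for (int y = 0; y < dstHeight; y++)
          {
              for (int x = 0; x < dstWidth; x++)
              {
                  cropped[y * dstWidth + x] = depthData[(y + offsetY) * srcWidth + (x + offsetX)];
              }
          }

          return cropped;
      }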

  • Faraz says:

    Hi, earlier in the comments section you mentioned that there are tools for sending Kinect data to mobile devices or tablets. I have an Android smart-TV-like screen that I want to send Kinect data to. Any help on how to achieve this would be appreciated, as I'm kind of stuck.

    Thank you and nice tutorial 🙂

  • more info says:

    Can I just say what a relief to find a person that truly knows what they’re talking about
    on the web. You certainly realize how to bring a problem to light and make it important.
    More and more people have to check this out and understand
    this side of the story. It’s surprising you’re not more popular since you most certainly possess the gift.

  • George Sami says:

    Dear Vangos,
    How can I reduce the frame rate of Kinect v2 in C# WPF?

    • Hi George. The Kinect SDK provides a fixed number of frames per second; you can't manually set how many frames Kinect gives you. However, you can specify how many frames to ignore. So, you can ignore half of the provided frames and reduce the volume this way. To skip frames in WPF, simply use a counter and ignore, e.g., the even-numbered frames.
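
      For example, a minimal sketch of such a counter inside the frame handler (the _frameCount field is mine):

      private int _frameCount = 0;

      void Reader_MultiSourceFrameArrived(object sender, MultiSourceFrameArrivedEventArgs e)
      {
          _frameCount++;

          // Skip every other frame, effectively halving the ~30 FPS rate.
          if (_frameCount % 2 == 0) return;

          var reference = e.FrameReference.AcquireFrame();

          // ... process the color/depth/infrared frames as usual ...
      }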

  • Abishek Shankar says:

    Hi Vangos,

    Your blog was very useful to me. I am trying to get the X, Y, and Z coordinates of a moving object using the Kinect sensor and MATLAB. If you feel coding in MATLAB is tough, any other software would also be highly appreciated. I would love to hear ways in which this can be done, and it would also be wonderful if you could mail me the code for it. I am an ardent roboticist and I want to build a project where I can track a moving robot. Looking forward to your inputs. Thanks once more for your extensive blog posts.

    • Hi Abishek. The Kinect SDK does not provide native object-tracking methods (only body tracking). You should implement your own object-tracking functionality by utilizing the depth and infrared streams.

  • Abishek Shankar says:

    Hi Vangos,

    Abishek here again. I wanted to understand whether the depth data that the Kinect sensor gives is measured from the sensor to the pixel, or whether it is the real-world Z coordinate. I also want to know how to convert the depth value (0-2047) into metric units; there are many formulas for this on the web, but I am not sure which one to trust. I also want to know the frame width and frame height of an image captured by the Kinect. Can you also please let me know the correct formula to convert pixel coordinates into real-world X,Y coordinates?

    1cm = 37.79 pixel

    Thanks a lot,
    Regards,
    Abishek

    • Hi Abishek. A depth point is the distance between a particular point in space and the vertical plane of the sensor. It is measured in millimeters. The Z position of a Joint is measured in meters. The depth array is an array of ushort values.

      There is no formula to convert screen pixels to real-world units. However, you can find the 3D point of a 2D point by using Coordinate Mapping:

      var point3D = sensor.CoordinateMapper.MapDepthPointToCameraSpace(depthPoint, depth);

  • Francisco says:

    Hello, Vangos.

    Your work on Vitruvius has helped me so much, Thank you.

    I was wondering if you can take a snapshot while the skeleton is being drawn on the viewer.
    The following code saves an image, but the skeleton is not included:

    string path = System.IO.Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.MyPictures), "my-capture.jpg");
    (viewer.Image as WriteableBitmap).Save(path);

    Any ideas? I apologize if you already answered this, I must have missed it.

    • Hi Francisco. Thank you for your comment. This is already part of the WPF/Windows framework. To capture the whole view (canvas + image), simply use the following code. The “visual” parameter is the parent element of your Canvas and Image (e.g. a Grid or a ViewBox).

      var bitmap = new RenderTargetBitmap(
          (int)visual.RenderSize.Width, (int)visual.RenderSize.Height,
          96.0, 96.0, PixelFormats.Pbgra32);
      bitmap.Render(visual);

      var encoder = new PngBitmapEncoder();
      var outputFrame = BitmapFrame.Create(bitmap);
      encoder.Frames.Add(outputFrame);

      using (var file = File.OpenWrite("path-to-image.png"))
      {
          encoder.Save(file);
      }

  • Hello Vangos,

    I am an undergraduate computer engineering student from the University of Patras, Greece.
    I use Kinect SDK v2 and it is really amazing!! I am trying to extract two point clouds from the depth data:
    1. a point cloud for everything in the view of the sensor, and
    2. a human point cloud from the depth data.
    The code sample below imports the human point cloud (2) correctly, but I don't know how to get (1) and use its X-Y-Z positions. Which depth values correspond to the whole scene that the Kinect records?

    public class KinectRecorder
    {
        public delegate void NewPointEventHandler(Object sender, NewPointEventArgs e);
        public delegate void KinectV2RecorderEventHandler(Object sender, EventArgs e);

        public event NewPointEventHandler NewPointEvent;
        public event KinectV2RecorderEventHandler FrameStarted;
        public event KinectV2RecorderEventHandler FrameEnded;

        private const int minNumOfPoints = 1000;

        private const int step = 1;
        static int ty = 0;
        private bool done = false;
        private int count = 0;

        private ArrayList clouds = new ArrayList();
        private Stopwatch t0 = new Stopwatch();

        /// <summary>
        /// Indicates opaque in an opacity mask
        /// </summary>
        private const int OpaquePixel = -1;

        /// <summary>
        /// Size of the RGB pixel in the bitmap
        /// </summary>
        private readonly int bytesPerPixel = (PixelFormats.Bgr32.BitsPerPixel + 7) / 8;

        /// <summary>
        /// Active Kinect sensor
        /// </summary>
        private KinectSensor kinectSensor = null;

        /// <summary>
        /// Coordinate mapper to map one type of point to another
        /// </summary>
        private CoordinateMapper coordinateMapper = null;

        /// <summary>
        /// Reader for depth/color/body index frames
        /// </summary>
        private MultiSourceFrameReader multiFrameSourceReader = null;

        /// <summary>
        /// Intermediate storage for receiving depth frame data from the sensor
        /// </summary>
        private ushort[] depthFrameData = null;

        /// <summary>
        /// Intermediate storage for receiving color frame data from the sensor
        /// </summary>
        //private byte[] colorFrameData = null;

        /// <summary>
        /// Intermediate storage for receiving body index frame data from the sensor
        /// </summary>
        private byte[] bodyIndexFrameData = null;

        /// <summary>
        /// Intermediate storage for the depth to color mapping
        /// </summary>
        // private ColorSpacePoint[] colorPoints = null;

        /// <summary>
        /// Intermediate storage for the depth to color mapping
        /// </summary>
        private CameraSpacePoint[] cameraPoints = null;

        /// <summary>
        /// Initializes a new instance of the MainWindow class.
        /// </summary>
        public KinectRecorder()
        {
        }

        public void start()
        {
            if (kinectSensor == null)
            {
                startRecording();
            }
        }

        private void startRecording()
        {
            this.t0.Start();

            // get the kinectSensor object
            this.kinectSensor = KinectSensor.GetDefault();

            // open multiframereader for depth, color, and bodyindex frames
            this.multiFrameSourceReader = this.kinectSensor.OpenMultiSourceFrameReader(FrameSourceTypes.Depth | FrameSourceTypes.BodyIndex);

            // wire handler for frames arrival
            this.multiFrameSourceReader.MultiSourceFrameArrived += this.Reader_MultiSourceFrameArrived;

            // get the coordinate mapper
            this.coordinateMapper = this.kinectSensor.CoordinateMapper;

            // get FrameDescription from DepthFrameSource
            FrameDescription depthFrameDescription = this.kinectSensor.DepthFrameSource.FrameDescription;

            int depthWidth = depthFrameDescription.Width;
            int depthHeight = depthFrameDescription.Height;

            // allocate space to put the pixels being received and converted
            this.depthFrameData = new ushort[depthWidth * depthHeight];
            this.bodyIndexFrameData = new byte[depthWidth * depthHeight];
            // this.colorPoints = new ColorSpacePoint[depthWidth * depthHeight];
            this.cameraPoints = new CameraSpacePoint[depthWidth * depthHeight];

            // open the sensor
            this.kinectSensor.Open();
        }

        public void StopRecording()
        {
            Console.WriteLine("Flushing data to disk..");

            this.multiFrameSourceReader.Dispose();
            this.multiFrameSourceReader = null;

            if (this.kinectSensor != null)
            {
                this.kinectSensor.Close();
                this.kinectSensor = null;
            }
        }

        /// <summary>
        /// Handles the depth/color/body index frame data arriving from the sensor
        /// </summary>
        /// <param name="sender">object sending the event</param>
        /// <param name="e">event arguments</param>
        private void Reader_MultiSourceFrameArrived(object sender, MultiSourceFrameArrivedEventArgs e)
        {
            if (!done)
            {
                if (FrameStarted != null)
                {
                    FrameStarted(this, EventArgs.Empty);
                }

                if (t0.ElapsedMilliseconds > 41)
                {
                    t0.Restart();

                    int depthWidth = 0;
                    int depthHeight = 0;

                    int bodyIndexWidth = 0;
                    int bodyIndexHeight = 0;

                    bool multiSourceFrameProcessed = false;
                    // bool colorFrameProcessed = false;
                    bool depthFrameProcessed = false;
                    bool bodyIndexFrameProcessed = false;

                    MultiSourceFrame multiSourceFrame = e.FrameReference.AcquireFrame();

                    if (multiSourceFrame != null)
                    {
                        // Frame Acquisition should always occur first when using multiSourceFrameReader
                        using (DepthFrame depthFrame = multiSourceFrame.DepthFrameReference.AcquireFrame())
                        {
                            using (BodyIndexFrame bodyIndexFrame = multiSourceFrame.BodyIndexFrameReference.AcquireFrame())
                            {
                                if (depthFrame != null)
                                {
                                    FrameDescription depthFrameDescription = depthFrame.FrameDescription;
                                    depthWidth = depthFrameDescription.Width;
                                    depthHeight = depthFrameDescription.Height;

                                    if ((depthWidth * depthHeight) == this.depthFrameData.Length)
                                    {
                                        depthFrame.CopyFrameDataToArray(this.depthFrameData);

                                        depthFrameProcessed = true;
                                    }
                                }

                                if (bodyIndexFrame != null)
                                {
                                    FrameDescription bodyIndexFrameDescription = bodyIndexFrame.FrameDescription;
                                    bodyIndexWidth = bodyIndexFrameDescription.Width;
                                    bodyIndexHeight = bodyIndexFrameDescription.Height;

                                    if ((bodyIndexWidth * bodyIndexHeight) == this.bodyIndexFrameData.Length)
                                    {
                                        bodyIndexFrame.CopyFrameDataToArray(this.bodyIndexFrameData);

                                        bodyIndexFrameProcessed = true;
                                    }
                                }

                                multiSourceFrameProcessed = true;
                            }
                        }
                    }

                    // we got all frames
                    if (multiSourceFrameProcessed && depthFrameProcessed && bodyIndexFrameProcessed)
                    {
                        StringBuilder sb = new StringBuilder();
                        int len = 0;

                        // this.coordinateMapper.MapDepthFrameToColorSpace(this.depthFrameData, this.colorPoints);
                        this.coordinateMapper.MapDepthFrameToCameraSpace(this.depthFrameData, this.cameraPoints);

                        // loop over each row and column of the depth
                        for (int y = 0; y < depthHeight; y += step)
                        {
                            for (int x = 0; x < depthWidth; x += step)
                            {
                                // calculate index into depth array
                                int depthIndex = (y * depthWidth) + x;

                                byte player = this.bodyIndexFrameData[depthIndex];

                                // if we're tracking a player for the current pixel, sets its color and alpha to full
                                if (player != 0xff)
                                {
                                    ty = ty + 1;
                                    CameraSpacePoint p = this.cameraPoints[depthIndex];

                                    // retrieve the depth to color mapping for the current depth pixel
                                    // ColorSpacePoint colorPoint = this.colorPoints[depthIndex];

                                    // make sure the depth pixel maps to a valid point in color space
                                    if (!(Double.IsInfinity(p.X)) && !(Double.IsInfinity(p.Y)) && !(Double.IsInfinity(p.Z)))
                                    {
                                        sb.Append(String.Format(CultureInfo.InvariantCulture, "{0} {1} {2} \n", p.X, p.Y, p.Z));

                                        if (this.NewPointEvent != null)
                                        {
                                            this.NewPointEvent(this, new NewPointEventArgs(p.X, p.Y, p.Z));
                                            len++;
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }

            {
                Thread.Sleep(0);
                new KinectRecorder();
            }
        }
    }

  • Markus says:

    Great post. Thanks!
    Regarding the color frame: I am wondering whether it's easily possible to get the coordinates of the part of the image where a given color, e.g. green, appears? The idea is to use a specified color (e.g. of a small badge) to track objects, instead of using a 3D model.

    Thanks,

    markus

    • Hi Markus. If you do not need body-tracking capabilities, you’d better use EmguCV for RGB image processing. Emgu will help you with marker detection with its native functions. Of course, you can use Emgu with Kinect. It’s a C# library that can be used with any .NET project.
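
      If you want to experiment before bringing in Emgu, here is a rough sketch (the method name and thresholds are mine) that scans the BGRA color array for strongly green pixels and returns their average position:

      // pixels: the BGRA byte array copied from the color frame (1920x1080).
      System.Windows.Point? FindGreenMarker(byte[] pixels, int width, int height)
      {
          long sumX = 0, sumY = 0, count = 0;

          for (int y = 0; y < height; y++)
          {
              for (int x = 0; x < width; x++)
              {
                  int i = (y * width + x) * 4;
                  byte b = pixels[i];
                  byte g = pixels[i + 1];
                  byte r = pixels[i + 2];

                  // Crude "green" test; tune the thresholds for your badge and lighting.
                  if (g > 120 && g > r + 40 && g > b + 40)
                  {
                      sumX += x;
                      sumY += y;
                      count++;
                  }
              }
          }

          if (count == 0) return null;

          return new System.Windows.Point(sumX / (double)count, sumY / (double)count);
      }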

  • Momo says:

    Hello, thanks for your post

    I'm trying to make this code work on my PC, but I have some problems.
    Could you tell me what the 'format' parameter is in the 'ToBitmap' function?

  • Harold says:

    Hi Vangos,

    Do you have any advice on how to smooth kinect V2 depth data?
    What kind of filter do you recommend?

    Thanks & Regards,
    Harold

  • Nabeel says:

    Hi Vangos !
    First of all, I would like to express my appreciation for the boundless work you have done on Kinect for Windows.

    Basically, I was trying to get the color data at 640×480 or 320×240 instead of 1920×1080. Is that possible, and if not, how can it be done?

    thanks in advance

    • Hello Nabeel. Thanks for your kind comment. Kinect v2 only provides an RGB stream of 1920×1080. You cannot change that, since there are no streaming settings available. However, you can shrink the size of the frame after you acquire it. The fastest way to do this is by downsizing the frame by 1/2 or 1/3: load the byte array and get rid of half of its contents. There is such a method included in Vitruvius.
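
      As a rough illustration (this is not the Vitruvius implementation), downsizing a BGRA array by a factor of 2 with nearest-neighbour sampling could look like this:

      byte[] Downsample2x(byte[] pixels, int width, int height)
      {
          int newWidth = width / 2;
          int newHeight = height / 2;
          byte[] result = new byte[newWidth * newHeight * 4];

          for (int y = 0; y < newHeight; y++)
          {
              for (int x = 0; x < newWidth; x++)
              {
                  // Copy every second pixel of every second row.
                  int src = ((y * 2) * width + (x * 2)) * 4;
                  int dst = (y * newWidth + x) * 4;

                  result[dst] = pixels[src];         // B
                  result[dst + 1] = pixels[src + 1]; // G
                  result[dst + 2] = pixels[src + 2]; // R
                  result[dst + 3] = pixels[src + 3]; // A
              }
          }

          return result;
      }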

  • Hyeoni says:

    Hello Vangos! Thanks for your post.
    I have a question:
    I would like to implement a function to record color frames using Kinect v2.
    Can you give me some good advice?
    Also, there is one problem I have now:
    I want to crop the color frame image displayed on the screen,
    i.e. to cut down the angle of view.
    Could I get some advice on this?
    Thank you a lot~

    • Hello. To record color frames, store the raw data in binary format frame-by-frame. When you stop the recording process, convert the data to JPG.

      To crop the field of view, you can either edit the raw data array and remove columns, or use a WPF Image control with its Stretch property set to UniformToFill.
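
      As a minimal sketch of the recording part (the helper name is mine; it needs System.IO), you could dump each color frame as raw bytes and encode them to JPG after the recording stops:

      void SaveRawFrame(ColorFrame frame, string folder, int frameIndex)
      {
          int size = frame.FrameDescription.Width * frame.FrameDescription.Height * 4;
          byte[] pixels = new byte[size];
          frame.CopyConvertedFrameDataToArray(pixels, ColorImageFormat.Bgra);

          // Writing raw bytes is much cheaper than JPG-encoding at 30 FPS;
          // convert the .raw files once the recording has stopped.
          File.WriteAllBytes(Path.Combine(folder, frameIndex + ".raw"), pixels);
      }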

  • Raja Taimoor says:

    How can I convert that ImageSource image to a bitmap?

  • Tharaka says:

    Great article. Thanks for sharing. Are there any libraries to calculate the velocity of skeletons?

  • fred says:

    Hey Vangos!
    I have been following your tutorials for a while now, but I still have one open question: how can I detect a flat plane, e.g. a table, using the depth sensor?
    Thank you in advance

  • Pergole says:

    Nice Vangos pternesa Live Good.

  • szh says:

    Hi Vangos. I saw an example of multi-Kinect skeleton fusion and I thought it was yours. I'm doing this research, but I don't have a clue. Can you share the code? Thank you.
    https://sites.hevra.haifa.ac.il/rbd/2015/10/19/multiple-kinect-cameras-skeleton-fusion/?lang=en

    • Hello. This is not me – the website has just copied one of my pictures. You’ll need to contact the owners regarding that software.

      From my experience, to run multiple Kinect sensors, you’ll need one computer per sensor. The computers will need to communicate using sockets. The data should then be collected and processed by the most powerful PC.

  • Herman says:

    What’s up, its good article concerning media print, we all
    understand media is a fantastic source of facts.

  • Soulfrank says:

    How come when I go to compile it, I get so many errors?

  • Davito says:

    I took my time to read your blog and it’s a win.

    I'm using Kinect v2, and I have the RGB image and the depth image saved every 30 minutes. What I want to know is: how can I calculate the distance from the objects seen by the Kinect sensor to the sensor itself?

    For example, I have my Kinect suspended on a bar with its sensor pointed at a table. How can I get the distance from the Kinect sensor to the objects on the table?

    Thanks in advance,

    • Thanks for your comment! The Kinect Depth frame is, actually, an array of distances. So, every XY depth point represents the corresponding distance (in millimeters) from the sensor. You can also refer to this post for more information.
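
      For example, a minimal sketch (the method name is mine) that reads the distance, in millimeters, of the depth pixel at a given column and row:

      ushort DistanceAt(DepthFrame frame, int x, int y)
      {
          int width = frame.FrameDescription.Width;   // 512
          int height = frame.FrameDescription.Height; // 424

          ushort[] depthData = new ushort[width * height];
          frame.CopyFrameDataToArray(depthData);

          // Each element is the distance from the sensor plane in millimeters (0 means unknown).
          return depthData[y * width + x];
      }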

  • Tente says:

    Thank you, beautiful blog and content. Best regards, brothers.
