How to use Kinect HD Face


In my previous article, I demonstrated how you can access the 2D positions of the eyes, nose, and mouth, using Microsoft’s Kinect Face API. The Face API provides us with some basic, yet impressive, functionality: we can detect the X and Y coordinates of 4 eye points and identify a few facial expressions using just a few lines of C# code. This is pretty cool for basic applications, like Augmented Reality games, but what if you need more advanced functionality from your app?

Recently, we decided to extend our Kinetisense project with advanced facial capabilities. More specifically, we needed to access more facial points, including lips, jaw and cheeks. Moreover, we needed the X, Y and Z position of each point in the 3D space. Kinect Face API could not help us, since it was very limited for our scope of work.

Thankfully, Microsoft has implemented a second Face API within the latest Kinect SDK v2. This API is called HD Face and is designed to blow your mind!

At the time of writing, HD Face is the most advanced face tracking library out there. Not only does it detect the human face, but it also allows you to access over 1,000 facial points in the 3D space. Real-time. Within a few milliseconds. Not convinced? I developed a basic program that displays all of these points. Creepy, huh?!

In this article, I am going to show you how to access all these points and display them on a canvas. I’ll also show you how to use Kinect HD Face efficiently and get the most out of it.

Prerequisites

Source Code

Tutorial

Although Kinect HD Face is truly powerful, you’ll notice that it’s badly documented, too. Insufficient documentation makes it hard to understand what’s going on inside the API. Actually, this is because HD Face is supposed to provide advanced, low-level functionality. It gives us access to raw facial data. We, the developers, are responsible for properly interpreting the data and using it in our applications. Let me guide you through the whole process.

Step 1: Create a new project

Let’s start by creating a new project. Launch Visual Studio and select File -> New Project. Select C# as your programming language and choose either the WPF or the Windows Store app template. Give your project a name and start coding.

Step 2: Import the required assemblies

To use Kinect HD Face, we need to import 2 assemblies: Microsoft.Kinect.dll and Microsoft.Kinect.Face.dll. Right click your project name and select “Add Reference”. Navigate to the Extensions tab and select those assemblies.

If you are using WinRT, Microsoft.Kinect is called WindowsPreview.Kinect.

Step 3: XAML

The user interface is pretty simple. Open your MainWindow.xaml or MainPage.xaml file and place a drawing canvas within your grid. Preferably, you should add the canvas within a Viewbox element. The Viewbox element will let your Canvas scale proportionally as the window size changes. No additional effort from your side.

<Viewbox Grid.Row="1">
      <Canvas Name="canvas" Width="512" Height="424" />
</Viewbox>

Step 4: Declare the Kinect HD Face objects

After typing the XAML code, open the corresponding C# file (MainWindow.xaml.cs or MainPage.xaml.cs) and import the Kinect namespaces.

For .NET 4.5, import the following:

using Microsoft.Kinect;
using Microsoft.Kinect.Face;

For WinRT, import the following:

using WindowsPreview.Kinect;
using Microsoft.Kinect.Face;

So far, so good. Now, let’s declare the required objects. Like Kinect Face Basics, we need to define the proper body source, body reader, HD face source, and HD face reader:

// Provides a Kinect sensor reference.
private KinectSensor _sensor = null;

// Acquires body frame data.
private BodyFrameSource _bodySource = null;

// Reads body frame data.
private BodyFrameReader _bodyReader = null;

// Acquires HD face data.
private HighDefinitionFaceFrameSource _faceSource = null;

// Reads HD face data.
private HighDefinitionFaceFrameReader _faceReader = null;

// Required to access the face vertices.
private FaceAlignment _faceAlignment = null;

// Required to access the face model points.
private FaceModel _faceModel = null;

// Used to display 1,000 points on screen.
private List<Ellipse> _points = new List<Ellipse>();

Step 5: Initialize Kinect and body/face sources

As usual, we’ll first need to initialize the Kinect sensor, as well as the frame readers. HD Face works just like any ordinary frame: we need a face source and a face reader. The face reader is initialized using the face source. The reason we need a Body source/reader is that each face corresponds to a specific body. You can’t track a face without tracking its body first. The FrameArrived event will fire whenever the sensor has new face data to give us.

_sensor = KinectSensor.GetDefault();

if (_sensor != null)
{
	// Listen for body data.
	_bodySource = _sensor.BodyFrameSource;
	_bodyReader = _bodySource.OpenReader();
	_bodyReader.FrameArrived += BodyReader_FrameArrived;

	// Listen for HD face data.
	_faceSource = new HighDefinitionFaceFrameSource(_sensor);
	_faceReader = _faceSource.OpenReader();
	_faceReader.FrameArrived += FaceReader_FrameArrived;

	_faceModel = new FaceModel();
	_faceAlignment = new FaceAlignment();
        
	// Start tracking!        
	_sensor.Open();
}

Step 6: Connect a body with a face

The next step is a little tricky. This is how we connect a body to a face. How do we do this? Simply by setting the TrackingId property of the Face source. The TrackingId of the face source must be the same as the TrackingId of the body.

private void BodyReader_FrameArrived(object sender, BodyFrameArrivedEventArgs e)
{
    using (var frame = e.FrameReference.AcquireFrame())
    {
        if (frame != null)
        {
            Body[] bodies = new Body[frame.BodyCount];
            frame.GetAndRefreshBodyData(bodies);
            
            Body body = bodies.Where(b => b.IsTracked).FirstOrDefault();
            
            if (!_faceSource.IsTrackingIdValid)
            {
                if (body != null)
                {
                    _faceSource.TrackingId = body.TrackingId;
                }
            }
        }
    }
}

So, we have connected a face with a body. Let’s access the face points now.

Step 7: Get and update the facial points!

Dive into the FaceReader_FrameArrived event handler. We need to check for two conditions before accessing any data. First, we need to ensure that the frame is not null. Second, we need to ensure that the frame has a tracked face. Once both conditions are met, we can call the GetAndRefreshFaceAlignmentResult method, which updates the facial points and properties.

The facial points are given as an array of vertices. A vertex is a 3D point (with X, Y, and Z coordinates) that describes the corner of a geometric triangle. We can use vertices to construct a 3D mesh of the face. For the sake of simplicity, we’ll simply draw each vertex as a 2D point. Microsoft’s SDK Browser contains a 3D mesh of the face you can experiment with.

private void FaceReader_FrameArrived(object sender, HighDefinitionFaceFrameArrivedEventArgs e)
{
    using (var frame = e.FrameReference.AcquireFrame())
    {
        if (frame != null && frame.IsFaceTracked)
        {
            frame.GetAndRefreshFaceAlignmentResult(_faceAlignment);
            UpdateFacePoints();
        }
    }
}

private void UpdateFacePoints()
{
    if (_faceModel == null) return;
    
    var vertices = _faceModel.CalculateVerticesForAlignment(_faceAlignment);
}

As you can see, vertices is a list of CameraSpacePoint objects. CameraSpacePoint is a Kinect-specific structure that contains information about a 3D point.

Hint: we have already used CameraSpacePoints when we performed body tracking.

Step 8: Draw the points on screen

And now, the fun part: we have a list of CameraSpacePoint objects and a list of Ellipse objects. We’ll add the ellipses within the canvas and we’ll specify their exact X & Y position.

Caution: The X, Y, and Z coordinates are measured in meters! To properly find the corresponding pixel values, we’ll use Coordinate Mapper. Coordinate Mapper is a built-in mechanism that converts 3D space positions to 2D screen positions.

private void UpdateFacePoints()
{
    if (_faceModel == null) return;
    
    var vertices = _faceModel.CalculateVerticesForAlignment(_faceAlignment);
    
    if (vertices.Count > 0)
    {
        if (_points.Count == 0)
        {
            for (int index = 0; index < vertices.Count; index++)
            {
                Ellipse ellipse = new Ellipse
                {
                    Width = 2.0,
                    Height = 2.0,
                    Fill = new SolidColorBrush(Colors.Blue)
                };
                
                _points.Add(ellipse);
            }
            
            foreach (Ellipse ellipse in _points)
            {
                canvas.Children.Add(ellipse);
            }
        }
        
        for (int index = 0; index < vertices.Count; index++)
        {
            CameraSpacePoint vertex = vertices[index];
            DepthSpacePoint point = _sensor.CoordinateMapper.MapCameraPointToDepthSpace(vertex);
            
            if (float.IsInfinity(point.X) || float.IsInfinity(point.Y)) return;
            
            Ellipse ellipse = _points[index];
            
            Canvas.SetLeft(ellipse, point.X);
            Canvas.SetTop(ellipse, point.Y);
        }
    }
}

That’s it. Build the application and run it. Stand between 0.5 and 2 meters from the sensor. Here’s the result:

Kinect HD Face 1

Common facial points

OK, we drew the points on screen. So what? Is there a way to actually understand what each point is? How can we identify where the eyes are? How can we detect the jaw? The API has no built-in mechanism to get a human-friendly representation of all of the face data. We need to handle over 1,000 points in the 3D space manually!

Don’t worry, though. Each one of the vertices has a specific index number. Knowing the index number, you can easily deduce which part of the face it corresponds to. For example, vertices 1086, 820, 824, 840, 847, 850, 807, 782, and 755 belong to the left eyebrow.

Microsoft includes some commonly-used points as part of the SDK (thanks to Reini Adovics for noting this to me). These points are exposed by the HighDetailFacePoints enumeration:

Key Index
HighDetailFacePoints_LefteyeInnercorner 210
HighDetailFacePoints_LefteyeOutercorner 469
HighDetailFacePoints_LefteyeMidtop 241
HighDetailFacePoints_LefteyeMidbottom 1104
HighDetailFacePoints_RighteyeInnercorner 843
HighDetailFacePoints_RighteyeOutercorner 1117
HighDetailFacePoints_RighteyeMidtop 731
HighDetailFacePoints_RighteyeMidbottom 1090
HighDetailFacePoints_LefteyebrowInner 346
HighDetailFacePoints_LefteyebrowOuter 140
HighDetailFacePoints_LefteyebrowCenter 222
HighDetailFacePoints_RighteyebrowInner 803
HighDetailFacePoints_RighteyebrowOuter 758
HighDetailFacePoints_RighteyebrowCenter 849
HighDetailFacePoints_MouthLeftcorner 91
HighDetailFacePoints_MouthRightcorner 687
HighDetailFacePoints_MouthUpperlipMidtop 19
HighDetailFacePoints_MouthUpperlipMidbottom 1072
HighDetailFacePoints_MouthLowerlipMidtop 10
HighDetailFacePoints_MouthLowerlipMidbottom 8
HighDetailFacePoints_NoseTip 18
HighDetailFacePoints_NoseBottom 14
HighDetailFacePoints_NoseBottomleft 156
HighDetailFacePoints_NoseBottomright 783
HighDetailFacePoints_NoseTop 24
HighDetailFacePoints_NoseTopleft 151
HighDetailFacePoints_NoseTopright 772
HighDetailFacePoints_ForeheadCenter 28
HighDetailFacePoints_LeftcheekCenter 412
HighDetailFacePoints_RightcheekCenter 933
HighDetailFacePoints_Leftcheekbone 458
HighDetailFacePoints_Rightcheekbone 674
HighDetailFacePoints_ChinCenter 4
HighDetailFacePoints_LowerjawLeftend 1307
HighDetailFacePoints_LowerjawRightend 1327
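
To make the table concrete, here is a minimal sketch of looking up a named point by its index. It is an illustration under assumptions, not SDK code: System.Numerics.Vector3 stands in for CameraSpacePoint so the snippet compiles without the Kinect SDK, and the FacePointIndex enum and FacePointLookup class are hypothetical names that copy three index values from the table above.

```csharp
using System;
using System.Collections.Generic;
using System.Numerics;

// Hypothetical subset of the index table above (values copied from the table).
public enum FacePointIndex
{
    NoseTip = 18,
    MouthLeftcorner = 91,
    MouthRightcorner = 687
}

public static class FacePointLookup
{
    // Vector3 is a stand-in for CameraSpacePoint (an assumption of this sketch).
    // Returns false when the vertex list is missing or too short for the index.
    public static bool TryGet(IReadOnlyList<Vector3> vertices, FacePointIndex point, out Vector3 vertex)
    {
        int index = (int)point;

        if (vertices == null || index < 0 || index >= vertices.Count)
        {
            vertex = default(Vector3);
            return false;
        }

        vertex = vertices[index];
        return true;
    }
}
```

With both mouth corners retrieved this way, Vector3.Distance(left, right) would give the mouth width in meters, since the coordinates are expressed in meters.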

If you wish, you can use the Color, Depth, or Infrared bitmap generator and display the camera view behind the face. Keep in mind that simultaneous bitmap and face rendering may cause performance issues in your application. So, handle with care and do not over-use your resources.

Kinect HD Face 2

 

Source Code

PS: I’ve been quite silent during the past few months. It was not my intention and I really apologize for that. My team was busy developing the Orthosense app for Intel’s International Competition. We won the GRAND PRIZE and we were featured on USA Today. From now on, I promise I’ll be more active in the Kinect community. Please keep sending me your comments and emails.

Till the next time, enjoy Kinecting!

PS: Vitruvius

If you enjoyed this article, then you’ll love Vitruvius. Vitruvius is a set of powerful Kinect extensions that will help you build stunning Kinect apps in minutes. Vitruvius includes avateering, HD Face, background removal, angle calculations, and more. Check it now.

Want to hire my team for your next cutting-edge fitness app? Drop me an email.

Author Vangos Pterneas

Vangos Pterneas is an award-winning Microsoft Most Valuable Professional. He is helping companies from all over the world grow their revenue by creating profitable software products. He loves Motion Technology and Mixed Reality. Vangos is the CEO of LightBuzz Inc and author of The Dark Art Of Freelancing.


Join the discussion 89 Comments

  • Amit Vashisht says:

    Can u please tell me how to access the various index numbers of the vertices? like i want to know which vertices correspond to the mouth and what are their indices.

    • Hi Amit,

      Here’s what you can do to find the index number of a vertex:

      1. When you create the ellipse objects, provide the index number as a Tag to each ellipse.
      2. For each ellipse, subscribe to the Tapped event (WinRT) or the MouseLeftButtonUp event (WPF).
      3. When the event fires (inside the event handler), log the Tag of the clicked ellipse (Debug.WriteLine). It should be the index of the vertex.

      Let me know if you need further help with this.

  • Georgios says:

    Excellent guide Vangeli! Thank you.
    What about profile views of faces? Is this robust? Does the system return less facial points?

    • Hi Georgios. Thank you very much for your kind words. HD Face is accurate when you are facing the sensor (frontal view). It remains accurate for approximately 45-60 degrees of head rotation. Expect a lot of false positives when you rotate your head by more than 60 degrees.

  • Lynn says:

    Hi there! Thank you for this great tutorial! Would like to ask if you’d know how to use these vertices information to map a picture of another face, say a Celebrity’s facial features, on to one’s face that’s seen from the Kinect. Thank you very much!

  • kroko says:

    have you found (or maybe mapped yourself) vertex numbering for all the HD face points? i just cannot find any docs/”nice picture” on this. ms documetation for sdk 1.8 does have numbering shown https://i-msdn.sec.s-msft.com/dynimg/IC584330.png, hope something like this existed for v2 HD. thanks!

    • Hi Kroko. This is something I’m working on. There is no official documentation about it.

      • kroko says:

        mkay, i’ll try to generate a map then

      • kroko says:

        quick and dirty http://imgur.com/a/fs3lb. i was only interested in eye corners, thus these closeups, but hope this comes handy. i am using kinect common bridge though, but i doubt the indexes are different from working with “raw SDK” (kind of defeats the purpose of the bridge, right 🙂 )

        • This is awesome. Thanks for sharing!

          • kroko says:

            enum _HighDetailFacePoints
            {
            HighDetailFacePoints_LefteyeInnercorner = 210,
            HighDetailFacePoints_LefteyeOutercorner = 469,
            HighDetailFacePoints_LefteyeMidtop = 241,
            HighDetailFacePoints_LefteyeMidbottom = 1104,
            HighDetailFacePoints_RighteyeInnercorner = 843,
            HighDetailFacePoints_RighteyeOutercorner = 1117,
            HighDetailFacePoints_RighteyeMidtop = 731,
            HighDetailFacePoints_RighteyeMidbottom = 1090,
            HighDetailFacePoints_LefteyebrowInner = 346,
            HighDetailFacePoints_LefteyebrowOuter = 140,
            HighDetailFacePoints_LefteyebrowCenter = 222,
            HighDetailFacePoints_RighteyebrowInner = 803,
            HighDetailFacePoints_RighteyebrowOuter = 758,
            HighDetailFacePoints_RighteyebrowCenter = 849,
            HighDetailFacePoints_MouthLeftcorner = 91,
            HighDetailFacePoints_MouthRightcorner = 687,
            HighDetailFacePoints_MouthUpperlipMidtop = 19,
            HighDetailFacePoints_MouthUpperlipMidbottom = 1072,
            HighDetailFacePoints_MouthLowerlipMidtop = 10,
            HighDetailFacePoints_MouthLowerlipMidbottom = 8,
            HighDetailFacePoints_NoseTip = 18,
            HighDetailFacePoints_NoseBottom = 14,
            HighDetailFacePoints_NoseBottomleft = 156,
            HighDetailFacePoints_NoseBottomright = 783,
            HighDetailFacePoints_NoseTop = 24,
            HighDetailFacePoints_NoseTopleft = 151,
            HighDetailFacePoints_NoseTopright = 772,
            HighDetailFacePoints_ForeheadCenter = 28,
            HighDetailFacePoints_LeftcheekCenter = 412,
            HighDetailFacePoints_RightcheekCenter = 933,
            HighDetailFacePoints_Leftcheekbone = 458,
            HighDetailFacePoints_Rightcheekbone = 674,
            HighDetailFacePoints_ChinCenter = 4,
            HighDetailFacePoints_LowerjawLeftend = 1307,
            HighDetailFacePoints_LowerjawRightend = 1327
            } ;

          • This is amazingly useful. I am going to update the blog post and give credits to you.

          • kroko says:

            i’m just going through SDK and this enum is there. assuming the standard install path for SDK you can find it in header c:\Program Files\Microsoft SDKs\Kinect\v2.0_1409\inc\Kinect.Face.h

      • Juandi says:

        I found this https://social.msdn.microsoft.com/Forums/getfile/668131 in the KinectV2 forum of Microsoft

  • Juandi says:

    Hi Vangos,
    First of all, congratulations for the website.
    I’m working in my final degree proyect with Kinect V2 and I have a doubt that I hope you would be able to solve.
    How can I extract a photo in bmp of the detected face? In other words, how can I extract the point cloud of the tracking face?

    Thanks and sorry for my bad english.

    • Hi Juandi. That would be feasible using HD Face. First of all, you need to detect the forehead, the cheeks and the jaw:

      var forehead = vertices[(int)HighDetailFacePoints.ForeheadCenter];
      var cheekLeft = vertices[(int)HighDetailFacePoints.LeftcheekCenter];
      var cheekRight = vertices[(int)HighDetailFacePoints.RightcheekCenter];
      var jaw = vertices[(int)HighDetailFacePoints.LowerjawLeftend];

      You now have the 3D coordinates of 4 points. Each point is a CameraSpacePoint in the world space. Using Coordinate Mapper, you can map the world space to the RGB or depth/infrared space and find the 2D coordinates in a 1920×1080 or 512×424 frame.

      Since you have 4 points in the frame, you can crop the generated bitmap and extract the face.
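
The cropping step can be sketched as follows. This is only an illustration: it assumes the four points have already been mapped to 2D pixel coordinates by Coordinate Mapper, uses System.Numerics.Vector2 as a stand-in for the SDK's 2D point types, and the CropRect and FaceCrop names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Numerics;

// Hypothetical result type: a pixel rectangle inside the frame.
public struct CropRect
{
    public int X, Y, Width, Height;
}

public static class FaceCrop
{
    // Computes the bounding box of the mapped 2D points, clamped to the frame.
    // Points that failed to map (infinite or NaN coordinates) are skipped.
    // Returns an empty rectangle when no valid point is given.
    public static CropRect BoundingBox(IEnumerable<Vector2> points, int frameWidth, int frameHeight)
    {
        float minX = float.MaxValue, minY = float.MaxValue;
        float maxX = float.MinValue, maxY = float.MinValue;

        foreach (Vector2 p in points)
        {
            if (float.IsInfinity(p.X) || float.IsInfinity(p.Y)) continue;
            if (float.IsNaN(p.X) || float.IsNaN(p.Y)) continue;

            minX = Math.Min(minX, p.X);
            minY = Math.Min(minY, p.Y);
            maxX = Math.Max(maxX, p.X);
            maxY = Math.Max(maxY, p.Y);
        }

        if (minX > maxX || minY > maxY) return new CropRect();

        int left = Clamp((int)Math.Floor(minX), 0, frameWidth - 1);
        int top = Clamp((int)Math.Floor(minY), 0, frameHeight - 1);
        int right = Clamp((int)Math.Ceiling(maxX), 0, frameWidth - 1);
        int bottom = Clamp((int)Math.Ceiling(maxY), 0, frameHeight - 1);

        return new CropRect { X = left, Y = top, Width = right - left + 1, Height = bottom - top + 1 };
    }

    private static int Clamp(int value, int min, int max)
    {
        return value < min ? min : (value > max ? max : value);
    }
}
```

For a color frame you would pass 1920 and 1080 as the frame size; for a depth or infrared frame, 512 and 424.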

  • Benjamin Biggs says:

    Hi Vangos. Your articles are amazing and to be honest, I don’t know where I would be without you!

    I have an interesting problem in which I am trying to detect if a person has a beard and a particular kind of glasses. Firstly, my plan with the beard is to use the HD face data in combination with the colour data and run the result through some sort of external imaging package (likely Emgu CV which is a wrapper for OpenCV). For the glasses, a similar trick will be used. The FaceFrameFeatures enumeration seems to only yield the presence of glasses and not whether they are the correct ones.

    Firstly, I wonder if you have any source code which involves overlaying the face data to the colour frames. This seems essential to solving my problem and I’m not sure whether the traditional coordinate mapper example can help me. By the look of your final picture, you may have something handy that I could have a peek at.

    Secondly, I’d really appreciate it if you could give me some general advice on the aforementioned strategy to beard (and particular kind of glasses) detection. A word of advice from you would be excellent!

    Finally, I wonder if you have any plans to release a book on Kinect SDK v2. I have really struggled in particular with Kinect Fusion stuff and have found tutorials hard to come by. I wonder if you could either point me to a tutorial/resource/extra source code (I’ve already seen the example that comes with the SDK) or failing that, write a book that I can buy for any price! 🙂

    Thanks again. I love you.

  • E B says:

    Do you know if it is possible to modify the code that is used to identify a face? I am trying to track infants (6-9 months) and Kinect does not pick up their faces. I suspect that this is because their features have different proportions than adults. I would like to try to modify the code to identify infant faces. Thanks!

    • Hi E B. We cannot modify the face tracking code. It is part of the SDK. Have you verified that the body of an infant is tracked? If the infant is lying on a bed, Kinect will not easily recognize it. You can put the sensor on the ceiling for better results.

      • E B says:

        Thank you for the quick reply! I have not verified that. The infant is sitting and moving his arms and head. I know that for adults, tracking the body while sitting is more difficult than tracking it while standing, but not particularly problematic. I’ll see if the body is tracked and hopefully that will make the head track as well.

        • Indeed. Face tracking relies on Body tracking! Verify that the body is tracked and your job will be much easier. Simply run the Body-Basics example from the Kinect SDK Browser.

  • E B says:

    Body tracking doesn’t work for sitting infants. I’ve tried positioning the Kinect in different ways, but it won’t consistently pick them up.

    I have a some LED markers that I can easily and safely attach to the infants’ heads and shoulders (the only body parts I’m interested in). Do you know of any tutorial or code for tracking LED markers with the Kinect?

    Thanks again!

  • JA says:

    Hi Bro, I have a problem in this line:
    faceSource = new HighDefinitionFaceFrameSource(KinectSensor.GetDefault());
    The mistake says this:
    An unhandled exception of type ‘System.InvalidOperationException’ in Microsoft.Kinect.Face.dll but not handled in user code
    Note: This API has returned an exception from an HRESULT: 0x80070002.

    You can help me with this please.

    • Hi JA. Is it a WPF or Windows Store app? In case you are using WPF, you need to add the following line under Project → Properties → Build Events → Post-build event command line. This command will import some necessary configuration files.

      xcopy "C:\Program Files (x86)\Microsoft SDKs\Windows\v8.0\ExtensionSDKs\Microsoft.Kinect.Face\2.0\Redist\CommonConfiguration\x64\NuiDatabase" "NuiDatabase" /e /y /i /r

      • JA says:

        Thanks it is runing :),
        now I’m trying to put together the color Basic with this but when running becomes very slow you have any idea why?

        Regards 🙂

        • Yes. The color frame is really huge (1920×1080). As a result, trying to display so many Color frames and handle so many Face points is a demanding job.

          You have two options:
          1) Display the Depth frame instead of the color frame.
          2) Display the Color frame with fewer Face points (you probably do not need all of the 1,000 points).

          Hope that information helps you.

  • Sam says:

    Thank you for sharing ,sir! Here I want to ask you a problem.
    If I want to put a obj glass model draw by opengl on the kinect face,how can I put the right location? That is, how can the obj model tracking the facepoint?

    Best.

  • Evan Silverstein says:

    Hi there. First, let me say thank you so much for your posts on the Kinect. They have helped me immensely in my thesis project.

    My question to you involves actual face recognition. I would like to track a face, save data obtained from the face, and be able to recall that data to make a comparison to a new face when the program is turned on again. Most of what I just described are things that I have worked out (save multiple values into a csv file, import them back into the program, and compare). However, the hard part is simply what values to use that give the appropriate details for the face. The only values I can find come from the CalculateVerticesForAlignment method. I can extract all 1347 points from those vertices but, when I compare them to multiple tries of my own face (or to someone else’s face), the values I get just don’t match up. Specifically, I am calculating the 3D vectors from 1 specific point to another and then comparing that same vector to see if the values is similar. This does not appear to work as well as I hoped as the values are never close enough to make a good determination.

    Can you suggest another way to extract the numerical data for a face that would allow me to compare it to the same face or different faces for actual facial recall/recognition?

    Thanks in advance for any help you can give.

    Evan

    • Hi Evan. Do not compare vertices. Instead, compare the relative distances between specific points. For example, you can compare the distance between the eyes, the jaw-forehead, etc. Also, remember to take the measurements at a specific angle. Measurements taken at about 10 degrees of head rotation will compare much more accurately than measurements taken at 40 degrees.
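
A rough sketch of that idea follows, under stated assumptions: System.Numerics.Vector3 stands in for CameraSpacePoint, the three index pairs are taken from the HighDetailFacePoints table in the article, and the FaceMeasurements class and its tolerance parameter are hypothetical. It illustrates comparing distances, it is not a face recognition method.

```csharp
using System;
using System.Collections.Generic;
using System.Numerics;

public static class FaceMeasurements
{
    // Index pairs from the HighDetailFacePoints table in the article.
    private static readonly int[,] Pairs =
    {
        { 210, 843 },   // left eye inner corner -> right eye inner corner
        { 28, 4 },      // forehead center -> chin center
        { 1307, 1327 }  // lower jaw left end -> lower jaw right end
    };

    // Returns one distance (in meters) per index pair.
    public static float[] Measure(IReadOnlyList<Vector3> vertices)
    {
        if (vertices == null || vertices.Count <= 1327)
            throw new ArgumentException("The vertex list is too short.", "vertices");

        var distances = new float[Pairs.GetLength(0)];

        for (int i = 0; i < distances.Length; i++)
        {
            distances[i] = Vector3.Distance(vertices[Pairs[i, 0]], vertices[Pairs[i, 1]]);
        }

        return distances;
    }

    // Two measurement sets are "similar" when every distance differs
    // by no more than the given tolerance (hypothetical parameter, in meters).
    public static bool AreSimilar(float[] a, float[] b, float toleranceMeters)
    {
        if (a.Length != b.Length) return false;

        for (int i = 0; i < a.Length; i++)
        {
            if (Math.Abs(a[i] - b[i]) > toleranceMeters) return false;
        }

        return true;
    }
}
```

The tolerance has to be found experimentally. Averaging the measurements over several frames before comparing should make the result less sensitive to jitter.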

      • Evan Silverstein says:

        Thank you for responding so quickly. I guess I should clarify though: I am looking at the relative distances between vertices by calculating the 3D vector length (SQRT((x2-x1)^2 + (y2-y1)^2 +(z2-z1)^2) between two points (say tip of the nose to the left eye) and then comparing that length to the length of the same vector for a different face. The results, as I mentioned, were not accurate. Are you saying there are other values I should be getting?

        Also, those vertex values are very very small and are often negative. Do you know what point they are being measured from (i.e. What the 0,0,0 point is)?

        Lastly, I agree with your point in making sure the face is at the same rotation for each capture (hopefully 0). Is there a method that will give me the approximate head rotation angle so I can know?

        Thanks again for any help you can provide.

        Evan

  • Faiz says:

    Hi Vangos,
    I already tried this programme as per your recommendation. As I already assembled the code without error, unfortunately the program ran but nothing coming on the UI canvas. Just Kinect 2 Face HD text. Can you help me on this one?
    Thank you.

    • Hi Faiz. If you are using WPF, you’ll need to type the following command in the Post-Build events section:

      xcopy "C:\Program Files (x86)\Microsoft SDKs\Windows\v8.0\ExtensionSDKs\Microsoft.Kinect.Face\2.0\Redist\CommonConfiguration\x64\NuiDatabase" "NuiDatabase" /e /y /i /r

  • Hoa Le says:

    Hi Vangos,

    Thanks for your post. I’m doing a project using Kinect HD Face Tracking (C#). I need to extract the face features of a tracked face to a text or excel file to analyze in a data mining tool. Can you suggest a way to extract the animation units value and 3D Head Poses for a tracked face in C#?
    Thanks again.

  • Filip Dextré says:

    First of all my compliments with the clear and nice blog.

    Seeing the popularity of like snapchat effects; this HD-face technology comes potentially into “our scope of business”.
    I was just wondering -from your point of view- how the HD-face can be mapped on a relatively easy way to existing effects (like the google hangout toolbox) or even better existing 3D models. Everybody has seen the pepsi halloween promo – https://www.youtube.com/watch?v=3GG2wKZw3wg. I do realise this as an promo video by an agency, and even if this technology would make a close match, a project like this is a custom (and expensive) job.

    I just wonder if you see any possibilities to find a library of effects, similar like your approach to the 3D models & kinect2.

    • Hi Filip. Thanks for your comment. It would need some work, but it would be feasible using Kinect v2.

    • Hoa Le says:

      Hi Vangos,

      I tried to modify your Kinect-2-CSV tool to extract the animation units but I’m confused what should be replaced Body class and joins for the AUs of HD FT. Please advise:

      public void Update(HighDefinitionFaceFrame hdff)
      {
          if (!IsRecording) return;
          if (hdff == null || !hdff.IsFaceTracked) return;

          string path = Path.Combine(Folder, _current.ToString() + ".line");

          using (StreamWriter writer = new StreamWriter(path))
          {
              StringBuilder line = new StringBuilder();

              if (!_hasEnumeratedJoints)
              {
                  foreach (var joint in hdff?)
                  {
                      line.Append(string.Format("{0};;;", joint.JointType.ToString()));
                  }
                  line.AppendLine();

                  foreach (var joint in body.Joints.Values)
                  {
                      line.Append("X;Y;Z;");
                  }
                  line.AppendLine();

                  _hasEnumeratedJoints = true;
              }

              foreach (var joint in body.Joints.Values)
              {
                  line.Append(string.Format("{0};{1};{2};", joint.Position.X, joint.Position.Y, joint.Position.Z));
              }

              writer.Write(line);

              _current++;
          }
      }

      • Hello. Thanks for your comment. There are no Face joints or equivalents. All you can do is use Vertices[some_index] to get the X-Y-Z values and store them in your CSV file.
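
As a minimal sketch of that approach, the helper below builds a header and a data row for a chosen set of vertex indices. Assumptions: System.Numerics.Vector3 stands in for CameraSpacePoint, the FaceCsv class and the column naming are hypothetical, and the semicolon separator simply follows the code in the question above.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Numerics;
using System.Text;

public static class FaceCsv
{
    // Builds the header for the selected vertex indices,
    // e.g. "v18_X;v18_Y;v18_Z;" for index 18.
    public static string Header(IEnumerable<int> indices)
    {
        var line = new StringBuilder();

        foreach (int index in indices)
        {
            line.AppendFormat(CultureInfo.InvariantCulture, "v{0}_X;v{0}_Y;v{0}_Z;", index);
        }

        return line.ToString();
    }

    // Builds one data row with the X, Y, Z values of the selected vertices.
    // The invariant culture keeps "." as the decimal separator on every machine.
    public static string Row(IReadOnlyList<Vector3> vertices, IEnumerable<int> indices)
    {
        var line = new StringBuilder();

        foreach (int index in indices)
        {
            Vector3 v = vertices[index];
            line.AppendFormat(CultureInfo.InvariantCulture, "{0};{1};{2};", v.X, v.Y, v.Z);
        }

        return line.ToString();
    }
}
```

You would write the header once and then append one row per tracked frame, for example with StreamWriter.WriteLine.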

  • Sergio says:

    Vangos,
    Great articles, they are helping me a lot as I am a beginner BUT I’ve run into something I don’t see any information on.
    If Kinect v2.0 can only track 2 HD faces at a time, how can we tell it which face we want to track if there are more than two people in the visible frame?

    I’ll await your advice on this as I am stuck.

    Thank you!
    Sergio

    • Hi Sergio. That’s a great question. You specify the Body a Face belongs to by assigning the proper tracking ID:

      _faceSource.TrackingId = body.TrackingId;

      This way, you know exactly who the face belongs to!

  • Pete says:

    Hey Vangos

    Any opinions on the accuracy of the face tracking ?

    I’m mostly interested in the change in size and shape of features over time. For example, jaw width, width of the bridge of the nose, stuff like that. Have some medical research in the back of my mind. I don’t need to see the actual size in cm or similar, I’m more interested in the ratios of facial vertices I capture now and facial vertices I capture in 6 months time. The changes would be pretty small though so if I can capture the same face 20 frames in a row and get 20 different results, it’s not going to work out too well.

    Cheers

    Pete

    • Hi Pete. Thanks for your comment – it’s very interesting. You can achieve what you are describing, but you’ll need a lot of testing and a lot of captured frames. Probably a classifier, too. Kinect is accurate, but you’ll get a lot of “false positives” and a lot of “jumpy” values. So, you’ll also need to smooth the data before comparing.

      • Pete says:

        Thanks. On a somewhat related note, any plans to allow Kinetisense to hook up to EMR/EHRs with APIs ? I’m a consultant that does a lot of work for a large Orthopaedic practice that’s very interested in outcome data.

      • Thanasis says:

        Hi Vangos, Do you think that is possible to smooth face data in order to get smoother Animation Units?

        • Hi Thanasis. Yes, definitely. To smooth face data, you have to choose a window size (let’s say 10 or 20 frames) and take the average of the positions of a specific Face point.
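
The windowed average can be sketched like this. It is a minimal illustration, assuming System.Numerics.Vector3 as a stand-in for CameraSpacePoint; the PointSmoother class is a hypothetical helper, and you would keep one instance per face point you want to smooth.

```csharp
using System;
using System.Collections.Generic;
using System.Numerics;

// Moving average over the last N samples of a single face point.
public sealed class PointSmoother
{
    private readonly int _windowSize;
    private readonly Queue<Vector3> _window = new Queue<Vector3>();

    public PointSmoother(int windowSize)
    {
        if (windowSize < 1) throw new ArgumentOutOfRangeException("windowSize");
        _windowSize = windowSize;
    }

    // Adds the newest position and returns the average of the current window.
    public Vector3 Add(Vector3 sample)
    {
        _window.Enqueue(sample);

        // Drop the oldest sample once the window is full.
        if (_window.Count > _windowSize) _window.Dequeue();

        // The window is small (10 to 20 frames), so re-summing is cheap
        // and avoids accumulating floating-point drift.
        Vector3 sum = Vector3.Zero;
        foreach (Vector3 v in _window) sum += v;

        return sum / _window.Count;
    }
}
```

A larger window gives steadier values but makes the point lag behind fast head movements, so the window size is a trade-off you need to tune.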

          • Thanasis says:

            Thank you for your response. Averaging frames might solve face flickering, but I’m still not sure how to use these new, smoother positions to acquire smoother AnimationUnits (FaceAlignment.AnimationUnits).

          • Hi Thanasis. Smoothing the AnimationUnits would mean eliminating any non-matching units within a sequence of units.

            Alternatively, you could use the float value of the AnimationUnits dictionary to detect the weight of an expression and reject the ones that are below a predefined threshold.
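
            A minimal sketch of the threshold idea, assuming the weights have been copied into a dictionary keyed by name. The string keys, the KeepStrong helper, and the threshold value are illustrative and not part of the SDK. The absolute value is compared because a few units can take negative weights:

            ```csharp
            using System;
            using System.Collections.Generic;
            using System.Linq;

            public static class AnimationUnitFilter
            {
                // Returns only the units whose absolute weight reaches the threshold.
                public static Dictionary<string, float> KeepStrong(
                    IReadOnlyDictionary<string, float> units, float threshold)
                {
                    return units
                        .Where(pair => Math.Abs(pair.Value) >= threshold)
                        .ToDictionary(pair => pair.Key, pair => pair.Value);
                }
            }
            ```

            For example, calling KeepStrong with a threshold of 0.2 would drop a unit weighted 0.05 and keep one weighted 0.8.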

          • Evan Silverstein says:

            Hey Vangos. Can you elaborate a little bit on what you mean by this? Are you saying to simply look at the AUs per frame and eliminate any that aren’t within an expected threshold? I am working with the HD face process as well and have been trying to understand the Animation Units and what exactly they do so any further insight would be appreciated.

            Also, I’m not sure you have delved into the HD Face Basics-WPF process, but that process includes 94 Shape Units (SUs) that become static once the process has been completed. Do you happen to know how those SUs manipulate the vertices at all? I can’t find any documentation that states exactly what they do, just that they modify the vertices obtained in each frame.

          • Hi Evan. Yes, that’s what I mean. The weight is just a float number.

  • Petter says:

    Hello, I am trying to run the program, but only a blank window appears and the video capture is not displayed. Can you help me?

    • Hi Petter. Can you add a breakpoint to see if you are getting Kinect data?

    • Thanasis says:

      Hi Vangos. As far as I know, animation units are distances related to facial movement (facial point movement). Thus, having smoother facial points could lead to smoother Animation Units (without averaging or thresholding them). What I’m missing is the way to pass these new face points to the HD Face model. I hope I made myself clear. Thanks.

  • Sam says:

    Hi Vangos,
    I need your advice on identifying a face. For example, if one of my family members stands in front of the Kinect, I want to identify the person and show their profile information (name, address, age, etc.).
    Which face points do I need to store in the database for later matching?
    How should I approach this?

    Thanks,
    Sam

  • Tony says:

    Hi Vangos:
    Thx for ur amazing post, it works well !!!
    Have u ever studied the High definition face tracking Animation Units ? the Microsoft Kinect programming guide describe it as following:

    “The API tracks 17 animation units (AUs). Most of the AUs are expressed as a numeric weight varying between 0 and 1. Three of them, Jaw Slide Right, Right Eyebrow Lowerer, and Left Eyebrow Lowerer, vary between -1 and +1.”

    the relevant link : https://msdn.microsoft.com/en-us/library/microsoft.kinect.face.faceshapeanimations.aspx

    I want to use the Animation Unit coefficients, but I cannot find the relevant APIs. The Kinect 1 SDK 1.8 provides the GetAnimationUnitCoefficients() function, which makes it easy to get the coefficients, but how do I do it in Kinect 2 SDK 2.0? BTW, I use WPF (C#).

    thank you for ur help !!!

    • Hi Tony. You can install the HD Face Basics demo and check the FaceAlignment field. The FaceAlignment includes the Animation Units.

      var value = currentFaceAlignment.AnimationUnits[FaceShapeAnimations.JawOpen];

  • Hannan says:

    Hello Mr.Pterneas;

    I do appreciate your reply and help for me.

    I need to develop a simple application with simple interface to know the following points:
    1- movement and direction of head
    2- movement of arm (up or down)
    3- hand if its near to mouth or not

    What is your advice for me, as an expert developer with Kinect? Which basic code and points can I rely on to start and to create the interface for my app?

    • Hello Hannan. You should measure the distance between the hand joint and the mouth.

      To find whether the arm is up or down, compare the Y values of the hand and the elbow. In Kinect camera space, Y increases upward, so a larger Y means a higher position:

      var handY = body.Joints[JointType.HandRight].Position.Y;
      var elbowY = body.Joints[JointType.ElbowRight].Position.Y;
      if (handY > elbowY) { /* Up */ }
      if (handY < elbowY) { /* Down */ }

      To calculate the distance between the hand and the mouth, the easiest way is using Vitruvius:

      var mouth = face.Mouth;
      var hand = body.Joints[JointType.HandRight].Position;
      var distance = mouth.Length(hand);
      if (distance < 0.1) { /* Hand is close to mouth */ }
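
      Without Vitruvius, the same check is just the Euclidean distance between two 3D points. A minimal sketch, where Vec3 is a hypothetical stand-in for CameraSpacePoint and the 0.1 m default simply mirrors the threshold used above:

      ```csharp
      using System;

      // Hypothetical stand-in for CameraSpacePoint (X, Y, Z in meters).
      public struct Vec3
      {
          public float X, Y, Z;
          public Vec3(float x, float y, float z) { X = x; Y = y; Z = z; }
      }

      public static class HandChecks
      {
          // Straight-line distance between two points, in meters.
          public static double Distance(Vec3 a, Vec3 b)
          {
              double dx = a.X - b.X;
              double dy = a.Y - b.Y;
              double dz = a.Z - b.Z;
              return Math.Sqrt(dx * dx + dy * dy + dz * dz);
          }

          // True when the hand is closer to the mouth than maxMeters.
          public static bool IsHandNearMouth(Vec3 hand, Vec3 mouth, double maxMeters = 0.1)
          {
              return Distance(hand, mouth) < maxMeters;
          }
      }
      ```

      Both points must be in the same coordinate space (camera space here) for the distance to be meaningful.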

  • Vanlal says:

    Hello Vangos.

    Thanks for all your articles on the Kinect.

    I have seen that different Kinect sensors have slight differences in position when overlaying the HD face generated mesh over the color feed. Have you ever seen this yourself or anywhere else?

    To explain, I have an application where I apply facial makeup texture on the 3D HD mesh and overlay it on the user’s face on the color feed. So the overlay needs to be very accurate. I achieve this accuracy somewhat by correcting the parallax dynamically between the color and depth spaces, and shifting the x-y position of the color image by some hard-coded values. While I can achieve good accuracy for one sensor this way, the xy shift goes off if I plug in a different sensor. The parallax fix still works accurately so I have to adjust the xy offset for each Kinect sensor.

    My conclusion from all this is that the physical placement of the color and IR cameras is not standard (not accurate enough) for all Kinect sensors. What do you think?

    PS: The sensors I use are XBox One sensors made in China

    • Hi Vanlal. Thank you very much for your comment. I have not experienced such an inconsistency with any of my Kinect sensors. I have checked over 15 sensors and there are no inconsistencies. Check whether your code considers any fixed distances that could change based on the position of the sensor or the height of a user.

      • Vanlal says:

        My code does take care of those parameters – the user can move around the whole view area in all three lateral dimensions without much inaccuracy. Since the differences between the sensors are very consistent, we have decided to adjust for each sensor. Thanks for taking the time to check it out.

    • Hannan says:

      I am so thankful ..you did help me a lot.

      1- I always get confused about the X and Y coordinates. How can I tell which is which? For example, I don’t understand what the Y means in these statements:

      if (handY > elbowY) { /* Up */ }
      if (handY < elbowY) { /* Down */ }

      2- Here you used the Length function ( var distance = mouth.Length(hand); ).
      Is there any documentation for the Kinect 2 functions I can use? Where can I learn about these kinds of functions?

      Thanks alot

      • Hello Hannan. Here are the answers to your questions:

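        1) X and Y are properties of the joint’s Position: X is the horizontal coordinate and Y is the vertical one. The complete code would be the following:
        float handX = body.Joints[JointType.HandRight].Position.X;
        float handY = body.Joints[JointType.HandRight].Position.Y;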

        2) The Length method is part of Vitruvius. The documentation can be found here.

        • Hannan says:

          I do appreciate all your support, and I promise I will mention your full name in my printed thesis because you have really helped me a lot.
          I will graduate next Fall 🙂

          thanks from my heart

  • AR says:

    Hi
    I have a problem in this line:
    faceSource = new HighDefinitionFaceFrameSource(KinectSensor.GetDefault());
    The mistake says this:
    An unhandled exception of type ‘System.InvalidOperationException’ in Microsoft.Kinect.Face.dll but not handled in user code

    This API has returned an exception from an HRESULT: 0x80070002.

    I applied
    xcopy “C:\Program Files (x86)\Microsoft SDKs\Windows\v8.0\ExtensionSDKs\Microsoft.Kinect.Face\2.0\Redist\CommonConfiguration\x64\NuiDatabase” “NuiDatabase” /e /y /i /r

    but the new error is “GenerateResource” task failed unexpectedly !

  • Inam Ur Rehman says:

    Hi Vangos,
    Thanks for your post which is helping me in implementing my application which is using Kinect.
    I am having trouble mapping these face points onto a 2D bitmap image, which I am getting from ColorBasics. Each vertices[index] contains three coordinates (x, y, z), and I want to map it onto a 2D plane (x, y). If I just ignore the z-axis, the results are not accurate. What would be a good method to map these points onto a 2D plane?

    Thanks.

  • JD says:

    For whatever reason when I build and run the exe I get a blank canvas with no output from the kinect. Any idea what I’m doing wrong?

  • Panagiotis says:

    Hi Vangos, thanks for your tutorial !
    Do you know if using a facemodel builder also increases the accuracy of the vertices ? Or is it only used to access the other properties of the facemodel (like shape deformations) ?

    • Hello Panagiotis. The face model builder helps you access and display the vertices/deformations. The accuracy is not configurable; it is affected by lighting conditions and distance from the sensor.

  • Elena says:

    Could you advise me on how to evaluate whether the mouth has opened or an eye has closed, using the vertices that the function returns?

    Thank you so much.

    • Hello. You can measure the distance between the vertex HighDetailFacePoints_MouthUpperlipMidtop and the vertex HighDetailFacePoints_MouthLowerlipMidbottom. The distance will be small if the mouth is closed and larger if the mouth is open.

  • Ahmad Raza says:

    Thank you so much for a great tutorial. Can you please tell me how I can get the RGB values of the particular vertices you have mentioned on the face? I want to get the skin tone of the face.

    • Hello Ahmad. When projected on the Color frame, each point has a specific X and Y coordinate within the 1920×1080 Color frame. The color data is a byte array with 4 bytes per pixel, so the colorIndex would be:

      var colorIndex = (Y * 1920 + X) * 4;

      So, the RGBA color values for that point would be the following (assuming the frame was copied in RGBA format; with BGRA, the first and third bytes swap):

      var r = colorData[colorIndex + 0];
      var g = colorData[colorIndex + 1];
      var b = colorData[colorIndex + 2];
      var a = colorData[colorIndex + 3];
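
      Since a mapped point can land outside the frame, it helps to bounds-check the lookup. A small sketch, assuming a buffer with four bytes per pixel, so the byte offset is the pixel index times four. ColorLookup and PixelAt are illustrative names, not SDK members:

      ```csharp
      using System;

      public static class ColorLookup
      {
          public const int BytesPerPixel = 4;

          // Returns the four channel bytes of pixel (x, y), in the order
          // in which they are stored in the buffer (e.g. RGBA or BGRA).
          public static byte[] PixelAt(byte[] colorData, int width, int height, int x, int y)
          {
              if (x < 0 || x >= width || y < 0 || y >= height)
                  throw new ArgumentOutOfRangeException("x, y", "Pixel lies outside the frame.");
              if (colorData.Length < width * height * BytesPerPixel)
                  throw new ArgumentException("Buffer is too small for the given size.", nameof(colorData));

              // Pixel index first, then scale by the number of bytes per pixel.
              int offset = (y * width + x) * BytesPerPixel;

              return new[]
              {
                  colorData[offset + 0],
                  colorData[offset + 1],
                  colorData[offset + 2],
                  colorData[offset + 3]
              };
          }
      }
      ```

      For skin tone, averaging a small neighborhood of pixels around the point is usually less noisy than reading a single pixel.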

  • salahuddin says:

    Will it work with the Kinect for Xbox 360 (v1)?

  • Andrew says:

    Thanks so much for this tutorial, it has been very helpful!

    Quick question regarding the CameraSpacePoints. I am interested in snapping the orientation of the face vertices towards the camera, presumably using the FaceAlignment.FaceOrientation property. I have been able to convert the individual vertices (CameraSpacePoints) to Microsoft.Xna.Framework.Vector3 objects, and have transformed them via the inverse of the FaceOrientation rotation matrix without errors. However, as a sanity check, I was hoping to plot them on the canvas like you do in this example. However, the CoordinateMapper.MapCameraPointToDepthSpace only takes in CameraSpacePoints (not Vector3 objects). Just curious, is there a way to essentially recreate this behavior, warp the CameraSpacePoints directly without changing types, or otherwise achieve this goal?

    Thanks again!
