Face detection with canvas and webcam video

With the getUserMedia API, a video element, a canvas element and LiuLiu’s excellent face detection algorithm, we can easily play around with webcam video data in the browser, plug-in free.

To this end, here are two experiments which do just that, one which places a mask over your face as you move and another that attempts to scale content based on your distance from the screen. Have a play below.

Mask overlay experiment

View experiment | GitHub

Scaling content experiment

View experiment | GitHub

Background

Back in 2009, when 3D transforms first appeared on the scene, and when I first toyed with a rotating 3D cube, I had the desire to make that object rotate based on the position of a viewer. As you look left, the cube might rotate left. At the time I’d seen some work with canvas processing video frames and detecting eye blinks. But I needed seamless access to the webcam, and that was only available through Flash.

In 2010, at Full Frontal, Paul Rouget reminded us of the possibilities of the webcam in the browser, but back then I didn’t make much of it. I needed a custom build of Firefox and that device API was eventually deprecated.

Fast forward to today and we have the getUserMedia (gUM) API, for accessing a user’s microphone(s) and webcam(s). This comes as part of the real time communications spec, and it’s supported, somewhat, in Chrome 21+ and Opera 12+, albeit in slightly different guises.

Face detection in canvas has also improved, and we have LiuLiu’s “not-so-slow” face detection scripts (for those interested, the technique in JavaScript is explained on LiuLiu’s blog). What’s more, the venerable Wes Bos used this in video, last year, to great effect. Much of my experimentation has been based on this, and I’d urge you to have a read yourself.

Put it all together and what have you got? A webcam stream dumped into a video element, processed into a canvas element, and processed again to search for faces, in real time, in the browser, without plugins. Huzzah.

Detecting the presence and relative distance of a face is much simpler than detecting the angle at which a user is looking. So for now, rather than rotating, I have settled on a simple scale: as you move forwards or backwards, the content adapts, transitioning and transforming as appropriate.

It’s never simple though. The face detection only works some of the time. With busy backgrounds or low light conditions the detection fails more often. Sometimes the wrong area is detected, which can lead to radical and jarring shifts in the scale. Perhaps a rolling average would be a better indication; alas, I haven’t built that.
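That rolling average was never built for these experiments, but a minimal sketch of the idea might look like this. The name createRollingAverage and the window size of 5 are hypothetical choices, not part of the experiment code.

```javascript
// Hypothetical helper: smooths a noisy series of face heights by
// averaging the most recent `size` samples.
function createRollingAverage(size) {
    var samples = [];

    return function (value) {
        samples.push(value);
        // drop the oldest sample once the window is full
        if (samples.length > size) {
            samples.shift();
        }
        var sum = 0;
        for (var i = 0; i < samples.length; i++) {
            sum += samples[i];
        }
        return sum / samples.length;
    };
}

// Usage: feed in each detected face height, use the smoothed value
var smooth = createRollingAverage(5);
smooth(48);  // 48
smooth(50);  // 49
smooth(120); // a stray detection moves the average less than the raw value would
```

A larger window gives steadier scaling at the cost of a slower response to real movement.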

How to

Below I have dissected the key parts of the experiments. And as always the experiment code is available on GitHub.

getUserMedia

Presently (Nov 2012) Chrome 21+ and Opera 12+ are the only browsers that support getUserMedia. Some early versions accepted a comma-separated string of media types, e.g. “video, audio”; later versions use an object instead, {video: true}. In Chrome getUserMedia is vendor-prefixed, as is window.URL, which we need for interpreting the webcam stream.

Before we begin it’s best to normalise this stuff. This gist and its comments were helpful, as was HTML5 Doctor’s guidance.

//normalise window.URL
window.URL || (window.URL = window.webkitURL || window.msURL || window.oURL);

//normalise navigator.getUserMedia
navigator.getUserMedia || (navigator.getUserMedia = navigator.webkitGetUserMedia || navigator.mozGetUserMedia || navigator.msGetUserMedia);

Now let’s call it:

// toString for the older implementation (found by https://github.com/agektmr)
var options = {video: true, toString: function(){ return "video"; }};
navigator.getUserMedia(options, successCallback, errorCallback);

Converting a stream to a video element

To show a webcam stream in a video element we need only set the video source to the stream returned by getUserMedia in the success callback. This is done either directly or using an object URL created from that stream:

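// using the normalised window.URL
var video = document.querySelector('video');

function successCallback(stream) {
    // use an object URL where createObjectURL is available, otherwise assign the stream directly
    video.src = (window.URL && window.URL.createObjectURL) ? window.URL.createObjectURL(stream) : stream;
}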

Transplanting to canvas

For the video to render within canvas we need to take the current video frame and apply it to the canvas with drawImage. We need to do this as often as possible. A timer that calls the same function again after 50ms works well enough.

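// look these up once rather than on every frame
var canvas = document.querySelector('canvas'),
    context = canvas.getContext('2d');

function drawFrame() {
    // copy the current video frame onto the canvas, scaled to fit
    context.drawImage(video, 0, 0, canvas.width, canvas.height);
    // schedule the next frame in 50ms
    setTimeout(drawFrame, 50);
}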

Face detection

First we include the wonderful CCV library (ccv.js) and another file which defines a face object (face.js). To detect the faces in our canvas we simply call the detect_objects method and pass in our canvas:

var faces = ccv.detect_objects({canvas : (ccv.pre(canvas)), cascade: cascade, interval: 2, min_neighbors: 1});

This gives an array of detected objects, each with x and y co-ordinates, a width and a height. Each one looks a bit like:

{
    confidence: 0.16752329000000035,
    height: 48.500000000000014,
    neighbors: 1,
    width: 48.500000000000014,
    x: 80.50000000000001,
    y: 104.50000000000003
}

This operation is relatively slow. To speed up the face detection I recommend using a small canvas (200x160) and hence a scaled-down video frame. This gives the algorithm much less data to assess and reduces processing time per frame from ~500ms to a more manageable ~100ms. The algorithm can also detect multiple faces. By default the scaling experiment uses the first face found, and only begins when there is exactly one face on screen. (The masks work for everyone.)
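One consequence of detecting on a small canvas is that the face co-ordinates are relative to that small canvas. If the result is to be shown on a larger area, the rectangle needs scaling back up. A rough sketch, assuming the display area shares the detection canvas’s aspect ratio; scaleFace and the 800x640 display size are hypothetical and not taken from the experiment code:

```javascript
// Hypothetical helper: map a face found on the small detection canvas
// onto a larger display area.
function scaleFace(face, from, to) {
    var ratioX = to.width / from.width,
        ratioY = to.height / from.height;

    return {
        x: face.x * ratioX,
        y: face.y * ratioY,
        width: face.width * ratioX,
        height: face.height * ratioY
    };
}

// Assumed sizes: detection at 200x160, display at 800x640
var detected = {x: 80, y: 104, width: 48, height: 48};
var onScreen = scaleFace(detected, {width: 200, height: 160}, {width: 800, height: 640});
// onScreen is {x: 320, y: 416, width: 192, height: 192}
```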

To highlight the face or draw a mask over it:

// highlight
context.fillRect(face.x, face.y, face.width, face.height);

// mask
context.drawImage(mask, face.x, face.y, face.width, face.height);
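Because detect_objects returns an array, the mask can be drawn over every face with a simple loop. The following is only a sketch: drawMasks is a hypothetical helper name, and context, mask and faces are assumed to be the canvas 2D context, a loaded mask image and the detection result respectively.

```javascript
// Hypothetical helper: draw the mask image over each detected face.
// Returns the number of masks drawn.
function drawMasks(context, mask, faces) {
    for (var i = 0; i < faces.length; i++) {
        var face = faces[i];
        context.drawImage(mask, face.x, face.y, face.width, face.height);
    }
    return faces.length;
}

// In the browser, after each detection pass:
// drawMasks(context, mask, faces);
```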

Face size

When the webcam starts, an initial face size is stored and used as a reference point for all future scaling. Comparing the current face’s height with the original face’s height gives a simple scale factor that we can apply directly as a transform on an element. This can be transitioned, if your computer can take it, at the same time as doing the video processing and face detection.
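As a rough illustration of that comparison, the factor could be computed as below. This is a sketch under assumptions rather than the experiment’s actual code: getScale and toTransform are hypothetical names, the ratio is taken as current height over reference height (so moving closer enlarges the content; invert it for the opposite effect), and the clamping to a minimum and maximum is an addition to soften bad detections.

```javascript
// Hypothetical helper: derive a scale factor from the reference face
// height and the current one, clamped to avoid extreme jumps.
function getScale(referenceHeight, currentHeight, min, max) {
    var scale = currentHeight / referenceHeight;
    return Math.min(max, Math.max(min, scale));
}

// Build a CSS transform value from the factor
function toTransform(scale) {
    return 'scale(' + scale.toFixed(2) + ')';
}

var reference = 48; // face height stored when the webcam started
var transform = toTransform(getScale(reference, 72, 0.5, 2)); // 'scale(1.50)'

// In a browser this could then be applied to an element, e.g.
// element.style.webkitTransform = transform;
// element.style.transform = transform;
```

Adding a CSS transition on the transform property of the target element would animate the change between successive factors.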

Paul Hayes

Paul Hayes is a developer at Last.fm. You should follow him on Twitter, where he talks about UX, HTML, CSS and JavaScript, amongst other cool stuff.