War for the Planet of the Apes is receiving enormous praise for the film’s compelling story – the continuing battle between apes and humans – as well as for the facial and body performance capture work by the ape actors, including Andy Serkis as ape leader Caesar, and for the cg creature work by Weta Digital.

But how does Weta Digital take that performance capture, which is acquired both on location and in motion capture volumes, and translate it to the living, breathing digital apes?

Cartoon Brew spent time recently in Wellington, New Zealand with the Weta Digital crew to find out more about the process, and we’ve summed it up in this step-by-step look at how the motion capture, facial motion, facial models, and animation departments at the studio work together to pull off these remarkable performances.

Planning and prep

Before any on-set shooting was done for War for the Planet of the Apes, Weta Digital took its actors through a phase of preparatory steps. Largely this involved capturing basic facial shapes and expressions (an extension of the Facial Action Coding System or FACS approach), phonemes, and sometimes parts of the actor’s dialogue from the script. This prep shoot involved all the actors who would be driving the apes, the actors playing characters that required digital doubles, and a number of generic performers used for background apes.

From left: Karin Konoval, Terry Notary, Andy Serkis and Michael Adamthwaite in motion capture suits, and the final frame from the film.

But why is this initial stage necessary? The idea is that it gives the motion capture, facial motion, and facial model departments an early start in extracting the shapes they will need to set up Weta Digital’s facial solver. This is the key piece of technology the visual effects studio has developed over several projects – Lord of the Rings, King Kong, Avatar, and the earlier Apes films – to take a human facial performance and re-target it onto a cg character.

The facial models department, in particular, also uses this step to obtain a reconstructed animated mesh from the performance capture dataset as reference for the shapes they build into the character rigs. A total of 15 actors took part in the preparatory stage over a few days. It even included some additional capture of generic motions like being idle, hooting, and expressing rage – all of which were added to Weta Digital’s facial library.
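To make the idea of such a library more concrete, here is a minimal Python sketch of how captures from a prep shoot might be catalogued so that a solver setup can later pull the shapes recorded for each performer. The class names, categories, and file paths are hypothetical illustrations, not Weta Digital’s actual tools or data.

```python
from dataclasses import dataclass, field

@dataclass
class ShapeCapture:
    actor: str          # performer recorded during the prep shoot
    shape_name: str     # e.g. a FACS-style action such as "brow_raise"
    category: str       # "facs", "phoneme", or "generic" (idle, hoot, rage)
    mesh_sequence: str  # path to the reconstructed animated mesh (hypothetical)

@dataclass
class FacialLibrary:
    captures: list[ShapeCapture] = field(default_factory=list)

    def add(self, capture: ShapeCapture) -> None:
        self.captures.append(capture)

    def for_actor(self, actor: str) -> list[ShapeCapture]:
        """All shapes recorded for one performer, used when setting up their solver."""
        return [c for c in self.captures if c.actor == actor]

library = FacialLibrary()
library.add(ShapeCapture("Andy Serkis", "eyes_closed", "facs", "caps/serkis_eyes_closed.abc"))
library.add(ShapeCapture("Andy Serkis", "hoot", "generic", "caps/serkis_hoot.abc"))
print(len(library.for_actor("Andy Serkis")))  # -> 2
```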

The shoot

On set in British Columbia, the actors wore suits containing ‘active’ markers. Many might be familiar with the optical capture process (which was used in the more contained mocap volumes), where motion capture cameras situated around the performers capture reflective markers on the suits. ‘Active’ capture relies instead on LED markers in the suits. Rather than reflecting light back to the cameras, the markers are powered to emit their own light, which helps increase the size of the volume and the amount of action that can be captured.

The first rebooted Apes film, Rise of the Planet of the Apes, was shot with 60 Motion Analysis cameras, with up to four performers wearing active markers at a time. The second film, Dawn of the Planet of the Apes, used 80 Standard Deviation cameras specifically made to cater to an outdoor shoot. Wireless motion capture cameras were also introduced, which allowed the capture of up to 14 performers wearing active markers in some fairly rugged forest conditions.

Original plate with ape performers (here, wearing gray tracking suits for performance reference).
Lighting reference passes.
CG apes and horses.
Final shot.

On War for the Planet of the Apes, this technology has been scaled up again, with 140 cameras relied upon and conditions even more arduous. There’s snow and rain and mud and dirt, and often faster-paced action. The performers also wore helmet-mounted capture cameras looking back at their faces (lighter and smaller than before, but with higher resolution and a faster frame rate). Sometimes the ‘helmet-cam’ is simply removed to enable the best performance possible. Tracking markers painted on fixed areas of the face aid the facial solve.

Additional reference gathering is also a large part of the shoot. Several witness cameras, Sony F55s shooting at 4K, provide ‘whole of reference’ footage that the animators very often look back to. There’s also still photography, set surveys, HDRI capture, and, of course, what’s filmed with the principal film cameras.

Karin Konoval plays the orangutan Maurice.

Mocap to animation

At this point, body motion editors take motion from a ‘mocap to anim’ state through to a ‘director approved’ state for both hero characters and big crowd scenes. In addition to proprietary software, Weta Digital uses Nuance, a tool originally developed by Giant Studios, for body motion editing.

Facial translation is slightly different, and tends to use all of the available sources – the facial capture, the witness cameras, and the principal camera – to understand and translate a performance onto a cg model.

Original principal photography plate.
CG renders.

Weta Digital is looking at the observable physical changes in the actor’s face, but also working to match what that performance ‘feels’ like. The actor’s intent is important, and the studio tries to respect that performance and make sure the audience takes away the same feeling. This means it is never a direct translation from the performance capture data to the final cg character.

A key part of that is the fact that human faces and ape faces are different and have different muscle shapes. Making an ape face activate the same muscles as a human face – which would be the result if you just applied the facial tracking data – creates a different visible result and usually fails to deliver the emotion the actor intended. This is why adjustments and keyframing are an essential part of the ape facial animation workflow.

Final shot.

For each shot, it is up to the individual animator how much of the solve data they use in building up the performance they are matching. For them, it’s about the result, not the process. Huge crowd scenes are slightly different, since the work required to massage facial animation for so many characters would be intense, so there the animation mostly does come from the facial motion capture.
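As a rough illustration of that idea – solver output as a starting point, with animator adjustments layered on top – here is a minimal Python sketch. The channel name, frame numbers, and the simple additive blend are assumptions made for illustration only, not Weta Digital’s actual rig controls or solver behavior.

```python
def blend_channel(solve_curve, keyframes, solve_weight=1.0):
    """Per-frame value = weighted solver value plus any hand-keyed offset."""
    blended = {}
    for frame, solved_value in solve_curve.items():
        offset = keyframes.get(frame, 0.0)  # animator correction, if any
        blended[frame] = solve_weight * solved_value + offset
    return blended

# Hypothetical solver output for a 'brow_raise' channel (frame -> activation 0..1)
solve = {101: 0.20, 102: 0.55, 103: 0.80}

# The animator pushes the peak of the expression a little further on frame 103
keys = {103: 0.15}

print(blend_channel(solve, keys))  # frame 103 ends up at roughly 0.95
```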

Still, the facial mapping process is a key part of getting the data to translate accurately from human to ape, and it gives animators a major starting point. Facial mapping itself happens as part of Weta Digital’s solving process.

Andy Serkis as Caesar.

Since the studio’s facial rigs are based on an extension of FACS, there is a high level of consistency between the various creatures Weta Digital works on, so mapping is very similar from character to character. The face is split into regions such as brows, eyelids, cheeks, nose, and so on, and each shape is mapped from the actor’s rig to the character rig.

A simple example: whenever the studio’s tools identify that Andy Serkis has closed his eyes, Caesar does the same, with the eyelids region of the mapping used to activate the ‘eyes-closed’ shape.
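A minimal Python sketch of that region-based mapping idea might look like the following. The region names, shape names, and one-to-one mapping table are hypothetical stand-ins; the actual rigs and solver are far more involved.

```python
# Each region's table maps an actor-rig shape to the corresponding ape-rig shape.
REGION_MAP = {
    "eyelids": {"serkis_eyes_closed": "caesar_eyes_closed"},
    "brows":   {"serkis_brow_raise":  "caesar_brow_raise"},
    "cheeks":  {"serkis_cheek_raise": "caesar_cheek_raise"},
}

def retarget(actor_shapes):
    """Translate solved actor shape activations (0..1) into character shape activations."""
    character_shapes = {}
    for mapping in REGION_MAP.values():
        for actor_shape, char_shape in mapping.items():
            weight = actor_shapes.get(actor_shape, 0.0)
            if weight > 0.0:
                character_shapes[char_shape] = weight
    return character_shapes

# The solver detects Andy Serkis closing his eyes; Caesar's rig activates the same region.
print(retarget({"serkis_eyes_closed": 1.0}))  # {'caesar_eyes_closed': 1.0}
```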

A final performance

Actors like Andy Serkis have become incredibly adept at making the most of motion capture and inhabiting the characters they portray. But there is also a significant and equal contribution made by the artists at Weta Digital, from many different teams and on many levels. These artists have studied the same actors’ faces for a long time now, and their intent is to pass along the ‘essence’ of a performance, not to take it over. That’s why these apes feel real.