AI is the future of post production - it “unlocks the impossible” | Industry Trends


Call it AI or machine learning, it has the potential to transform post production workflows and put the focus squarely back on the creative process.

Artificial intelligence (AI) has long been the preserve of science fiction narratives involving sentient machines and killer automatons. But for many years now, the term has been a major part of the technologists’ lexicon. Everywhere you look, applications, games, services, transportation, cybersecurity and even consumer goods are leveraging the power of AI. It’s become shorthand for automating systems that perform laborious tasks – often creatively – beyond the power of mere mortals.

AI is able to examine a series of post production features and patterns

“I think artificial intelligence is a misnomer to a certain extent,” argues Martine Bertrand, Senior AI Researcher at DNEG Montreal. “The reason why there’s so much hype around these technologies is because, for most of us, we get convinced this is artificial intelligence in the form of a super-smart robot or a brain living in a cloud, that can suddenly do magical stuff.

“I would get away as much as possible from the term. Machine learning is the correct way to refer to a lot of those approaches. And deep learning is essentially just taking some concepts of a particular brand of machine learning using neural networks.”


Post production is a perfect target for machine learning because an awful lot of it is repetitive and time consuming, yet still requires a human touch. Think rotoscoping, for example; crafting 3D assets; colour grading; noise reduction; mocap retargeting; creating ‘Deepfake’ characters; de-aging actors’ faces; texture creation; even generating music scores – the list is endless.

AI’s usefulness in these areas stems from its ability to examine a series of entries – frames of video, photographs of an object, spoken words etc. – recognise common patterns or features and make reasoned decisions about them. These systems can then be instructed to perform specific actions to generate a particular end result.

The development of AI-based tools is making real strides, but it’s still at a stage where it’s very rarely a magic bullet; currently it’s more like an enthusiastic assistant. “Creating visual effects for a movie is all about detail, right?” Bertrand suggests. “As far as I can tell, machine learning models are not really good with details. They’ll be good with the grunt work. Or if you make something very specific, you might be able to get a really, really good Deepfake.”

The power of creation

Bertrand – who has a PhD in theoretical and computational physics – recently joined DNEG after stints working on language processing and medical imaging. She explains how DNEG has a long-term research programme focused on performance capture – facial performance and full-body performance – and how they can introduce state-of-the-art machine learning approaches to speed up the process of creating digital doubles.

“Digital doubles are really, really expensive to create right now,” she says. “They’re very popular, and are in lots of recent action-packed movies – superheroes and such. But it’s impressive how much time it takes to create a good-looking digital double. So what we’re trying to do is chip away at that. We have a few research initiatives on their way, tackling various parts of the pipeline to create digital doubles.”

The various elements of the digital creation pipeline – including capture, wrapping, rigging, and then animating – can all potentially benefit from machine learning/AI assistance.


“Full facial capture is a very tedious process for the actor right now,” she acknowledges. “They get into that booth, and make all those faces. We need to capture all of this, and usually it’s done with a few dozen cameras. It takes time, it’s tedious for the actor, and it generates a humongous amount of data – humongous – I think between 40 and 80 terabytes of data. And then we need to generate a point cloud, and then this needs to be sent to some other company that has an expertise in wrapping. They’ll wrap the mesh onto the point cloud, and they’ll clean it up, register it, and line it up.

“So it’s very, very lengthy and tedious. To a certain extent, we’re trying to see if there are machine learning approaches, where you can compress the captured data into some kind of lower dimensional representation that is embedded in the weight of a machine learning model, for instance. And that would be just a few hundred megabytes. You could transfer it around. People could poke it and then extract from it, maybe directly, then wrap the surface for that matter. We’re looking at approaches for automated wrapping, and registering to internal, canonical rigs for faces.”

Having a ball with NeRFs

In the field of scene reconstruction, Bertrand mentions a new approach that’s gaining lots of traction, called a neural radiance field or NeRF. “It’s fantastic,” she says. “You can do nifty things with it.”

Basically, NeRFs use a series of photographs of a scene or object fed into a neural network, which then generates a volumetric rendering. It’s the next evolutionary step in photogrammetry, able to reproduce fine geometry and occluded details.

“NeRFs typically are trained on a single scene,” explains Bertrand. “The entire goal of the task is to be able to represent a given scene with a bunch of pictures. Let’s say you pick pictures around the head of somebody – you’ll be able to generate novel views. You’ll be able to give a new camera position with new camera direction, and the neural network will predict what the image should look like from that point of view.”
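The volumetric rendering step can be illustrated with a minimal Python sketch. It is not DNEG's pipeline or any particular NeRF implementation: `toy_field` is a hypothetical stand-in for the trained network (here, a solid red sphere), and `render_pixel`, its sample count and its near/far range are assumptions chosen for illustration. What it shows is the standard idea of querying density and colour at points along a camera ray and compositing them front to back into one pixel.

```python
import math


def composite_ray(densities, colors, step):
    """Alpha-composite samples along one camera ray, front to back."""
    transmittance = 1.0  # fraction of light that has not been absorbed yet
    pixel = [0.0, 0.0, 0.0]
    for sigma, rgb in zip(densities, colors):
        alpha = 1.0 - math.exp(-sigma * step)  # opacity of this ray segment
        weight = transmittance * alpha
        for ch in range(3):
            pixel[ch] += weight * rgb[ch]
        transmittance *= 1.0 - alpha
    return pixel, transmittance


def toy_field(point):
    """Hypothetical stand-in for a trained network: density and colour at a point.

    Represents a dense red sphere of radius 1 centred on the origin.
    """
    inside = sum(c * c for c in point) <= 1.0
    return (50.0 if inside else 0.0), (1.0, 0.0, 0.0)


def render_pixel(origin, direction, near=0.0, far=6.0, n_samples=120):
    """Predict one pixel for a new camera position and viewing direction."""
    step = (far - near) / n_samples
    densities, colors = [], []
    for i in range(n_samples):
        t = near + (i + 0.5) * step  # sample at the middle of each segment
        point = [o + t * d for o, d in zip(origin, direction)]
        sigma, rgb = toy_field(point)
        densities.append(sigma)
        colors.append(rgb)
    return composite_ray(densities, colors, step)


# A ray aimed at the sphere comes back red; a ray that misses stays black.
hit_pixel, _ = render_pixel((0.0, 0.0, -3.0), (0.0, 0.0, 1.0))
miss_pixel, _ = render_pixel((0.0, 2.0, -3.0), (0.0, 0.0, 1.0))
```

In a real NeRF the field is a neural network fitted to the input photographs, and one such ray is rendered for every pixel of the novel view.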

One of the downsides – as with a lot of machine learning applications – is that it’s very scene specific: it can only represent what you’ve trained it to. It also takes time to resolve the image, although a new technique featuring plenoptic volume elements – or Plenoxels for short – accelerates the process from hours down to minutes.

Benefits of machine learning

Dan Ring is head of research at Foundry, whose research team is investigating speculative technologies. With regard to the advent of machine learning, he sees it as having two key components.

“The first one is that it unlocks the impossible,” he states. “Tasks that we’ve previously looked at, like deblurring of images. This particularly came up around the time of stereo productions where you’re shooting with two cameras. And when one camera is out of focus relative to the other, you want to try and sharpen that eye, because otherwise it looks as though you’re wearing a bad set of glasses. We looked at this issue a lot. The state of the art at the time was good, but it still gave other artefacts you have to trade off, and so we just couldn’t get it to work.”


“Fast-forward eight or ten years, and suddenly there’s a tool released on GitHub, called SRN Deblur [Scale-Recurrent Network for Deep Image Deblurring], and it just does it. It does it automatically and to an incredibly high quality. It basically showed us there was a tool that could solve a problem that was previously impossible, and solvable with commodity hardware, like hardware that you have in your house.”

A similar situation was encountered by Ben Kent, leader of Foundry’s AI team. The film he was working on had captured the ideal shot, with a perfect performance from the actor, but the focus puller missed his mark. Kent got the footage back and ran it through the team’s own tuned de-blur tool, built using Foundry’s CopyCat, a set of machine learning nodes in Nuke. Within minutes, he was able to fix a shot that had previously been deemed unusable. “So not even from a post production perspective, but a production perspective, [machine learning] is solving real-world problems right now,” adds Ring.

Work faster, not harder

Ring’s second component is machine learning’s ability to accelerate workflows and allow the artist to be more creative. “Machine learning today is typically about collecting vast amounts of example data, turning the handle, and hoping that you get something that generalises to a wider problem,” he explains. “But we know in post production, that artists only care about solving their problem, that one shot they’re working on today – or possibly for the next three months – but it’s that one shot, and they don’t care about it after that.”

He explains how Foundry’s CopyCat was designed so artists can provide the system with image examples – before and after – then train the system to achieve that one single shot. “A really good example was somebody who was working on a well-known zombie show. He was tasked with zombifying all of the eyes of the actors, and imagine by the end of the series, you’d have to do this for hundreds of zombies. So he painted out the eyes for maybe ten zombies, he trained that model, and then he was able to apply that effect to all of the other characters. It wasn’t entirely perfect, but it got about 95% of the way there and the remaining 5% were quick fixes.”
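The idea of training on a handful of before-and-after examples for one shot can be sketched in a few lines of Python. This is not how CopyCat works internally: the tool is described only as a set of machine learning nodes, and the least-squares fit below is a deliberately simplified stand-in. The function names and pixel values are hypothetical. The sketch learns a gain and offset for one colour channel from pixels an artist has already corrected, then applies the learned adjustment to pixels nobody has touched.

```python
def fit_gain_offset(before, after):
    """Least-squares fit of after = gain * before + offset for one channel."""
    n = len(before)
    mean_b = sum(before) / n
    mean_a = sum(after) / n
    var_b = sum((b - mean_b) ** 2 for b in before)
    if var_b == 0.0:
        raise ValueError("the 'before' examples must not all be identical")
    cov = sum((b - mean_b) * (a - mean_a) for b, a in zip(before, after))
    gain = cov / var_b
    offset = mean_a - gain * mean_b
    return gain, offset


def apply_model(model, pixels):
    """Apply the learned adjustment to pixels from frames not yet corrected."""
    gain, offset = model
    return [gain * p + offset for p in pixels]


# Hypothetical values: pixels from a few frames the artist fixed by hand.
before_examples = [0.2, 0.4, 0.6, 0.8]
after_examples = [0.5, 0.6, 0.7, 0.8]

model = fit_gain_offset(before_examples, after_examples)

# Reuse the model on the remaining frames of the same shot.
new_frame = apply_model(model, [0.0, 1.0])
```

As with the zombie eyes, a model fitted this narrowly is only expected to hold for the shot it was trained on, which is why a final manual pass for the remaining few per cent is still part of the workflow.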


Framestore accelerated their creative process using CopyCat from Foundry

And this example leads to another key benefit of machine learning, says Ring: scale and volume. “The move from tentpole features to high end episodic is great,” he says, “but in terms of a VFX artist, it’s all about volume. And if you’re a freelancer or an independent vendor, volume is really important. If they’re able to unlock this task – which was maybe tedious and laborious – it means they’re then free to do the more creative stuff. These tools can give freelance artists an edge to compete against larger studios. Again, it’s all about volume. If you’re doing things like having to denoise plates, or deblur plates, or up-res plates, which is a huge thing for a lot of animation, you can do that.”

The quest for the Holy Grail

One of the holy grails of AI-based tasks is rotoscoping: extracting a matte – usually of an actor – by painstakingly drawing around their outline. “It’s been hard since the early days of Disney,” says Ring, “and it’s still an incredibly hard and expensive process. At the moment, that sort of grunt work is often given to junior artists or outsourced to other specialist companies around the world.”

The problem has been tackled at various times by members of Foundry – including founder Simon Robinson and Dan Ring himself. But now it looks like a solution is on its way, courtesy of machine learning.

“We’ve embarked on this ‘smart roto’ project with DNEG and the University of Bath. We’ve advanced it far enough along just to answer the question, ‘do we think this is possible?’, and yes we think it’s possible. We finished the grant-funded side of this, and now we’re commercialising it. And the interest from everybody has been huge. Everybody is saying, ‘Can I have it now? When can I have it?’ It still remains, in my mind, the hardest problem in post production, in terms of the skill set that you need and the quality level, as well as the cost savings that are potentially there.”

It’s still just a tool…

In terms of impact, Ring claims the SRN Deblur machine learning example was “one of the most amazing bits of work that I’ve seen.” More recently, he’s also been impressed by the trend in generative AI systems, like Disco Diffusion and DALL·E 2.

“It exposes machine learning and its potential and power to a huge audience,” he says, “and it raises all these really lovely questions, like is this devaluing art? Is this devaluing artists? And is this helping; is this hindering? What we’ve seen is, actually, it’s reinforced our hypothesis that machine learning is about empowerment.

“If you look at Instagram, everybody can now create a cool image, a really nice, attractive image, but artists are creating better images, artists are creating the ones with intent. It’s like it’s giving them a stronger power to conceptualise and visualise what they have in their head, and also in a much faster way.”

The takeaway from all this is that AI-based tools are incredibly powerful and steadily improving. But they are also just that: tools. VFX artists’ jobs are not only safe, but potentially easier, too.

“I was talking to my old maths teacher about this,” says Ring, “and he recalled the time that the first calculators were brought into the university, and they were sabotaged by mathematicians who thought they were going to be put out of a job. I think it opens up the question about: is technology going to replace or enable? And everything we’ve seen so far is all around enablement and acceleration, and less about how do we completely remove this role.”

To find out more about how AI is being used in the broadcast and media industry, read AI in broadcasting and media