AI/ML and deep learning are having a huge impact on computer graphics research, with the potential to transform VFX production.


Avengers: Infinity War: Thanos

Source: Marvel Studios

In Avengers: Endgame, Josh Brolin’s performance was flawlessly rendered into the 9ft super-mutant Thanos by teams of animators at Weta Digital and Digital Domain. In a sequence from 2018’s Solo: A Star Wars Story, the 76-year-old Harrison Ford appears convincingly as his 35-year-old self playing Han Solo in 1977.

Both examples used artificial intelligence and machine learning tools to automate parts of the process. But while one was made with the full force of Hollywood, the other was apparently produced by one person and uploaded to the Derpfakes YouTube channel.

Both demonstrate that AI/ML can not only revolutionise VFX creation for blockbusters but also put sophisticated VFX techniques into the hands of anyone.

“A combination of physics simulation with AI/ML generated results and the leading eye and hand of expert artists and content creators will lead to a big shift in how VFX work is done,” says Michael Smit, CCO of software maker Ziva Dynamics. “Over the long-term, these technologies will radically change how content is created.”


Simon Robinson, co-founder of VFX tools developer Foundry, says: “The change in pace, the greater predictability of resources and timing, plus improved analytics will be transformational to how we run a show.”


Mari 4.5: Painting package for VFX artists developed by Foundry

Source: Foundry

Over the past decade, 3D animation, simulation and rendering have reached a level of photorealism and art direction that appears near perfect to audiences. Given sufficient resources (artists, money), very few effects are impossible to create, and even challenges such as crossing the uncanny valley for photorealistic faces can be met.

More recently, the VFX industry has focussed most of its efforts on creating more cost-effective, efficient and flexible pipelines to meet growing demand for VFX in film production.

For a while, many of the most labour-intensive and repetitive tasks, such as matchmoving, tracking, rotoscoping, compositing and animation, were outsourced to cheaper foreign studios. With recent progress in deep learning, however, many of these tasks can now be fully automated and performed extremely fast, at little or no marginal cost.

As Smit explains: “Data is the foundational element, and whether that’s in your character simulation and animation workflow, your render pipeline, or your project planning, innovations are granting the capability to implement learning systems that are able to add to the quality of work and, perhaps, the predictability of output.”

Manual to automatic
Matchmoving, for example, allows CGI to be inserted into live-action footage while keeping scale and motion correct. It can be frustrating because tracking camera placement within a scene is typically done by hand and can take up more than 5% of the total time spent on the entire VFX pipeline.

Software developer Foundry has a new approach that uses algorithms to track camera movement more accurately, drawing on metadata recorded by the camera at the point of acquisition (lens type, how fast the camera is moving and so on). Lead software engineer Alastair Barber says the approach has improved the matchmoving process by 20%, and the team proved the concept by training the algorithm on data from DNEG, one of the world’s largest VFX facilities.
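Why does lens metadata help? Under a simple pinhole camera model, the focal length and sensor width recorded on set fix the camera’s field of view and its focal length in pixels, two quantities a matchmove solver would otherwise have to estimate from the footage. The sketch below illustrates only that geometry: it is not Foundry’s algorithm, it ignores lens distortion, and the function names and the 35 mm lens / 36 mm sensor / UHD plate values are hypothetical.

```python
import math
from typing import Tuple


def focal_length_pixels(focal_mm: float, sensor_width_mm: float, image_width_px: int) -> float:
    """Convert a lens focal length in millimetres to pixels (pinhole model)."""
    return focal_mm * image_width_px / sensor_width_mm


def horizontal_fov_degrees(focal_mm: float, sensor_width_mm: float) -> float:
    """Horizontal field of view implied by the focal length and sensor width."""
    return math.degrees(2.0 * math.atan(sensor_width_mm / (2.0 * focal_mm)))


def project(point_cam: Tuple[float, float, float], f_px: float, cx: float, cy: float) -> Tuple[float, float]:
    """Project a 3D point in camera coordinates (z pointing forward) to pixel coordinates."""
    x, y, z = point_cam
    if z <= 0:
        raise ValueError("point is behind the camera")
    return (f_px * x / z + cx, f_px * y / z + cy)


# Hypothetical on-set metadata: 35 mm lens, 36 mm-wide sensor, 3840x2160 plate.
focal_mm, sensor_mm, width_px, height_px = 35.0, 36.0, 3840, 2160
f_px = focal_length_pixels(focal_mm, sensor_mm, width_px)
print(f"focal length: {f_px:.1f} px, horizontal FOV: {horizontal_fov_degrees(focal_mm, sensor_mm):.1f} deg")
# A point 0.5 m right of the optical axis and 5 m away, principal point at the image centre.
print(project((0.5, 0.0, 5.0), f_px, width_px / 2, height_px / 2))
```

Once the intrinsics are known, the solver only has to recover the camera’s position and orientation in each frame, a smaller and better-constrained problem.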

For wider adoption, studios will have to convince clients to let them delve into their data. Barber reckons this shouldn’t be too difficult. “A lot of this comes down to the relationship between client and studio,” he says. “If a studio has good access to what is happening on set, it’s easier to explain what they need and why without causing alarm.”

Rotoscoping, another labour-intensive task, is being tackled by Australian company Kognat with Rotobot. The company says its AI can process a frame in 5-20 seconds. Accuracy is limited by the quality of the deep learning model behind Rotobot, but should improve as the model is trained on more data.
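Rotobot’s internals aren’t described here, but AI rotoscoping tools generally start from a per-pixel foreground probability produced by a segmentation network. A minimal sketch, assuming such a probability map already exists (the `probs` values below are invented), turns it into a binary matte and works out what the quoted 5-20 seconds per frame means for a whole shot:

```python
from typing import List


def probability_to_matte(probs: List[List[float]], threshold: float = 0.5) -> List[List[int]]:
    """Threshold a 2D grid of foreground probabilities into a binary matte (1 = keep pixel)."""
    return [[1 if p >= threshold else 0 for p in row] for row in probs]


def shot_processing_minutes(duration_s: float, fps: float, seconds_per_frame: float) -> float:
    """Total processing time in minutes for a shot, given a fixed per-frame cost."""
    frames = round(duration_s * fps)
    return frames * seconds_per_frame / 60.0


# Toy 3x4 probability map standing in for a segmentation network's output.
probs = [
    [0.05, 0.20, 0.70, 0.90],
    [0.10, 0.55, 0.95, 0.85],
    [0.02, 0.40, 0.60, 0.30],
]
for row in probability_to_matte(probs):
    print(row)

# A 10-second shot at 24 fps is 240 frames.
print(shot_processing_minutes(10, 24, 5), shot_processing_minutes(10, 24, 20))
```

At 5-20 seconds per frame, that 240-frame shot takes between 20 and 80 minutes of machine time, without an artist drawing a single spline.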


Arraiy: AI-based tracking solution used to solve both camera matchmoving and object tracking of a person

Source: Arraiy

Other companies are exploring similar image processing techniques. Arraiy has developed AI that can add photorealistic CGI objects to scenes, even when both the camera and the object itself are moving. An example of its work has been showcased by The Mill.

The future of filmmaking is AI and real time
A proof of concept led by The Mill showcased the potential of real-time processes in broadcast, film and commercials production.

‘The Human Race’ combined Epic’s Unreal game engine, The Mill’s virtual production toolkit Cyclops, and Blackbird, an adjustable car rig that captures environmental and motion data.

On the shoot, Cyclops stitched 360-degree camera footage and transmitted it live to the Unreal engine, which produced an augmented reality image of the virtual object, tracked and composited into the scene using computer vision technology from Arraiy. The director could see the virtual car on location and react live to lighting and environment changes, customising the scene with photoreal graphics on the fly.
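Compositing a tracked CG element into the live plate, the last step described above, classically relies on the Porter-Duff ‘over’ operator. The sketch assumes premultiplied-alpha RGBA pixels with channels in the 0-1 range; it shows the textbook operator, not the internals of Cyclops, Unreal or Arraiy’s software.

```python
from typing import Tuple

RGBA = Tuple[float, float, float, float]  # premultiplied colour plus alpha, each in 0..1


def over(fg: RGBA, bg: RGBA) -> RGBA:
    """Porter-Duff 'over': composite a premultiplied foreground pixel onto a background pixel.

    Each output channel is fg + bg * (1 - fg_alpha); with premultiplied colour the
    same formula also gives the combined alpha.
    """
    transmit = 1.0 - fg[3]  # fraction of the background that shows through
    return tuple(f + b * transmit for f, b in zip(fg, bg))


# A half-transparent red CG pixel (premultiplied, so red = 1.0 * 0.5) over an opaque grey plate.
cg_pixel = (0.5, 0.0, 0.0, 0.5)
plate_pixel = (0.4, 0.4, 0.4, 1.0)
print(over(cg_pixel, plate_pixel))
```

Premultiplied alpha is the usual choice in compositing because the operator then reduces to one multiply-add per channel and filtering the CG layer does not produce dark fringes at its edges.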

The technology is being promoted to automotive brands as a sales tool in car showrooms, but its uses go far beyond advertising. Filmmakers can use the tech to visualise a virtual object or character in any live action environment.

A short film using the technology is claimed to be the first to blend live-action filmmaking with real-time game engine processing.

Software first developed at Peter Jackson’s digital studio Weta for the Planet of the Apes films has been adapted by Ziva to create CG characters in a fraction of the time and cost of traditional VFX. Ziva’s algorithms are trained on physics, anatomy and kinesiology data sets to simulate natural body movement, including soft-tissue behaviour such as skin elasticity and layers of fat.
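Ziva’s solver is far more sophisticated than anything that fits here, but the basic idea of physics-driven secondary motion can be shown with a single damped spring. In this deliberately simplified sketch (an assumption for illustration, not Ziva’s method), a lump of soft tissue is attached to a bone that stops suddenly; the tissue overshoots, swings back and settles on its own, without an animator keying the jiggle. The stiffness, damping and frame-rate values are hypothetical.

```python
from typing import List


def simulate_jiggle(stiffness: float = 200.0, damping: float = 4.0, mass: float = 1.0,
                    dt: float = 1.0 / 240.0, steps: int = 480) -> List[float]:
    """Offset of a soft-tissue lump from its rest position after the bone stops.

    Models the tissue as one damped spring (a deliberate simplification) and
    integrates it with semi-implicit Euler. At t = 0 the bone halts while the
    tissue is still moving at 1 unit/s relative to it.
    """
    x, v = 0.0, 1.0
    trajectory = []
    for _ in range(steps):
        force = -stiffness * x - damping * v  # spring pulls back; damper removes energy
        v += (force / mass) * dt              # update velocity first...
        x += v * dt                           # ...then position with the new velocity
        trajectory.append(x)
    return trajectory


traj = simulate_jiggle()  # 480 steps at 240 Hz = 2 seconds of motion
print(f"peak overshoot {max(traj):.4f}, lowest point {min(traj):.4f}, final {traj[-1]:.6f}")
```

Changing only the material parameters (stiffer muscle, looser fat) changes the motion everywhere it is used, which is the kind of reuse Smit describes below.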

“Because of our reliance on physics simulation algorithms to drive the dynamics of Ziva creatures, that even in 10,000 years when a new species of aliens rule the earth and humans are long gone, if they can ‘open’ our files they’d be able to use and understand the assets,” says Smit. “That’s a bit dark for humans but also really exciting that work done today could have unlimited production efficiency and creative legacy.”

Smit estimates that a studio would probably need to create fewer than five basic ‘archetypes’ to cover all of the creatures required for the majority of VFX jobs.

“Conventional techniques require experts, some with decades of experience, to be far too ‘hands-on’ with specific shot creation and corrective efforts,” he argues. “This often demands that they apply their artistic eye to replicate something as nuanced as the physical movement or motion of a secondary element in the story. Whereas we know that simulation and data-driven generative content can in fact do that job, freeing up the artist to focus more on bigger more important things.”

Democratising mocap
A similar shift is transforming motion capture, another traditionally expensive exercise that requires specialised hardware, suits, trackers, controlled studio environments and an army of experts to make it all work.

RADiCAL has set out to create an AI-driven motion capture solution that needs no physical hardware at all. The aim is to make capture as easy as recording video of an actor, even on a smartphone, and uploading it to the cloud, where the firm’s AI sends back motion-captured animation of the movements. The latest version promises 20x faster processing and a dramatically wider range of motion, from athletic to combat moves.
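RADiCAL’s model is proprietary, but video-based motion capture generally begins by estimating joint positions (keypoints) in each frame and then converting them into joint rotations for a rig. A minimal sketch of that second step, assuming 2D keypoints for shoulder, elbow and wrist have already been detected (the pixel coordinates below are invented), computes the elbow bend angle:

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def joint_angle_degrees(a: Point, b: Point, c: Point) -> float:
    """Angle at joint b between the segments b->a and b->c, from 2D keypoints."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    norm = math.hypot(*v1) * math.hypot(*v2)
    if norm == 0:
        raise ValueError("keypoints must be distinct")
    cos_theta = (v1[0] * v2[0] + v1[1] * v2[1]) / norm
    cos_theta = max(-1.0, min(1.0, cos_theta))  # clamp to guard against rounding error
    return math.degrees(math.acos(cos_theta))


# Hypothetical detections for one video frame (pixel coordinates, y pointing down).
shoulder, elbow, wrist = (100.0, 100.0), (100.0, 200.0), (200.0, 200.0)
print(joint_angle_degrees(shoulder, elbow, wrist))
```

A real system works in 3D and smooths the result across frames, but the principle of turning detected points into rotations that drive a skeleton is the same.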

San Francisco’s DeepMotion also uses AI to retarget and post-process motion capture data. Its cloud application, Neuron, allows developers to upload and train their own 3D characters, choosing from hundreds of interactive motions available via an online library. The service is also claimed to free up artists’ time to focus on the more expressive details of an animation.
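Retargeting means transferring motion captured on one body onto a character with different proportions. One common, simple approach, shown here as a sketch rather than DeepMotion’s pipeline, copies the local joint rotations unchanged and scales the root (hip) translation by the ratio of the two characters’ hip heights, so a shorter character takes proportionally shorter steps. The `Frame` type, joint names and hip heights are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Frame:
    root_position: Vec3               # hip position in world space (metres)
    joint_rotations: Dict[str, Vec3]  # local joint rotations, e.g. Euler angles in degrees


def retarget(frames: List[Frame], source_hip_height: float, target_hip_height: float) -> List[Frame]:
    """Copy local joint rotations unchanged and scale root motion by the hip-height ratio."""
    if source_hip_height <= 0:
        raise ValueError("source_hip_height must be positive")
    scale = target_hip_height / source_hip_height
    return [
        Frame(
            root_position=(f.root_position[0] * scale,
                           f.root_position[1] * scale,
                           f.root_position[2] * scale),
            joint_rotations=dict(f.joint_rotations),  # copy so the source clip is untouched
        )
        for f in frames
    ]


# Two frames of a hypothetical walk captured on a performer with 1.0 m hip height,
# retargeted to a character whose hips sit at 0.8 m.
walk = [
    Frame((0.0, 1.0, 0.0), {"left_knee": (10.0, 0.0, 0.0)}),
    Frame((0.5, 1.0, 0.0), {"left_knee": (30.0, 0.0, 0.0)}),
]
for frame in retarget(walk, source_hip_height=1.0, target_hip_height=0.8):
    print(frame.root_position, frame.joint_rotations)
```

This only works when both skeletons share the same joint hierarchy, and it ignores foot sliding and ground contact; handling those is the clean-up work that post-processing has to address.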

Pinscreen is also making waves. It is working on algorithms capable of building a photorealistic, animatable 3D avatar from just a single still image. This is radically different from conventional VFX work, in which scanning, modelling, texturing and lighting are painstakingly carried out, as in ILM’s digital recreation of Carrie Fisher as Princess Leia or MPC’s recreation of the character Rachel in Blade Runner 2049.

“Our latest technologies allow anyone to generate high-fidelity 3D avatars out of a single picture and create animations in real-time,” says Pinscreen’s Hao Li. “Until a year ago, this was unthinkable.”


Pinscreen’s facial simulation AI tool is based on generative adversarial networks (GANs), a technique for creating new, believable 2D and 3D imagery by learning from a dataset of millions of real 2D photos. One striking example of synthesising photoreal human faces can be seen at
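The adversarial setup behind GANs can be written compactly. In the original formulation by Goodfellow et al. (2014), a generator G turns random noise z into an image, a discriminator D estimates the probability that an image is a real photo, and the two are trained against each other on the minimax objective below. This is the general technique only; the article does not describe Pinscreen’s specific networks or losses.

```latex
\min_{G} \max_{D} \; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\!\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_{z}}\!\left[\log\left(1 - D(G(z))\right)\right]
```

D is rewarded for telling real photos from generated ones, while G is rewarded for fooling D; as training progresses, the generator’s output becomes harder to distinguish from the real data.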

Such solutions are building towards what Ziva’s Smit calls “a heightened creative class”.

On the one hand, this will let professional VFX artists and animators hand technical work over to automation, in theory allowing more freedom for human creativity. On the other, it will democratise the entire VFX industry by putting AI tools in the hands of anyone.

The videos posted on Derpfakes, including the Solo: A Star Wars Story clip, demonstrate the capabilities of image processing using deep learning. An AI analyses a large collection of photos of a person (Ford, in this case) in a variety of positions and poses, learning what that face looks like from many angles. It can then perform automatic face replacement on a selected clip.


Touch of a button
Recent work at USC focusses on generating anime illustrations with models trained on a massive body of artwork from thousands of artists. “Our algorithm is even capable of distinguishing the drawing technique and style from these artists and generating content that was never seen before using a similar style,” Li reveals. “I see how this direction of synthesising content will progress to complex animations, and arbitrary content in the near future.”

Progress in this field is rapid, helped by the openness of the ML and computer vision community and the success of open-access publication platforms such as arXiv. Further research is needed to learn efficient 3D representations and to interpret higher-level semantics.

“Right now, the AI/ML for VFX production is in its infancy, and while it can already automate many pipeline related challenges, it has the potential to really change how high-quality content will be created in the future, and how it is going to be accessible to end-users,” says Li.

Human touch
While AI/ML algorithms can synthesise very complex, photorealistic and even stylised image and video content, simply sticking a ‘machine-learning’ label on a tool isn’t enough.

“There’s a lot of potential to remove drudge work from the creative process but none of this is going to remove the need for human craft skill,” Robinson insists. “The algorithmic landscape of modern VFX is already astonishing by the standards of twenty years ago; and so much has been achieved to accelerate getting a great picture, but we still need the artist in the loop.

“Any algorithmic-generated content needs to be iterated on and tuned by human skill. We’re not in the business of telling a director that content can only be what it is because the algorithm has the last word. But we are going to see a greater range of creative options on a reduced timescale.”