• Scientists at early stages of developing an advanced AI video compression tool
  • Algorithm to conduct an operation to encode frame content in a unique
  • “Every video compression approach works on a trade-off,” says research lead

Stephan Mandt, UCI: “Every video compression approach works on a trade-off.”

Source: UCI

Computer scientists from the University of California, Irvine and Disney Research have demonstrated how deep learning can compete with classical codecs using AI-enhanced video compression. 

The end-to-end approach has generated “favourable results”, suggesting it could be a viable challenger to established video compression technology.

The duo announced the success of the project earlier this week after unveiling the work in December last year during the Conference on Neural Information Processing Systems in Vancouver.

UCI assistant professor of computer science and research team leader Stephan Mandt said: “Ultimately, every video compression approach works on a trade-off.

“If I’m allowing for larger file sizes, then I can have better image quality. If I want to have a short, really small file size, then I have to tolerate some errors.

“The hope is that our neural network-based approach does a better trade-off overall between file size and quality.”

The project, which is still in the early phase of development, showed less distortion and significantly lower bits-per-pixel rates than classical coding-decoding algorithms such as H.265 when trained on specialised video content. It achieved comparable results on downscaled videos, such as those publicly available on YouTube.

Mandt, who began the project while he was employed at Disney Research, added: “Intuitively, the better a compression algorithm is at predicting the next frame of a video – given what happened in the previous frames – the less it has to memorise.

“If you see a person walking in a particular direction, you can predict how that video will continue in the future, which means you have less to remember and less to store.”
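Mandt's walking example can be illustrated with a toy sketch. The positions and the constant-velocity predictor below are hypothetical stand-ins, not the team's model; the point is only that a good prediction leaves very little left over to store.

```python
# Toy illustration (an assumption, not the researchers' model):
# an object's x-coordinate in six successive frames.
positions = [10, 12, 14, 16, 18, 20]  # hypothetical data

def predict_next(prev2, prev1):
    """Constant-velocity guess: continue the motion seen in the last two frames."""
    return prev1 + (prev1 - prev2)

# Residual = what actually happened minus what was predicted.
# Only the residuals need to be stored once the first two frames are known.
residuals = [
    positions[i] - predict_next(positions[i - 2], positions[i - 1])
    for i in range(2, len(positions))
]

print(residuals)  # steady motion is predicted perfectly, so every residual is 0
```

If the person suddenly changed direction, the residuals would become non-zero and more information would have to be kept.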


The team worked to downscale the dimensions of the video using a so-called variational autoencoder.

In a statement it confirmed: “This is a neural network that processes each video frame in a sequence of actions that results in a condensed array of numbers. The autoencoder then tries to undo this operation to ensure that the array contains enough information to restore the video frame.”

The autoencoder is shaped like an hourglass and has a low-dimensional, compact version of the image in the middle, which is how the team compressed every frame into something smaller.
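The hourglass shape can be sketched in a few lines. This is a deliberately simplified stand-in: a real variational autoencoder learns its encoder and decoder from data, whereas the hypothetical `encode` and `decode` functions here just average and repeat values to show the wide-narrow-wide flow.

```python
def encode(frame, factor=4):
    """Squeeze the frame: average each group of `factor` neighbouring values."""
    return [sum(frame[i:i + factor]) / factor for i in range(0, len(frame), factor)]

def decode(code, factor=4):
    """Expand the code back to frame size by repeating each value."""
    return [value for value in code for _ in range(factor)]

frame = [0.1, 0.1, 0.2, 0.2, 0.8, 0.9, 0.9, 0.8]  # hypothetical 8-pixel "frame"
code = encode(frame)      # narrow middle of the hourglass: 2 numbers
restored = decode(code)   # back to 8 numbers, close to the original

# Largest difference between the original and restored pixels.
error = max(abs(a - b) for a, b in zip(frame, restored))
print(len(frame), len(code), len(restored), round(error, 3))
```

The restored frame is not identical to the input, which is the file-size-versus-quality trade-off in miniature.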

Using AI, the algorithm attempts to guess the next compressed version of an image given what has gone before, using a technique called a “deep generative model”.

Mandt noted that other researchers have done work in this area, so this particular method is not unique.

However, the researchers claim that what sets this project apart is how the algorithm encodes frame content: it rounds the autoencoder’s real-valued array to integers, which are easier to store than real numbers.
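The rounding step is simple to picture. The values below are made up for illustration, and Python's built-in `round` stands in for whatever quantiser the team actually uses.

```python
# Hypothetical real-valued output of the autoencoder's narrow middle layer.
latent = [0.37, -1.82, 2.49, 0.04]

# Rounding to the nearest integer gives values that are cheap to store.
quantised = [round(v) for v in latent]

# The price of rounding: each value moves by at most 0.5.
rounding_error = [v - q for v, q in zip(latent, quantised)]

print(quantised)
```

The small rounding errors are the only information lost at this stage; everything after it can be stored exactly.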

The final step is to apply lossless compression to the array, allowing for its exact restoration. Crucially, this algorithm is informed by the neural network about which video frame to expect next, making the lossless compression aspect extremely efficient.
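Why a prediction makes lossless compression more efficient can be shown with a rough calculation. The sketch below assumes an ideal lossless coder that spends about -log2(p) bits on a symbol the model assigns probability p; the symbols and both probability tables are hypothetical.

```python
import math

# Hypothetical rounded values that must be stored exactly.
symbols = [0, 0, 1, 0, 0, 0, -1, 0]

def code_length_bits(data, prob):
    """Ideal lossless code length: -log2 of each symbol's modelled probability."""
    return sum(-math.log2(prob[s]) for s in data)

# No prediction: every value is treated as equally likely.
uniform = {-1: 1 / 3, 0: 1 / 3, 1: 1 / 3}

# Hypothetical predictive model that expects mostly zeros.
predicted = {-1: 0.125, 0: 0.75, 1: 0.125}

bits_uniform = code_length_bits(symbols, uniform)
bits_predicted = code_length_bits(symbols, predicted)

print(round(bits_uniform, 2), round(bits_predicted, 2))
```

The better the model anticipates what comes next, the fewer bits the lossless stage needs, while the stored integers can still be recovered exactly.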

Mandt said that these steps, as a whole, make this approach an “end-to-end” video compression algorithm.

He added: “The real contribution here was to combine this neural network-based deep generative video prediction model with everything else that belongs to compression algorithms, such as rounding and model-based lossless compression.”

The team confirmed they will continue to work towards a real, applicable version of the video compressor.