Streamers look to AI to crack the codec code

Streamers are looking to AI to dramatically improve compression performance and reduce costs with London-based Deep Render claiming that its technology has cracked the code.

For streamers, every bit counts. Their ability to compress video while maintaining quality and reducing bandwidth is critical to business. But as content increases in volume and richness, existing technology is buckling under the pressure.

Cofounders Chris Besenbruch (left) and Arsalan Zafar (right)

The looming problem has been apparent for several years with developers turning to AI and machine learning as a potential salvation. The prize is a market estimated to be worth $10bn by 2030 which makes AI codec developers prime targets for acquisition.

AI techniques are already being used to optimise existing codecs like H.264, HEVC, or AV1 by improving motion estimation, rate-distortion optimisation, or in-loop filtering. Content-aware techniques, pioneered by Harmonic, use AI to adjust the bit rate according to content.

UK-based firm iSIZE, for example, built an AI-based solution that allowed third-party encoders to produce higher-quality video at a lower bitrate and was acquired by Sony Interactive Entertainment last winter.

A second approach is to build an entirely new AI codec. California startup WaveOne was developing along those lines and was promptly bought out by Apple in March 2023.

That leaves the field open to one company, which claims to have developed the world’s first AI codec and to be the first to commercialise it.

Revolutionary AI integration

Deep Render, a London-based startup, has sidestepped the entire traditional codec paradigm and replaced it with a neural network module.

The same image with AV1

Deep Render: “Both are at the same bitrate, but there is a significant difference in quality,” says Zafar

“This is an iPhone moment for the compression industry,” Arsalan Zafar, Deep Render co-founder and CTO tells IBC365. “After years of hard work and exceptional R&D, we’ve built the world’s first native AI codec.”

He claims its technology is already “significantly better at compression, surpassing even the next generation codec such as VVC” and that its approach provides the opportunity for 10-100x gains in compression performance “advancing the compression field by centuries.”

What’s more, its tech is already in trial at “major publishers and Big Tech companies” which IBC365 understands to include Meta, Netflix, Amazon, YouTube, Twitch, Zoom and Microsoft.

Roll-out will begin from Q1 2025 before moving towards mid-market publishers and prosumers.

“For the first time in history, the industry will go from ITU-backed standardised codecs to one company supporting the codec for all major content providers,” Zafar claims.

The Moving Picture Experts Group (MPEG) has set the standard for digital compression for more than three decades but has recently seen its monopoly eroded by streaming video services eager to find a competitive edge. The prevailing standard is H.265 / HEVC, first standardised in 2013, and its successor is VVC. Deep Render claims its technology is 80% better than MPEG-4/H.264 and ~10% ahead of VVC today, with significant advances expected by the end of the year as its algorithms develop.
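To put those percentages in concrete terms, the short sketch below converts them into bitrates. It assumes that "X% better" means X% less bitrate at equal visual quality, which the article does not spell out, and the 5,000 kbps H.264 starting point is a hypothetical figure chosen only for illustration.

```python
def bitrate_after_saving(reference_kbps: float, saving: float) -> float:
    """Bitrate needed at equal quality when a codec saves `saving` (0..1) vs a reference."""
    if not 0.0 <= saving < 1.0:
        raise ValueError("saving must be in [0, 1)")
    return reference_kbps * (1.0 - saving)


def implied_reference_kbps(codec_kbps: float, saving: float) -> float:
    """Inverse relation: the reference bitrate that `codec_kbps` undercuts by `saving`."""
    if not 0.0 <= saving < 1.0:
        raise ValueError("saving must be in [0, 1)")
    return codec_kbps / (1.0 - saving)


# Hypothetical H.264 stream; not a figure from Deep Render or the article.
h264_kbps = 5000.0

# Claimed: 80% better than H.264 (assumed to mean 80% lower bitrate).
deep_render_kbps = bitrate_after_saving(h264_kbps, 0.80)

# Claimed: ~10% ahead of VVC, so VVC would need this much for the same quality.
vvc_kbps = implied_reference_kbps(deep_render_kbps, 0.10)

print(f"H.264:       {h264_kbps:.0f} kbps")
print(f"VVC:         {vvc_kbps:.0f} kbps (implied)")
print(f"Deep Render: {deep_render_kbps:.0f} kbps (claimed)")
```

Under that reading, the two claims together imply that VVC itself needs a little under a quarter of the H.264 bitrate, so the claimed step from VVC to Deep Render is small compared with the step from H.264 to VVC.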

“We are working with major content publishers to embed our AI codecs throughout their content delivery chain from encoder to decoder and all network layers in between,” Zafar says. “We’ll make sure all the data works and build that relationship to a point where they are happy to rely on our codec and for us to be their main codec provider. They will wean off MPEG codecs. We expect all major content publishers to be using Deep Render codecs.”

Potential cost savings

Zafar’s background is in spacecraft engineering, computer science and machine learning at Imperial College London. He founded Deep Render in 2019 with fellow Imperial computer science student Chris Besenbruch. The company now employs 35 people and last year received a £2.1m grant from the European Innovation Council and raised £4.9m in venture capital led by IP Group and Pentech Ventures.

The company’s confidence stems from the fact there is a real business issue to solve. The more bandwidth the services of heavy streamers like Netflix take up, the more they pay to content delivery network providers like ISPs.

Deep Render estimates that a streamer such as Netflix could save over £1bn a year on content delivery costs by switching to its technology.

“Content published online globally is exponentially increasing but existing codecs are showing diminishing returns,” Zafar argues. “If you combine these two things it’s not great for the future of any business.”

He asserts that YouTube and Twitch stream huge amounts of content at a massive financial cost in bandwidth. “They really feel the pain and would love to shave a few billion off their content delivery costs. The easiest way to do that is with a better codec.”

There is continuing tension between streamers and telcos about the cost of carriage over telco-owned networks. Telcos argue that streamers should pay more. Content publishers push back knowing that their business model is under threat.

“ISPs could turn around tomorrow and significantly increase the cost they charge for carriage, or lower the streamer’s resolution or framerate or throttle their bandwidth to popular regions,” Zafar says. “This over-reliance on ISPs threatens the streamer’s business model. One way to deleverage the ISPs is to have a better compression scheme such that the compression itself is no longer an issue.”

The problem with existing compression

Traditional video compression schemes have arguably approached the limits of efficiency. MPEG/ITU-based codecs have been iteratively refined over nearly 40 years and most of the significant improvements in algorithms for motion estimation, prediction, and transform coding have already been realised. Every new codec makes the block sizes larger and adds more reference frames, but there is a limit to how long this can continue.

Arsalan Zafar, Deep Render

Enhancements in compression efficiency often come with increased computational complexity, which can be prohibitive for real-time applications or devices with limited processing power. The cost of encoding for example increases around 10x with each new codec.
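As a rough illustration of how that compounds, the sketch below applies the article's approximate 10x figure across successive codec generations. The multiplier is the article's ballpark rather than a measured benchmark, and the H.264 baseline of 1.0 is an arbitrary unit.

```python
def relative_encode_cost(generations_later: int, factor_per_generation: float = 10.0) -> float:
    """Encoding cost relative to the baseline codec, compounding once per generation."""
    if generations_later < 0:
        raise ValueError("generations_later must be non-negative")
    return factor_per_generation ** generations_later


# Successive MPEG generations named in the article; H.264 is the baseline (cost 1.0).
codec_generations = ["H.264", "HEVC", "VVC"]
costs = {name: relative_encode_cost(i) for i, name in enumerate(codec_generations)}

for name, cost in costs.items():
    print(f"{name}: {cost:.0f}x the H.264 encoding cost")
```

Two generations on, the same rule of thumb puts encoding at around 100 times the baseline cost, which is why each efficiency gain becomes harder to justify for real-time or low-power use.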

Traditional methods have also found it difficult to take the human visual system into account. According to Zafar, the perceptual limits have been reached because we lack a rigorous understanding of how our vision works and we can’t write it down mathematically. However, methods that learn from data can learn these patterns and finally enable this.

Advantages of AI compression

AI codecs use algorithms to analyse the visual content of a video, identify redundancies and nonfunctional data, and compress the video in a more efficient way than conventional techniques.

AI-based schemes use large datasets to learn optimal encoding and decoding strategies, which can more effectively adapt to different types of content than fixed algorithms.

Instead of breaking down the process into separate steps (like motion estimation and transform coding), AI models can also learn to perform compression in an end-to-end manner, optimising the entire process jointly. This makes the codec more context-aware.
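The idea of optimising everything jointly can be sketched as a single rate-distortion objective, loss = rate + lambda x distortion, which is the general form used in the learned-compression literature. The toy below is not Deep Render's method: it swaps the neural network for a plain quantiser with one tunable step size, and every name and number in it is hypothetical, chosen only to show one score trading bits against quality.

```python
import math
from collections import Counter


def quantise(values, step):
    """Map each value to an integer symbol; a larger step means coarser symbols."""
    return [round(v / step) for v in values]


def rate_bits(symbols):
    """Empirical entropy in bits: a stand-in for what an entropy coder would spend."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum(c * math.log2(c / total) for c in counts.values())


def distortion_mse(original, reconstructed):
    """Mean squared error between the source and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)


def rd_loss(signal, step, lam):
    """Single joint objective: bits spent plus lam times the reconstruction error."""
    symbols = quantise(signal, step)
    reconstructed = [s * step for s in symbols]
    return rate_bits(symbols) + lam * distortion_mse(signal, reconstructed)


# Hypothetical one-dimensional "frame" of sample values.
signal = [0.1, 0.4, 0.35, 0.8, 0.75, 0.2, 0.9, 0.5]

for step in (0.05, 0.5):
    symbols = quantise(signal, step)
    reconstructed = [s * step for s in symbols]
    print(
        f"step={step}: rate={rate_bits(symbols):.2f} bits, "
        f"mse={distortion_mse(signal, reconstructed):.4f}, "
        f"loss={rd_loss(signal, step, lam=100.0):.2f}"
    )
```

A learned codec would minimise the same kind of loss over millions of network weights rather than one step size, and could replace the mean squared error with a perceptual quality measure.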

AI models can also be trained to prioritise perceptual quality directly, achieving better visual quality at lower bitrates by focusing on features most noticeable to human viewers.

Because they are software-based, AI codecs do not rely on specialist hardware, so the expense and time of manually ripping and replacing systems can be avoided. This also means that the conventional 6-8 year cycle for introducing next-gen codecs can be dramatically shortened.

“This is the true beauty of it,” Zafar says. “You could effectively stream a new codec overnight with a whole new set of parameters. Updateability is extremely easy and significantly reduces costs as specialised silicon is no longer required.”

Unlike traditional codecs which are fixed one-size-fits-all systems, an AI codec could be optimised for specific content, further increasing efficiency.

Zafar says, “The football World Cup is streamed to between 500 million and a billion people. An AI codec specifically trained on football match data sets would be significantly less expensive per bit when streamed at such scale.”

Deep Render says it would optimise its content specialisation algorithm for customers based on the customer’s own data.

Other AI optimisation techniques are also being evaluated for commercial use. Companies like Bitmovin are playing with using AI to optimise encoding parameters dynamically, improving efficiency and video quality.

Nvidia RTX Video Super Resolution uses AI-driven post-processing to improve video quality through denoising, super-resolution, and artefact removal.

MPEG is now studying compression using learning-based codecs and reported on this at its most recent meeting.

MPEG founder Leonardo Chiariglione now runs the Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI) initiative, which is developing a suite of AI-driven systems and standards, notably an end-to-end video codec called EVC.

But the gears may grind too slowly for the urgent demands of streamers.

“We have built an entirely new end-to-end, data-driven, perceptually optimised codec from the ground up using AI,” says Zafar, who has also produced an AI codec primer course. “All modules such as motion estimation, prediction, and transform coding are captured within this one neural network.”

With all this in mind, however, it is important to note that AI video compression is an emerging field with much R&D ahead.

One potentially significant hurdle is that deploying AI-based codecs requires compatibility with existing video playback and streaming infrastructure. Another is that AI codecs currently lack universal standards, making industry-wide adoption more challenging.

Zafar says Deep Render is leaving the door open to standardising its codec. “A lot of inefficiencies come with the standardisation process and we prefer to move fast but standardisation is not completely out of the picture. It has some benefits like building confidence among customers.”

Nor will Deep Render be able to compress 8K UHD video until 2025 at the earliest.

“AI codecs are at the beginning of their development cycle,” Zafar says. “We have internal research showing significantly superior performance. These will mature over the next year, providing unprecedented gains in compression performance. We’ve barely scratched the surface.”