The world’s largest video streaming platforms alone account for a few billion users globally. Video demands far more bits per second than any other type of internet traffic. In 2019, Cisco estimated that video would make up at least 80% of the 4.8 zettabytes of data delivered over the internet by 2022. More than a billion hours of content are consumed on a single streaming platform every single day. As a result, the industry’s carbon footprint is growing exponentially and now exceeds that of the airline industry.

Last year, The Guardian reported that just one of the world’s top three streaming sites is responsible for emitting more greenhouse gases than Glasgow – the site of the 2021 Cop26 climate summit.


Sergio Grce, CEO, iSIZE Technologies

The urgent challenge facing media and entertainment companies today is the need to reduce their environmental impact, while still delivering the impeccable quality and end-user experience that viewers expect and demand. As they tussle with the perennial issue of quality, many of the big names in Silicon Valley have set ambitious goals of achieving net zero by the end of 2022.

To reach those goals, the industry needs to come together and find the innovative approaches that are readily available and scalable today. Technology companies play a big role in reducing the environmental impact of streaming through further efforts to increase energy efficiency – both in the near term with new technologies and through developing next-generation technologies.

Innovating for success

The carbon footprint of streaming video depends on the electricity usage of data centres, data transmission and devices, and then on the CO2 emissions associated with each unit of electricity generated. This means the overall footprint of streaming video depends most heavily on how the electricity is generated. The energy efficiency of digital technologies has improved rapidly, and with more investment in renewable energy to power data centres and networks, the environmental impact of streaming is no longer growing as fast as it once was. However, it is increasingly likely that the efficiency gains of current technologies will be unable to keep pace with growing data demand. To reduce the risk of rising energy use and emissions, investments in efficient next-generation computing and communications technologies are needed, alongside continued efforts to decarbonise the electricity supply.
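The relationship described above reduces to simple arithmetic: emissions scale with electricity consumed, which scales with the volume of data moved, multiplied by the carbon intensity of the grid. The sketch below illustrates this with assumed figures (data rate, energy intensity and grid mix are hypothetical, not taken from the article):

```python
# Illustrative back-of-envelope model. All numeric inputs are assumptions
# chosen for illustration, not measurements from this article.

def streaming_co2_kg(hours, gb_per_hour, kwh_per_gb, kg_co2_per_kwh):
    """Estimate CO2 emissions (kg) for a given amount of streaming."""
    energy_kwh = hours * gb_per_hour * kwh_per_gb   # electricity used end to end
    return energy_kwh * kg_co2_per_kwh              # emissions depend on the grid mix

# Example: one hour of HD streaming (~3 GB) at an assumed 0.08 kWh/GB,
# on a coal-heavy grid (0.9 kg CO2/kWh) versus a low-carbon grid (0.05 kg CO2/kWh).
coal = streaming_co2_kg(1, 3.0, 0.08, 0.9)
green = streaming_co2_kg(1, 3.0, 0.08, 0.05)
print(f"{coal:.3f} kg vs {green:.3f} kg CO2")  # same stream, ~18x difference
```

The point of the comparison is the article's own: the same stream can carry a very different footprint depending on how the electricity behind it is generated.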

“The industry’s carbon footprint is growing exponentially and now exceeds that of the airline industry”

Rather than relying on traditional approaches built around standard video compression algorithms, at iSIZE we pursue a new way of reducing the environmental impact of video streaming: AI-based preprocessing prior to encoding, which makes the ingested content easier, and more efficient, to encode.
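A toy sketch of where such a preprocessor sits in the pipeline (both functions are illustrative stand-ins, not iSIZE's algorithms): imperceptible pixel-level noise is attenuated before encoding, so the downstream encoder, caricatured here as a run-length encoder, produces a smaller output from the same visually flat region.

```python
# Hypothetical pipeline sketch: preprocess(frame) runs BEFORE encode(frame).
# Neither function represents iSIZE's actual technology.

def preprocess(frame):
    """Stand-in for an AI preprocessor: quantize away low-level noise
    that viewers would not perceive in a flat region."""
    return [px // 8 * 8 for px in frame]

def encode(frame):
    """Stand-in for a standard encoder (AVC/HEVC/VP9/AV1): run-length
    encode the frame -- smoother input compresses better."""
    out, prev, run = [], None, 0
    for px in frame:
        if px == prev:
            run += 1
        else:
            if prev is not None:
                out.append((prev, run))
            prev, run = px, 1
    out.append((prev, run))
    return out

noisy = [100, 101, 99, 100, 102, 98, 100, 101]   # perceptually flat region
print(len(encode(noisy)))               # every pixel differs: noise defeats the encoder
print(len(encode(preprocess(noisy))))   # far fewer runs after preprocessing
```

The preprocessing step spends a little computation to hand the encoder content it can compress far more efficiently, which is the essence of the approach described above.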

Take the example of audio. Over two decades ago, when the web was young, the first attempts to deliver audio over the internet required a bandwidth of 1-2Mb/s, which made it inaccessible to many users. The developers of the original MPEG Layer-3 standard tackled the audio bandwidth issue by developing algorithms that eliminated those parts of the signal that would not be perceived by most listeners. The result was MP3.

The point of new approaches in video delivery is to bring together psychovisual techniques and artificial intelligence to remove video information that is known to be imperceptible to viewers, allowing standard MPEG or AOMedia encoders to reach outstanding compression levels.

Going above and beyond

Not all pixels are created equal: if one develops an actionable understanding of how people perceive video content, then one can remove unnecessary detail in the input video content that incurs significant bitrate overhead in typical video encoders.
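One classic psychovisual observation behind the "not all pixels are equal" idea is visual masking: busy, textured regions hide small errors, while the eye notices every change in smooth regions. The toy below (an illustration of the general principle, not iSIZE's method) quantizes high-variance blocks more aggressively than flat ones:

```python
# Toy illustration of perceptual masking. Thresholds and step sizes are
# arbitrary assumptions for demonstration only.

def variance(block):
    """Sample variance of a block of pixel values."""
    mean = sum(block) / len(block)
    return sum((p - mean) ** 2 for p in block) / len(block)

def adaptive_quantize(block, flat_step=2, textured_step=16, threshold=25.0):
    """Quantize harder where high variance indicates visual masking."""
    step = textured_step if variance(block) > threshold else flat_step
    return [p // step * step for p in block]

flat = [120, 121, 120, 122]        # smooth region: preserve detail
texture = [40, 180, 90, 200]       # busy texture: errors are hidden
print(adaptive_quantize(flat))     # gently quantized
print(adaptive_quantize(texture))  # coarsely quantized, bits saved
```

Real perceptual models are far richer than a variance threshold, but the payoff is the same: bits are spent where viewers will notice them and saved where they will not.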

The end-user experience is a key market differentiator, so it is important that this does not result in impairment to the visual quality – whether we are sharing videos for family contact or for business, we do not want them to look blurry/noisy or cartoon-like. A means of quantifying visual distortion is required. And if we look beyond tried and tested technologies, new possibilities open up. This is where AI and deep psychovisual preprocessing come in.
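The article does not name a specific distortion measure, but the longest-established baseline is PSNR (peak signal-to-noise ratio); perceptual metrics such as SSIM and VMAF track human judgements more closely. A minimal PSNR sketch, assuming 8-bit pixel sequences of equal length:

```python
import math

def psnr(original, processed, peak=255):
    """PSNR in dB between two equal-length 8-bit pixel sequences.
    Higher is closer to the original; identical signals give infinity."""
    mse = sum((a - b) ** 2 for a, b in zip(original, processed)) / len(original)
    if mse == 0:
        return float("inf")
    return 10 * math.log10(peak ** 2 / mse)

ref = [100, 101, 99, 100]
near = [100, 100, 100, 100]    # tiny, likely imperceptible change
far = [90, 110, 80, 120]       # large, visible distortion
print(f"{psnr(ref, near):.1f} dB")   # high PSNR: close to the original
print(f"{psnr(ref, far):.1f} dB")    # low PSNR: visibly degraded
```

A metric like this gives the "means of quantifying visual distortion" a preprocessor needs in order to guarantee that removed detail really was imperceptible.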

Due to the use of neural networks, such video preprocessing is ideally suited to GPU operations, so the computations can be run in a massively parallel architecture. On a small scale it could run on a typical PC or mobile phone, since all such devices now have increased GPU or NPU (neural processing unit) capabilities; broadcasters and content providers could also use cloud processing to deliver at scale and at resolutions up to Ultra HD. The processing overhead, though, is more than counterbalanced by the reduction in encoding complexity as well as a significant decrease in the bandwidth requirements for the compressed video.

AI comes to the fore

iSIZE has shown that an AI engine can learn, autonomously and without any input from the encoder, to distinguish perceptually unnoticeable details in content. The result is an increase in the compression efficiency of AVC, HEVC, VP9 and AV1 of between 12% and 50%, depending on the use case. These results are validated by extensive commercial tests and human mean opinion scores obtained with standard protocols such as ITU-T P.910.
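The arithmetic behind those figures is straightforward: a compression efficiency gain at constant perceptual quality translates directly into a lower delivery bitrate. The baseline bitrate below is an assumed figure for a typical HD ladder rung, not a number from the article:

```python
# Bitrate savings implied by a fractional compression efficiency gain.
# The 5.0 Mb/s baseline is an illustrative assumption.

def saved_bitrate_mbps(baseline_mbps, efficiency_gain):
    """Bitrate needed for the same quality after an efficiency gain."""
    return baseline_mbps * (1 - efficiency_gain)

hd_stream = 5.0  # Mb/s, assumed HD encoding rung
print(saved_bitrate_mbps(hd_stream, 0.12))  # low end of the quoted 12-50% range
print(saved_bitrate_mbps(hd_stream, 0.50))  # high end of the quoted range
```

Because transmission energy scales with bits delivered, the same percentages flow through to the bandwidth, and hence a substantial share of the carbon, of every stream served.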

The other key factor in successfully implementing such technology is to ensure 100% standard compliance and cross-standard/cross-codec applicability, which means that it is not beholden to any standard or format and can be applied to any application, platform, or workflow that has to move video data quickly and efficiently. This ensures seamless integration without breaking any video coding or streaming standards.

A key issue for many customers is avoiding additional compute complexity in their already complicated workflows. Solutions exist today that increase the efficiency and performance of all the latest codec standards, including AVC/H.264, HEVC/H.265 and VP9, but they add significant compute complexity at the same time. iSIZE’s codec independence and fast execution mean that the bitrate requirements of a video delivery system can be reduced without adding significant complexity, and without waiting for the often lengthy process of new codec standards being developed and widely adopted.