IBC2022: This Technical Paper presents an overview of a wide range of multi-encoding schemes with and without the support of machine learning approaches integrated into the HEVC Test Model (HM) and x265, respectively.

Abstract 

The growth in internet video traffic and advancements in video attributes such as framerate, resolution, and bit-depth boost the demand for a large-scale, highly efficient video encoding environment. This is even more essential for Dynamic Adaptive Streaming over HTTP (DASH)-based content provisioning, as it requires encoding numerous representations of the same video content. High Efficiency Video Coding (HEVC) is a standard video codec that significantly improves encoding efficiency over its predecessor, Advanced Video Coding (AVC). This improvement is achieved at the expense of significantly increased time complexity, which is a challenge for content and service providers. As the various representations are the same video content encoded at different bitrates or resolutions, the encoding analysis information from the already encoded representations can be shared to accelerate the encoding of other representations. Several state-of-the-art schemes first encode a single representation, called a reference representation. During this encoding, the encoder creates analysis metadata with information such as the slice-type decisions, CU, PU, TU partitioning, and the HEVC bitstream itself. When the remaining representations, called dependent representations, are encoded, the encoder analyses this metadata and reuses it to skip searching some partitioning options, thus reducing the computational complexity. With the emergence of cloud-based encoding services, video encoding is accelerated by utilising an increased number of resources, i.e., with multi-core CPUs, multiple representations can be encoded in parallel. This paper presents an overview of a wide range of multi-encoding schemes with and without the support of machine learning approaches integrated into the HEVC Test Model (HM) and x265, respectively. Seven multi-encoding schemes are presented, and their performance in terms of encoding time complexity and bitrate overhead, compared to state-of-the-art approaches, is shown.
Enabling fast multi-encoding for HAS in modern Over-the-top (OTT) workflows will reduce time-to-market and costs immensely.

Introduction 

HTTP Adaptive Streaming (HAS) is the de facto standard for delivering video over the internet to a variety of devices. The main idea behind HAS is to divide the video content into segments and encode each segment at various bitrates and resolutions, called representations, which are stored on plain HTTP servers. Keeping multiple representations available allows the video delivery to be continuously adapted to the network conditions and device capabilities of the client. To meet the high demand for streaming high-quality video content over the internet and overcome the associated challenges in HAS, the Moving Picture Experts Group (MPEG) has developed a standard called Dynamic Adaptive Streaming over HTTP (MPEG-DASH). The increase in video traffic and improvements in video characteristics such as resolution, framerate, and bit-depth raise the need to develop a large-scale, highly efficient video encoding environment. This is even more crucial for DASH-based content provisioning, as it requires encoding multiple representations of the same video content.

High Efficiency Video Coding (HEVC) is a standard video codec that is widely used in content production today. According to Bitmovin's 2021 video developer report, HEVC was used in 49% of productions in 2021 and was expected to be added to more than 25% of additional productions in 2022. HEVC significantly improves coding efficiency over its predecessor, Advanced Video Coding (AVC). This improvement is achieved at the cost of significantly increased runtime complexity, which is a challenge for content and service providers. As the various representations are the same video content encoded at different bitrates or resolutions, the encoding analysis information from the already encoded representations can be shared to accelerate the encoding of other representations.

Several state-of-the-art schemes first encode a single representation, called a reference representation. During this encoding, the encoder creates analysis metadata (a file) with information such as the slice-type decisions, CU, PU, TU partitioning, and the HEVC bitstream itself. When the remaining representations, called dependent representations, are encoded, the encoder analyses this metadata and reuses it to skip searching some partitioning options, thus reducing the computational complexity. With the emergence of cloud-based encoding services, and in live applications, video encoding is accelerated by utilising an increased number of resources, i.e., with multi-core CPUs, multiple representations can be encoded in parallel.
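
The reuse of reference analysis data can be illustrated with a minimal sketch. It is not the method of any particular scheme in this paper: it assumes a 64x64 CTU with CU depths 0 to 3, and it assumes a hypothetical rule in which a dependent representation never searches deeper than the CU depth chosen for the co-located CTU in the reference representation. The names AnalysisMetadata, depths_to_search and count_evaluations are invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Assumption: 64x64 CTU and 8x8 minimum CU, i.e. quadtree depths 0..3.
MAX_DEPTH = 3


@dataclass
class AnalysisMetadata:
    """Hypothetical analysis data saved while encoding the reference."""
    cu_depth: Dict[int, int]  # CTU index -> deepest CU depth chosen


def depths_to_search(ctu_index: int,
                     metadata: Optional[AnalysisMetadata] = None) -> List[int]:
    """Return the CU depths an encoder would evaluate for one CTU.

    Without metadata (reference encode) every depth is searched.
    With metadata (dependent encode) the search stops at the depth
    the reference chose; this upper-bound rule is an assumption.
    """
    if metadata is None:
        return list(range(MAX_DEPTH + 1))
    upper_bound = metadata.cu_depth[ctu_index]
    return list(range(upper_bound + 1))


def count_evaluations(ctu_count: int,
                      metadata: Optional[AnalysisMetadata] = None) -> int:
    """Count depth evaluations over all CTUs as a crude complexity proxy."""
    return sum(len(depths_to_search(i, metadata)) for i in range(ctu_count))


# Illustrative (made-up) reference decisions for a frame of four CTUs.
reference = AnalysisMetadata(cu_depth={0: 0, 1: 2, 2: 3, 3: 1})

full_search = count_evaluations(4)                # reference encode
reduced_search = count_evaluations(4, reference)  # dependent encode
print(full_search, reduced_search)
```

In this toy frame the dependent encode evaluates fewer depths than the full search. Real schemes weigh such savings against the bitrate overhead caused by skipping partitions that a full search might have selected.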

In this paper, the schemes are analysed for both serial and parallel encoding environments. The term multi-rate is used when all representations are encoded at a single resolution but at different bitrates. Multi-encoding is used when a single video is provided at various resolutions, and each resolution is encoded at different bitrates.
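
The distinction between the two terms can be made concrete with a small sketch. The ladders below are made-up examples rather than the configurations used in the paper, and the function name classify_ladder is hypothetical.

```python
from typing import List, Tuple

# A representation is described here as (height in pixels, bitrate in kbps).
Representation = Tuple[int, int]


def classify_ladder(ladder: List[Representation]) -> str:
    """Label a set of representations using the terminology above.

    One resolution at several bitrates is "multi-rate"; several
    resolutions, each at its own bitrates, is "multi-encoding".
    """
    resolutions = {height for height, _ in ladder}
    return "multi-rate" if len(resolutions) == 1 else "multi-encoding"


# Illustrative values only.
single_resolution = [(1080, 3000), (1080, 4500), (1080, 6000)]
several_resolutions = [(540, 1000), (540, 1500), (1080, 3000), (1080, 6000)]

print(classify_ladder(single_resolution))
print(classify_ladder(several_resolutions))
```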
