IBC2023: This Technical Paper reviews the state of the art in adaptive streaming using closed GOPs and its shortcomings, explains how VVC enables open-GOP adaptive streaming using reference picture resampling, and describes the benefits and challenges of integrating open-GOP encoding into a highly scalable cloud transcoding solution as well as into a live encoding workflow.

Abstract

Over-the-top (OTT) adaptive streaming has become a popular method for delivering high-quality video content over the internet, adjusting the video quality to the user’s connection speed and device capabilities. It relies on multi-bitrate encoding: the content is encoded at several bit-rates and resolutions, and each encoding is divided into smaller segments. Because of codec constraints, these segments have traditionally had to be coded in a so-called closed GOP configuration, whereas broadcast widely uses the more efficient open GOP configuration. The emerging Versatile Video Coding (VVC) standard makes the more efficient open GOP approach usable in adaptive streaming as well. This paper describes and discusses the integration of open GOP coding into two adaptive streaming applications: cloud transcoding and live encoding. In addition, an informal subjective test confirmed the benefits of the proposed open GOP technique, showing that subjective quality improves considerably compared to closed GOP coding.

Introduction

Random access points (RAPs) are essential in video entertainment applications. They are the points within a coded video stream at which a decoder can start decoding, and thus playback can begin, without needing any earlier part of the stream. This matters in broadcast, for tuning in or switching channels. It matters equally in adaptive streaming, where video streams are often divided into smaller segments that are delivered dynamically based on the viewer’s bandwidth and device capabilities.
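The Python sketch below illustrates segment-based adaptation under stated assumptions. It picks one rendition per segment from a hypothetical bitrate ladder using a simple throughput rule. The ladder values, the 0.8 safety factor and the names `Rendition`, `pick_rendition` and `plan_segments` are illustrative only. Real DASH or HLS players use more elaborate buffer- and throughput-based logic. The point is that each segment is assumed to start at a RAP, so the player can change rendition only at segment boundaries.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Rendition:
    """One entry of a hypothetical bitrate ladder."""
    height: int        # vertical resolution in pixels
    bitrate_kbps: int  # average encoded bitrate in kbit/s


# Hypothetical ladder, sorted from lowest to highest bitrate.
LADDER = [
    Rendition(360, 800),
    Rendition(540, 1800),
    Rendition(720, 3000),
    Rendition(1080, 6000),
]


def pick_rendition(throughput_kbps: float, safety: float = 0.8) -> Rendition:
    """Return the highest rendition whose bitrate fits within a safety
    margin of the measured throughput, falling back to the lowest one."""
    budget = throughput_kbps * safety
    chosen = LADDER[0]
    for rendition in LADDER:
        if rendition.bitrate_kbps <= budget:
            chosen = rendition
    return chosen


def plan_segments(throughputs_kbps: List[float]) -> List[int]:
    """Choose one rendition (reported by its height) per segment.
    Each segment is assumed to start at a RAP, so switching between
    renditions can only happen at segment boundaries."""
    return [pick_rendition(t).height for t in throughputs_kbps]


if __name__ == "__main__":
    # One throughput measurement per upcoming segment (hypothetical values).
    print(plan_segments([5000, 2500, 9000, 700]))  # → [720, 540, 1080, 360]
```

The last measurement in the example is too low for any rendition, so the sketch falls back to the lowest one rather than stalling.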

In video coding, groups of pictures (GOPs) define hierarchical referencing structures between RAPs. A RAP is always characterised by an intra-coded picture, and modern video codecs often use multiple GOPs between two RAPs. To avoid confusion, this paper uses the term GOP for these smaller groups and the term intra period for the distance between two RAPs. Traditionally, GOPs at RAPs were “closed”, i.e. inter-picture prediction cannot reference pictures from GOPs before the RAP. This reduces coding efficiency because it limits the temporal redundancy that can be exploited. More recent standards, e.g. High Efficiency Video Coding (HEVC), support so-called “open” GOP coding for higher compression efficiency, and open GOPs are already widely used in broadcast.

In adaptive streaming, however, closed GOPs are used both for random access and for switching between renditions, e.g. renditions of different spatial resolution or bit-rate. When the spatial resolution changes at a switch, legacy codecs cannot use open GOP referencing across the GOP boundary. The reference pictures would have to be spatially rescaled, and these codecs do not support that. The most recent Versatile Video Coding (VVC) standard introduces a functionality called reference picture resampling (RPR) to address this shortcoming. In addition, constraints on the VVC encoder prevent unpleasant visual artefacts that open-GOP resolution switching can otherwise cause. A detailed overview of HEVC and VVC is given by Bross et al. (1).
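The toy Python model below makes the closed versus open GOP difference concrete. It is not taken from the paper. It assumes a hypothetical eight-picture hierarchical GOP following a RAP at POC (picture order count) 16, with reference lists in a typical random-access pattern. It then counts which pictures can be decoded when decoding starts at that RAP. The names `leading_pictures` and `decodable` are illustrative. Real decoders handle this through picture types and buffer management defined in the HEVC and VVC specifications, which the sketch does not model.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Picture:
    poc: int               # picture order count (output order)
    refs: Tuple[int, ...]  # POCs of the pictures this one predicts from


def leading_pictures(open_gop: bool) -> List[Picture]:
    """Hypothetical 8-picture hierarchical GOP after a RAP at POC 16, listed in
    decoding order. POCs 9..15 are output before the RAP but decoded after it.
    In an open GOP, some of them also predict from POC 8, the last picture of
    the previous GOP; in a closed GOP they may not."""
    prev = (8,) if open_gop else ()
    return [
        Picture(16, ()),              # the intra-coded RAP
        Picture(12, prev + (16,)),
        Picture(10, prev + (12,)),
        Picture(9, prev + (10,)),
        Picture(11, (10, 12)),
        Picture(14, (12, 16)),
        Picture(13, (12, 14)),
        Picture(15, (14, 16)),
    ]


def decodable(pictures: List[Picture], before_rap: Iterable[int],
              same_resolution: bool, rpr: bool = False) -> List[int]:
    """Return the POCs that can be decoded when decoding starts at the RAP.

    before_rap: POCs from before the RAP still held by the decoder (empty on
    tune-in; the previous rendition's pictures on a rendition switch). Such a
    reference is usable only at the same resolution, or if the decoder can
    resample it (RPR)."""
    usable = set(before_rap) if (same_resolution or rpr) else set()
    decoded = set()
    for pic in pictures:  # decoding order
        if all(r in decoded or r in usable for r in pic.refs):
            decoded.add(pic.poc)
    return sorted(decoded)


if __name__ == "__main__":
    closed, opened = leading_pictures(False), leading_pictures(True)
    # Tune-in, closed GOP: [9, 10, 11, 12, 13, 14, 15, 16]
    print(decodable(closed, [], same_resolution=True))
    # Tune-in, open GOP: [16]
    print(decodable(opened, [], same_resolution=True))
    # Resolution switch, open GOP, no RPR: [16]
    print(decodable(opened, [8], same_resolution=False))
    # Resolution switch, open GOP, with RPR: [9, 10, 11, 12, 13, 14, 15, 16]
    print(decodable(opened, [8], same_resolution=False, rpr=True))
```

In this model, the open-GOP leading pictures that depend on POC 8 are lost on tune-in. On a resolution switch they can be recovered only if the decoder can resample the reference, which is the gap RPR is meant to fill. For a bitrate-only switch, the sketch counts the pictures as decodable. However, the reference then comes from a different rendition than the one the encoder used, and the model does not capture the resulting prediction mismatch.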

In this paper, we first review the state of the art in adaptive streaming using closed GOPs and its shortcomings. We then briefly describe how VVC enables open-GOP adaptive streaming using reference picture resampling and certain encoder constraints. Before concluding, we describe the benefits and challenges of integrating open-GOP encoding into a highly scalable cloud transcoding solution as well as into a live encoding workflow.
