After decades of developing leading-edge 2D video compression technologies, MPEG is currently working on a new era of coding for immersive applications, referred to as MPEG-I. MPEG-I ranges from 360-degree video with head-mounted displays to free navigation in 3D space, with head-mounted and 3D light field displays.
Two families of coding approaches, covering typical industrial workflows, are currently considered for standardisation – Multiview + Depth Video Coding and Point Cloud Coding – both supporting high-quality rendering at bitrates of up to a couple of hundred Mbps.
This paper provides a technical/historical overview of the acquisition, coding and rendering technologies considered in the MPEG-I standardisation activities.
The MPEG standardisation committee is currently working on coding technologies for immersive applications, referred to as MPEG-I, where multimedia content can be viewed from viewpoints different from the camera acquisition viewpoints, thereby supporting free navigation around regions of interest in the scene, e.g. circling around a player in a sports event, similar to The Matrix bullet-time effect, Karthikeyan.
MPEG-I starts from 360-degree video on head-mounted displays, supporting head rotations with 3 Degrees of Freedom (3DoF); this extends existing video codecs with Supplemental Enhancement Information (SEI) messaging for the projection format, together with the Omnidirectional Media Format (OMAF), to be standardised by the end of 2018. Extensions thereof support motion parallax within some limited range around the central viewing/camera position, referred to as 3DoF+ and expected to be standardised at the beginning of 2019. Larger ranges of freedom of movement will eventually achieve full 6 Degrees of Freedom (6DoF), allowing any user viewing position in 3D space, with standards to be adopted by industry around 2020, Koenen.
Competitive coding technologies for advanced VR/AR and light field display devices are under study, encompassing EquiRectangular video Projection (ERP), MultiView + Depth (MVD) Coding, as well as Point Cloud Coding (PCC), where the former two are familiar to video-based production workflows (e.g. 3D film production) and the latter to 3D graphics-based workflows (e.g. 3D game production), both steadily evolving towards Cinematic VR/AR.
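The core idea of ERP is to map each 3D viewing direction on the unit sphere to a 2D pixel position via its longitude and latitude, so that a spherical 360-degree capture can be coded with a conventional 2D video codec. A minimal sketch of such a mapping is given below; the exact axis and sampling conventions of OMAF differ in detail, so this is an illustrative generic projection, not the normative one:

```python
import math

def erp_project(x, y, z, width, height):
    """Map a 3D unit viewing direction (x, y, z) to equirectangular (ERP)
    pixel coordinates: longitude maps to the horizontal axis, latitude to
    the vertical axis. Generic sketch; not the normative OMAF convention."""
    lon = math.atan2(x, z)                    # longitude in [-pi, pi]
    lat = math.asin(max(-1.0, min(1.0, y)))   # latitude in [-pi/2, pi/2]
    u = (lon / (2.0 * math.pi) + 0.5) * width   # horizontal pixel position
    v = (0.5 - lat / math.pi) * height          # vertical pixel position
    return u, v

# The forward viewing direction lands at the centre of a 3840x1920 ERP frame:
print(erp_project(0.0, 0.0, 1.0, 3840, 1920))  # (1920.0, 960.0)
```

The inverse of this mapping is what a 360-degree renderer applies per display pixel to fetch texture from the decoded ERP frame.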
MPEG has issued several Calls for Test Material, Explorations and Core Experiments to compare the relative merits of technologies from industrial proponents around the world, supporting 3D extensions of High Efficiency Video Coding (HEVC), Sullivan et al., and Versatile Video Coding (VVC), MPEG Press Release (5), for MultiView + Depth (MVD) Coding in video production, as well as octree- and k-d tree-based 3D data representations used for Point Cloud Coding (PCC) in early versions of Lidar devices, Schnabel et al.
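The octree representation mentioned above recursively splits the point cloud's bounding cube into eight child cubes and signals, per internal node, an 8-bit occupancy mask indicating which children contain points. The following is a simplified sketch of that occupancy coding (breadth-first traversal, points assumed pre-normalised to the unit cube); actual PCC codecs add entropy coding and attribute coding on top:

```python
def octree_encode(points, depth):
    """Sketch of octree occupancy coding for point cloud geometry:
    recursively split a cube into 8 children and emit one occupancy byte
    (one bit per child) per internal node, in breadth-first order.
    Assumes points lie in the unit cube [0, 1)^3. Illustrative only."""
    occupancy = []
    nodes = [(0.0, 0.0, 0.0, 1.0, points)]  # (origin x, y, z, size, points)
    for _ in range(depth):
        next_nodes = []
        for ox, oy, oz, size, pts in nodes:
            half = size / 2.0
            children = [[] for _ in range(8)]
            for px, py, pz in pts:
                # Child index: one bit per axis (x -> bit 2, y -> bit 1, z -> bit 0)
                idx = ((4 if px >= ox + half else 0)
                       | (2 if py >= oy + half else 0)
                       | (1 if pz >= oz + half else 0))
                children[idx].append((px, py, pz))
            byte = 0
            for i, child_pts in enumerate(children):
                if child_pts:
                    byte |= 1 << i  # mark this child cube as occupied
                    next_nodes.append((ox + (half if i & 4 else 0.0),
                                       oy + (half if i & 2 else 0.0),
                                       oz + (half if i & 1 else 0.0),
                                       half, child_pts))
            occupancy.append(byte)
        nodes = next_nodes
    return occupancy

# Two diagonally opposite points occupy children 0 and 7 of the root cube:
print(octree_encode([(0.1, 0.1, 0.1), (0.9, 0.9, 0.9)], 1))  # [129]
```

Because empty sub-cubes are pruned immediately, the bitstream size adapts to the sparsity of the point cloud, which is what makes octrees attractive for Lidar-style data.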
VVC will be finalised in 2020 and will probably have built-in support for 360-degree video. It is planned that 3DoF+ will be supported in the short term by 2D video codec devices already on the market, augmented with supplementary metadata, while 6DoF may need enhanced coding tools in the longer term to handle even larger volumes of data.
In that respect, the maturity of the existing PCC technologies, assessed after a Call for Proposals issued by MPEG in 2017, led the committee to start drafting the technical specifications for this coding approach, with the target of publishing the final standard in early 2020.
The MVD video coding technologies for MPEG-I are under exploration in the MPEG Video Group, while PCC technologies are studied in MPEG 3DG (3D Graphics Group). Both types of technologies are grouped under the MPEG-I umbrella since they contribute to the common goal of addressing immersive applications.
The two subgroups, however, historically started their activities independently of each other, using their own data sets and Common Test Conditions (CTC), but we will see in the remainder of the paper that cross-fertilisation has led to technologies showing striking similarities.
Both the MPEG-I Video and MPEG-I Graphics coding technologies are expected to reach similar bitrates of around a couple of hundred Mbps for high-end Cinematic VR/AR productions, irrespective of the technological specificities of the proposed coding approaches.
The choice of coding technology will hence depend mainly on the workflows of the industrial players (purely video-based vs. computer graphics-based special effects) and the immersive product features (3DoF+ versus 6DoF) they bring to market.