IBC2023: This Technical Paper provides an overview of MPEG-I Scene Description.


Immersive media applications offer experiences that immerse the user in a virtual or hybrid environment and provide more degrees of freedom than traditional 2D video content. Platforms providing immersive media often enable the user to interact with the content and/or with other users in shared virtual or mixed-reality spaces.

To address the need for an interoperable cross-platform exchange format and an interactive solution for such 3D environments, ISO/IEC JTC 1/SC 29/WG 03 (MPEG Systems) has standardized a Scene Description framework in ISO/IEC 23090-14 [1]. It serves as an entry-point format for composing rich 3D scenes: referencing and positioning 2D and 3D assets in the scene, blending with the real world, enabling rich interactivity, and providing real-time media delivery.

Carriage formats have also been defined for the delivery of the Scene Description data and of the linked assets, based on the well-known and ubiquitous ISOBMFF standard, i.e., ISO/IEC 14496-12 [2].


Immersive media applications offer experiences where the user is immersed in virtual or hybrid environments. The user can experience the content in 3D and enjoy more degrees of freedom compared to traditional 2D content. Platforms providing immersive media also often give the user the ability to interact with the content and/or with other users in shared virtual spaces.

Immersive media is becoming increasingly prevalent and is starting to influence the way we work and entertain ourselves. Immersion is achieved by introducing the depth dimension into media modalities (visual and auditory) that have traditionally been expressed digitally in 2D. The transition from 2D to 3D media was initially driven by Virtual Reality (VR), mainly thanks to the availability of affordable VR headsets. However, unique Augmented Reality (AR) and Mixed Reality (MR) immersive experiences are also becoming popular, supported by the release of consumer devices such as see-through Head-Mounted Displays (HMDs) and glasses. A number of immersive experiences are also achievable on smartphones.

One of the key technologies enabling immersive media user experiences is scene description. A scene description defines the structure and composition of a 3D scene, references and positions the 2D and 3D assets in the scene, and provides all the information an application needs to render the 3D scene properly to the end user.
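ISO/IEC 23090-14 builds on the glTF 2.0 JSON format, so the shape of a scene description can be sketched with a small, simplified fragment (illustrative only, not a complete or conformant file): a scene lists nodes, and each node positions an asset, here a mesh, via a transform.

```json
{
  "scene": 0,
  "scenes": [{ "nodes": [0] }],
  "nodes": [
    {
      "name": "screen",
      "translation": [0.0, 1.5, -2.0],
      "mesh": 0
    }
  ],
  "meshes": [{ "primitives": [{ "attributes": { "POSITION": 0 } }] }]
}
```

A renderer walks this graph, applies each node's transform, and draws the referenced assets; MPEG-I Scene Description extends this model to reference timed 2D and 3D media as well.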

The need for a solution enabling cross-platform exchange and interaction in 3D environments became evident, and a number of forums and Standards Developing Organizations (SDOs) started to define the needed technology. The ISO/IEC Moving Picture Experts Group (MPEG) Working Group 3 (WG 03) defines a scene description framework in Part 14 of the MPEG-I series of standards (i.e., ISO/IEC 23090-14), serving as an entry point to rich, dynamic, and temporal 3D scenes, enabling immersion, fusion with the real world, and rich interactivity, while providing real-time delivery of media and scene updates.

Furthermore, the standard defines an architecture together with an application programming interface (API) that allows the application to separate access to the timed immersive media content from the rendering of that media. This separation, and the definition of the API, enable a wide range of optimization techniques, such as adapting the retrieved media to network conditions, partial retrieval, access at different levels of detail, and adjustment of content quality.
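The idea behind this separation can be illustrated with a short Python sketch. The class and method names below are hypothetical, chosen for illustration only, and are not the normative API of the standard: a media-access layer fills buffers and picks a quality based on current network conditions, while the presentation engine only reads decoded data from the buffers and never deals with delivery.

```python
from collections import deque


class MediaAccess:
    """Hypothetical media-access layer: fetches media and fills a buffer.

    Quality selection happens entirely on this side of the API, so the
    renderer is unaware of bitrate adaptation.
    """

    def __init__(self, qualities):
        self.qualities = qualities  # available bitrates (kbps), high to low
        self.buffer = deque()

    def fetch(self, bandwidth_kbps):
        # Pick the highest quality the current network conditions allow,
        # falling back to the lowest one if nothing fits.
        chosen = next((q for q in self.qualities if q <= bandwidth_kbps),
                      self.qualities[-1])
        self.buffer.append({"quality_kbps": chosen})
        return chosen


class Renderer:
    """Hypothetical presentation engine: only reads from the buffer."""

    def render_next(self, access):
        return access.buffer.popleft() if access.buffer else None


access = MediaAccess(qualities=[8000, 4000, 1500])
access.fetch(bandwidth_kbps=5000)      # network only supports mid quality
frame = Renderer().render_next(access)  # renderer just consumes the buffer
```

Because the renderer touches only the buffer, the access layer is free to apply the optimizations the standard targets, such as partial retrieval or switching level of detail, without any change on the rendering side.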

This article provides an overview of MPEG-I Scene Description (MPEG-SD) and is organized as follows.

The first section describes the architectural framework used in MPEG-SD, followed by a section describing the new features introduced by the first edition of the standard and its amendments. After that, we provide information on the storage and transport aspects of MPEG-SD. The last three sections cover future standardization projects related to MPEG-SD, future work, and conclusions.

Download the paper below.