Abstract

Omnidirectional MediA Format (OMAF) is a media standard for 360° media content developed by the Moving Picture Experts Group (MPEG). Advanced media processing and delivery services, such as virtual reality content stitching, packaging, and adaptive streaming, call for more complex and tailored multimedia workflows. To achieve the desired result, these workflows require many advanced functions to operate together on the media content. To address the needs of such services, MPEG is also developing a new standard called Network-based Media Processing (NBMP), which aims at increased media processing efficiency, faster and lower-cost deployment of interoperable media processing functions, and large-scale deployment by leveraging public, private, or hybrid cloud services. This paper covers both the OMAF and NBMP standards. Additionally, an end-to-end design and proof of concept are provided to enable an immersive virtual reality experience for end-users.

Introduction

Omnidirectional MediA Format (OMAF) is a systems standard developed by MPEG. OMAF defines a media format for omnidirectional content with three degrees of freedom (3DoF), such as 360° video, images, audio, and timed text. OMAF also supports viewport-dependent streaming, where the user's current viewport is transmitted at a higher picture quality than the remaining areas of the viewing sphere. The next version of the standard, currently under development, adds standardised features for multiple viewpoints and media overlays.
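The viewport-dependent idea above can be sketched with a toy model: the viewing sphere is split into tiles, and tiles overlapping the user's viewport are requested at a higher quality than the rest. The tile layout (a single equator ring) and the quality labels here are illustrative only, not taken from the OMAF specification.

```python
# Toy sketch of viewport-dependent tile-quality selection.
# Assumption: one ring of equally wide tiles around the equator;
# real OMAF tiling schemes are more elaborate.
def pick_tile_qualities(viewport_yaw_deg: float, num_tiles: int = 8,
                        fov_deg: float = 90.0) -> list[str]:
    """Return a quality label per tile for one ring of tiles."""
    tile_width = 360.0 / num_tiles
    qualities = []
    for i in range(num_tiles):
        tile_center = i * tile_width + tile_width / 2
        # Angular distance between the tile centre and the viewport centre,
        # wrapped into [-180, 180] so tiles near 0°/360° are handled correctly.
        dist = abs((tile_center - viewport_yaw_deg + 180) % 360 - 180)
        # A tile is "in view" if it can overlap the field of view at all.
        in_view = dist <= fov_deg / 2 + tile_width / 2
        qualities.append("high" if in_view else "low")
    return qualities
```

A client would re-run this selection whenever the head orientation changes, fetching high-quality segments only for the tiles currently in view.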

Converting viewport-agnostic 360° video to OMAF-compliant content requires only file-format and transport-protocol level modifications (e.g. fragmented MP4 and DASH-based streaming). However, real-world use cases call for adaptive bit-rate (ABR) streaming, which requires several encoded versions of the video. Furthermore, 360° videos need to be transcoded to enable viewport-dependent operation, which places certain constraints on the video encoding process. As a result, several transcoding instances must run in parallel during OMAF-compliant content creation.
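The parallel-transcoding step can be sketched as follows, assuming ffmpeg as the transcoder. The ABR ladder and encoder flags are illustrative placeholders, not OMAF-specific settings; only the fragmented-MP4 output flag reflects the file-format requirement mentioned above.

```python
# Sketch: run one transcoding instance per ABR rendition in parallel.
# The renditions and bitrates below are hypothetical examples.
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Hypothetical ABR ladder: (name, resolution, video bitrate).
ABR_LADDER = [
    ("uhd", "3840x1920", "20M"),
    ("fhd", "1920x960", "8M"),
    ("sd",  "1280x640", "4M"),
]

def build_transcode_cmd(src: str, name: str,
                        resolution: str, bitrate: str) -> list[str]:
    """Build one ffmpeg command line for a single rendition."""
    return [
        "ffmpeg", "-i", src,
        "-s", resolution,
        "-b:v", bitrate,
        "-movflags", "frag_keyframe+empty_moov",  # fragmented MP4 output
        f"{name}.mp4",
    ]

def transcode_all(src: str) -> None:
    """Run one transcoding process per rendition concurrently."""
    cmds = [build_transcode_cmd(src, *rung) for rung in ABR_LADDER]
    with ThreadPoolExecutor(max_workers=len(cmds)) as pool:
        list(pool.map(lambda c: subprocess.run(c, check=True), cmds))
```

In practice each rendition would also be segmented and described in a DASH manifest; the sketch only shows why multiple transcoding instances are needed at once.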

Video processing is a computing-resource-intensive task. Traditionally, content providers have used dedicated in-house hardware to transcode their content. Such an approach can introduce high capital expenditure and offers limited scalability.

Network-based solutions are more scalable in terms of computing resources and provide remote access to users, for instance over the web, as well as programmable APIs for various integration needs.
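As a sketch of such a programmable API, the snippet below composes a workflow request for a stitch-transcode-package chain and prepares an HTTP call to submit it. The JSON layout loosely mirrors the NBMP notion of a workflow description, but the field names, function names, and endpoint are illustrative assumptions, not taken from the standard.

```python
# Sketch: submitting a media-processing workflow to a network-based
# service over a hypothetical REST API.
import json
from urllib import request

def build_workflow_description(source_url: str) -> dict:
    """Describe a stitch -> transcode -> package chain as a task list.
    All descriptor names here are simplified illustrations."""
    return {
        "workflow": {
            "input": {"media-source": source_url},
            "tasks": [
                {"function": "360-stitcher"},
                {"function": "omaf-transcoder", "parameters": {"renditions": 3}},
                {"function": "dash-packager"},
            ],
            "output": {"publish-format": "OMAF/DASH"},
        }
    }

def submit_workflow(api_base: str, source_url: str) -> request.Request:
    """Prepare the HTTP request creating the workflow (endpoint is hypothetical)."""
    body = json.dumps(build_workflow_description(source_url)).encode()
    return request.Request(
        f"{api_base}/workflows",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The actual NBMP Workflow API and its descriptors are defined by the standard itself; the point of the sketch is that the whole processing chain becomes a single declarative request against cloud-hosted functions.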

Download the full paper below