New VR (Virtual Reality) HMDs (Head Mounted Displays) being introduced in 2016 are creating increased demand for VR video content. A growing amount of content, including documentaries, movies and live events, has already been captured in VR video.

For the market to truly take off, some standardization is required with regard to content authoring rules, content acquisition and stitching methods, and the approach for mapping content for encoding and delivery. In addition, the industry needs to define a unified mechanism that addresses all of the different ecosystems, ranging from the various VR devices to mobile devices, STBs and connected TVs, to avoid the fragmentation that affected 3D and over-the-top (OTT) video delivery.

This paper will present reference architectures that can be deployed with existing technology to pave the way for the future evolution of VR.


VR represents an entirely new way for consumers to experience video. No longer is the TV viewer or game player a passive participant in the action; VR video simulates the experience of entering the video content itself, with the ability to see a full 360 degrees in any direction. The entertainment and educational possibilities afforded by the technology are “virtually” unlimited, and stand to change the way that video is produced, prepared and consumed for generations to come.

VR video can be thought of as a panoramic representation of content either captured on camera or generated via computer graphics, and then viewed on a 2D or 3D HMD. The workflow to create and deliver content includes production, encoding and transmission of audio, video and graphic elements.

The displays worn by VR video consumers are a key part of the VR ecosystem and come in a variety of form factors. They may be either tethered or untethered (i.e., connected wirelessly) to a VR player or PC: Oculus Rift, HTC Vive and Sony PlayStation VR are tethered, while Gear VR and LG VR are untethered. They can also be fully self-contained 2D devices, such as Google Cardboard, in which the user views content on a smartphone.

This paper will focus on VR video content preparation (i.e., acquisition, processing, encoding and transmission). Audio, graphics and devices are separate subjects and are not covered here.


VR, sometimes referred to as immersive multimedia, is a computer-simulated environment that can mimic physical presence in places in the real world or imagined worlds.

Virtual reality can recreate sensory experiences, including virtual taste, sight, smell, sound and touch. VR video is a panoramic (180- or 360-degree) video environment that is captured on a single-camera or stitched multi-camera system and sent either to a wireless HMD for an immersive experience or to a 2D device such as a PC, mobile device or TV set. Content can be consumed locally, streamed or broadcast.
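To make the difference between 180- and 360-degree panoramas concrete, the following sketch assumes the equirectangular projection, a common way of storing a panoramic frame as a flat 2D image. It is chosen here purely for illustration; this section does not prescribe a geometry. Yaw maps linearly to the horizontal axis and pitch to the vertical axis, so a 180-degree frame simply covers half the yaw range of a 360-degree one. The function `equirect_pixel` and its parameters are hypothetical.

```python
def equirect_pixel(yaw_deg, pitch_deg, width, height, h_fov_deg=360.0):
    """Map a viewing direction to (x, y) pixel coordinates in an
    equirectangular frame (illustrative sketch, not a standard API).

    yaw_deg:   horizontal angle, 0 = frame centre, positive to the right
    pitch_deg: vertical angle, 0 = horizon, positive up, in [-90, 90]
    h_fov_deg: horizontal coverage of the panorama (360 or 180)
    Returns None when the direction lies outside a 180-degree frame.
    """
    if not -90.0 <= pitch_deg <= 90.0:
        raise ValueError("pitch must be in [-90, 90]")
    # Wrap yaw into [-180, 180) so e.g. 540 and 180 are the same direction
    yaw = (yaw_deg + 180.0) % 360.0 - 180.0
    half = h_fov_deg / 2.0
    if abs(yaw) > half:
        return None  # behind the viewer in 180-degree content
    # Linear in both angles: this is what makes the projection equirectangular
    x = (yaw + half) / h_fov_deg * width
    y = (90.0 - pitch_deg) / 180.0 * height
    # Clamp the right/bottom edge onto the last pixel
    return min(int(x), width - 1), min(int(y), height - 1)
```

For a 3840×1920 360-degree frame, looking straight ahead (`yaw_deg=0, pitch_deg=0`) lands at the frame centre, (1920, 960); in a 180-degree frame, any direction more than 90 degrees from the centre has no pixel at all.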


VR video is a complete ecosystem that is still under construction. This section provides a high-level overview of VR video content creation, which is composed of several steps. Some of these steps can be skipped depending on the solution, as shown in Figure 1.

Content is captured via a camera rig, stitched, mapped to one of a variety of geometries, encoded in HEVC, and transmitted using either a broadcast or a unicast mechanism.

Figure 1 VR Video Content Creation
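The content-creation steps described above can be sketched as a simple stage planner. This is a minimal, hypothetical model: the `PipelineConfig` type and `plan_stages` function are illustrative and not part of any real tool. It treats stitching as the only optional step, skipped when a single camera is used, since only multi-camera rigs need their views merged into one panorama. Any other steps Figure 1 marks as skippable are not modelled, and the default projection is an assumption.

```python
from dataclasses import dataclass
from typing import List

# The two transmission mechanisms named in the text
DELIVERY_MODES = {"broadcast", "unicast"}


@dataclass
class PipelineConfig:
    num_cameras: int                     # cameras on the rig; 1 means no stitching
    delivery: str                        # "broadcast" or "unicast"
    projection: str = "equirectangular"  # assumed default geometry (hypothetical)
    codec: str = "HEVC"                  # encoder named in the text


def plan_stages(cfg: PipelineConfig) -> List[str]:
    """Return the ordered content-creation stages for a configuration."""
    if cfg.num_cameras < 1:
        raise ValueError("need at least one camera")
    if cfg.delivery not in DELIVERY_MODES:
        raise ValueError(f"unknown delivery mode: {cfg.delivery}")
    stages = ["capture"]
    # Stitching only applies when several camera views must be merged
    if cfg.num_cameras > 1:
        stages.append("stitch")
    stages.append(f"map:{cfg.projection}")
    stages.append(f"encode:{cfg.codec}")
    stages.append(f"transmit:{cfg.delivery}")
    return stages
```

For a six-camera rig delivered over unicast, `plan_stages(PipelineConfig(num_cameras=6, delivery="unicast"))` returns all five stages; with `num_cameras=1` the stitch stage is dropped and the remaining order is unchanged.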