IBC2022: This Technical Paper explores 6 DoF used in VR applications.

Abstract

6 Degrees of Freedom (DoF) are used in Virtual Reality (VR) applications to enhance the user experience compared to standard 3 DoF solutions. Due to its sparse nature, 6 DoF information is typically represented in point cloud form, where each element describes the position of a point in 3D space, as well as its attributes (e.g., colour and transparency). Although it enhances the user experience, 6 DoF requires a higher volume of data than 3 DoF, which has made content distribution challenging and has limited its applications to high-end specialised machines. The aim of our work was to design a novel point cloud compression scheme that allows 6 DoF VR applications to run in real time on high-end consumer devices, such as gaming laptops and desktop machines. Although our solution was designed specifically for the PresenZ 6 DoF VR movie format, it may easily be applied to other volumetric video formats as well.

Introduction 

In a typical Virtual Reality (VR) scenario, Degrees of Freedom (DoF) are used to track the motion of a headset-wearing user within a three-dimensional (3D) space and to adjust the image that the user views accordingly. 3 DoF applications track only rotational movement around the x, y, and z axes (known as pitch, yaw, and roll), while 6 DoF applications also track translational movement (surging, swaying, and heaving), allowing for additional effects, such as moving forward/backward, left/right, and up/down. In addition to enhancing the user experience, 6 DoF VR can help reduce motion sickness and feelings of disorientation by providing a better sense of presence.
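To make the distinction concrete, the following minimal C++ sketch shows how a tracked head translation changes the offset between the viewer and a scene point. The type and function names (HeadPose, Tracking, offsetFromViewer) are hypothetical and chosen for illustration only; rotation is applied identically in both modes and is therefore omitted here.

```cpp
// Illustrative sketch only: all names below are hypothetical.
struct Vec3 { float x, y, z; };

struct HeadPose {
    float pitch, yaw, roll;  // rotation about the x, y, and z axes (radians)
    Vec3 translation;        // head offset; only tracked in 6 DoF
};

enum class Tracking { ThreeDoF, SixDoF };

// Returns the offset of a scene point from the viewer, before rotation.
// In 3 DoF the head translation is not tracked, so the offset never changes
// when the user moves; in 6 DoF it does, which produces motion parallax.
Vec3 offsetFromViewer(const Vec3& point, const HeadPose& pose, Tracking mode) {
    if (mode == Tracking::ThreeDoF) {
        return point;
    }
    return Vec3{point.x - pose.translation.x,
                point.y - pose.translation.y,
                point.z - pose.translation.z};
}
```

With a point two metres in front of the viewer, a one-metre step forward halves the distance in the 6 DoF case, while the 3 DoF case leaves it unchanged.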

Due to its sparse nature, 6 DoF information is typically represented in point cloud form, where each element describes the 3D position of a point, as well as its colour, transparency, orientation, and motion. It may also contain additional data, such as information about the camera(s) used to capture the 3D view. The actual number of points depends on the complexity of the visual scene: a typical frame may consist of over 5 million points.
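As an illustration, an uncompressed point record carrying the attributes listed above could be sketched in C++ as follows. The field widths and names are assumptions made for this example and do not describe the actual PresenZ layout.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical uncompressed point record; field widths are assumptions.
struct Point {
    float position[3];       // x, y, z
    std::uint8_t colour[3];  // one byte per colour channel
    std::uint8_t alpha;      // transparency
    float normal[3];         // orientation
    float motion[3];         // per-point motion vector
};

// A frame is simply the collection of its points.
struct PointCloudFrame {
    std::vector<Point> points;
};

// Raw (uncompressed) size of a frame holding point_count such records.
std::size_t rawFrameBytes(std::size_t point_count) {
    return point_count * sizeof(Point);
}
```

On common platforms with 4-byte floats this record occupies 40 bytes, so a frame of 5 million points would take roughly 200 MB before compression under these assumptions.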

Although it enhances the user experience, 6 DoF requires a higher volume of data than 3 DoF, which has made content distribution challenging and has limited its applications to high-end specialised machines. The key challenges that need to be addressed are: 1) high data entropy, which results in data rates that typically exceed the capacity of conventional communication channels, such as the 500 MB/s of Solid-State Drives (SSDs), and 2) real-time video rendering requirements at relatively high frame rates (30 fps). In this work, we describe our approach to addressing these challenges using a novel data compression scheme, designed specifically for point cloud datasets.
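The scale of the first challenge can be estimated with a short back-of-the-envelope calculation, sketched below in C++. The point count, frame rate, and channel capacity are the figures quoted above; the size of one uncompressed point (40 bytes in the usage note) is an assumption made purely for illustration, as are the function and type names.

```cpp
#include <cstdint>

// Result of the estimate: raw data rate and the compression ratio needed
// to fit that rate into the given channel.
struct Budget {
    double raw_mb_per_s;
    double required_ratio;
};

// bytes_per_point is an assumed value, not a figure from the paper.
Budget estimateBudget(std::uint64_t points_per_frame,
                      std::uint64_t bytes_per_point,
                      double frames_per_second,
                      double channel_mb_per_s) {
    const double raw_bytes_per_s =
        static_cast<double>(points_per_frame * bytes_per_point) * frames_per_second;
    const double raw_mb_per_s = raw_bytes_per_s / 1.0e6;  // 1 MB = 10^6 bytes
    return Budget{raw_mb_per_s, raw_mb_per_s / channel_mb_per_s};
}
```

For example, estimateBudget(5000000, 40, 30.0, 500.0) yields a raw rate of about 6000 MB/s, i.e. roughly 12 times the quoted SSD throughput, under the assumed 40 bytes per point.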

Our data compression format describes each frame individually and consists of a fixed header layer, as well as several optional data layers. The fixed header layer describes basic information, such as the number of points and the colour space used, as well as the types of coding tools and techniques used for the various point cloud subgroups and their attributes. Depending on the information included in the fixed header, additional header layers may be present in the bitstream, further describing encoding methods, parameters, and metadata. Finally, additional core layers are used to store the encoded values for each attribute.
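The layered structure can be pictured with the following C++ sketch of a fixed header that announces which optional layers follow. The field set, byte layout, flag values, and all names are hypothetical and serve only to illustrate the idea; they are not the actual bitstream syntax.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical colour spaces and layer flags, for illustration only.
enum class ColourSpace : std::uint8_t { RGB = 0, YCbCr = 1 };

constexpr std::uint8_t kExtraHeaderLayer = 0x01;  // additional header layer follows
constexpr std::uint8_t kColourLayer      = 0x02;  // colour core layer present
constexpr std::uint8_t kMotionLayer      = 0x04;  // motion core layer present

struct FixedHeader {
    std::uint32_t point_count;
    ColourSpace colour_space;
    std::uint8_t position_tool;  // id of the coding tool used for positions
    std::uint8_t colour_tool;    // id of the coding tool used for colours
    std::uint8_t layer_flags;    // bitmask of the k...Layer constants
};

constexpr std::size_t kFixedHeaderBytes = 8;

// Serialises the header into 8 bytes (point count in little-endian order).
std::vector<std::uint8_t> writeHeader(const FixedHeader& h) {
    std::vector<std::uint8_t> out;
    out.reserve(kFixedHeaderBytes);
    for (int shift = 0; shift < 32; shift += 8) {
        out.push_back(static_cast<std::uint8_t>(h.point_count >> shift));
    }
    out.push_back(static_cast<std::uint8_t>(h.colour_space));
    out.push_back(h.position_tool);
    out.push_back(h.colour_tool);
    out.push_back(h.layer_flags);
    return out;
}

// Parses the header; returns false if the buffer is too short.
bool readHeader(const std::vector<std::uint8_t>& in, FixedHeader& h) {
    if (in.size() < kFixedHeaderBytes) {
        return false;
    }
    h.point_count = 0;
    for (int i = 0; i < 4; ++i) {
        h.point_count |= static_cast<std::uint32_t>(in[i]) << (8 * i);
    }
    h.colour_space = static_cast<ColourSpace>(in[4]);
    h.position_tool = in[5];
    h.colour_tool = in[6];
    h.layer_flags = in[7];
    return true;
}

// A decoder would consult the flags to decide which layers to read next.
bool hasLayer(const FixedHeader& h, std::uint8_t flag) {
    return (h.layer_flags & flag) != 0;
}
```

A decoder built along these lines would parse the fixed header first and then read only the optional header and core layers whose flags are set.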

We also designed and implemented a codec API that allows a series of point cloud frames to be encoded and then decoded in real time on high-end laptops and gaming desktop machines. Our encoder and decoder implementations were developed in C++, utilising techniques such as multi-threading and Intel™ Single-Instruction-Multiple-Data (SIMD) intrinsics.
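The following C++ sketch illustrates how a frame-level codec interface might decode independent attribute layers on separate threads. It is not our actual API: the function names are hypothetical, and a simple byte-wise delta coder stands in for the real coding tools so that the example remains self-contained.

```cpp
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

using Layer = std::vector<std::uint8_t>;

struct EncodedFrame {
    std::vector<Layer> layers;  // one encoded layer per attribute
};

// Placeholder coding tool: stores each byte as the difference from the
// previous one (modulo 256). A real codec would use stronger tools.
Layer encodeLayer(const Layer& values) {
    Layer out(values.size());
    std::uint8_t prev = 0;
    for (std::size_t i = 0; i < values.size(); ++i) {
        out[i] = static_cast<std::uint8_t>(values[i] - prev);
        prev = values[i];
    }
    return out;
}

// Inverse of encodeLayer: a running sum modulo 256.
Layer decodeLayer(const Layer& deltas) {
    Layer out(deltas.size());
    std::uint8_t prev = 0;
    for (std::size_t i = 0; i < deltas.size(); ++i) {
        prev = static_cast<std::uint8_t>(prev + deltas[i]);
        out[i] = prev;
    }
    return out;
}

EncodedFrame encodeFrame(const std::vector<Layer>& attributes) {
    EncodedFrame frame;
    frame.layers.reserve(attributes.size());
    for (const Layer& attribute : attributes) {
        frame.layers.push_back(encodeLayer(attribute));
    }
    return frame;
}

// Decodes every layer on its own thread. Each thread writes to a distinct
// element of the output vector, so no locking is required.
std::vector<Layer> decodeFrame(const EncodedFrame& frame) {
    std::vector<Layer> out(frame.layers.size());
    std::vector<std::thread> workers;
    workers.reserve(frame.layers.size());
    for (std::size_t i = 0; i < frame.layers.size(); ++i) {
        workers.emplace_back([&frame, &out, i] {
            out[i] = decodeLayer(frame.layers[i]);
        });
    }
    for (std::thread& worker : workers) {
        worker.join();
    }
    return out;
}
```

Spawning one thread per layer keeps the sketch short; a production decoder would more likely reuse a fixed pool of worker threads and apply SIMD within each layer.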

This paper discusses background work on point cloud compression and VR applications, then describes our approach in detail, presents our experimental results and conclusions, and discusses potential further developments.