MPEG is promoting a video-based point cloud compression technology – and Apple is driving it.

At its most recent meeting, at the beginning of October in Macau, standards body MPEG upgraded its Video-based Point Cloud Compression (V-PCC) standard to Committee Draft stage.

MPEG met in Macau in October

V-PCC addresses the coding of 3D point clouds – sets of data points in space with associated attributes such as colour – with the goal of enabling new applications, including the representation of human characters.

In other words, avatars or holograms existing as part of an immersive extended reality in the not-too-distant future.

“One application of point clouds is to use them for representing humans, animals or other real-world objects, or even complete scenes,” explains Ralf Schaefer, Director of Standards at Technicolor Corporate Research.

In order to achieve decent visual quality, a sufficiently dense point cloud is needed, which can lead to extremely large amounts of data. Understandably, that’s a significant barrier to mass-market applications – hence the demand for a workable lossy or lossless means of compressing the information.

eXtended Reality
V-PCC is all about six degrees of freedom (6DoF) – fully immersive movement in three-dimensional space – the goal that Hollywood studios believe will finally make virtual and blended reality take off.

Limitations in current technology mean Virtual Reality is restricted to three degrees of freedom (3DoF).

Companies are already switching their attention from VR to focus on augmented reality, mixed reality or, in the new jargon, eXtended reality (XR).

For example, VR pioneer Jaunt, in which Sky and Google are investors, is jettisoning VR camera development to focus on its XR mixed reality computing platform. Jaunt recently acquired Chicago-based Personify, maker of a volumetric point cloud solution called Teleporter.

Apple has the most extensive AR ecosystem and is leading the field. Its ARKit framework targets developers wanting to create AR experiences viewable on iOS devices.

It is positioning itself as the destination for AR and blended reality experiences for the time when the iPhone, and smartphones in general, are superseded by a form of wearable goggles as the consumer interface for communication, information and entertainment. Microsoft (HoloLens), Google (Glass and the AR toolset Project Tango), Facebook (redirecting its Oculus VR headgear team toward the development of AR glasses) and Magic Leap are among the competitors for this next stage of internet computing.

It should come as no surprise then that Apple’s technology is reportedly the chief driver behind MPEG’s V-PCC standard.

“The point cloud solution that MPEG has selected is the one proposed by Apple,” confirms Thierry Fautier, president-chair of the Ultra HD Forum and video compression expert.

Using existing codecs
MPEG is actually investigating two approaches to compressing point clouds. The other, Geometry-based Point Cloud Compression (G-PCC), uses 3D geometry-oriented coding methods aimed at vehicular LiDAR, 3D mapping, cultural heritage and industrial applications.

What is important about the V-PCC initiative is that it leverages existing video codecs (HEVC is the baseline, although it could be swapped out for other codecs if they prove more efficient), significantly shortening time to market.

“As the V-PCC specification leverages existing [commodity 2D] video codecs, the implementation of V-PCC encoders will largely profit from existing knowledge and implementation (hardware and software) of video encoders,” explains Schaefer.

The V-PCC specification is planned to be published by ISO around IBC2019 (Q3 2019), so the first products could be on the market by 2020.

“The latest generation of mobile phones already include video encoders/decoders that can run as multiple instances and also powerful multicore CPUs, allowing the first V-PCC implementations on available devices,” says Schaefer.

Already, the current V-PCC test model encoder implementation provides a compression ratio of 125:1, meaning that a dynamic point cloud of 1 million points could be encoded at 8 Mbit/s “with good perceptual quality”, according to MPEG.
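
Those figures can be sanity-checked with simple arithmetic. The sketch below assumes a frame rate of 30 fps, which the MPEG summary does not specify; at other frame rates or coordinate precisions the per-point budget shifts accordingly.

```python
# Back-of-the-envelope check of MPEG's quoted V-PCC figures.
# Assumption (not from the article): 30 frames per second.
points_per_frame = 1_000_000       # "a dynamic point cloud of 1 million points"
compressed_rate_bps = 8_000_000    # 8 Mbit/s, per MPEG
compression_ratio = 125            # 125:1, per MPEG
fps = 30                           # assumed frame rate

raw_rate_bps = compressed_rate_bps * compression_ratio      # ~1 Gbit/s uncompressed
raw_bits_per_point = raw_rate_bps / fps / points_per_frame  # ~33 bits per point

print(f"Implied raw rate: {raw_rate_bps / 1e9:.1f} Gbit/s")
print(f"Implied raw budget: {raw_bits_per_point:.0f} bits per point")
```

In other words, an uncompressed dynamic point cloud of that size sits around a gigabit per second – exactly the kind of data volume the compression ratio is meant to tame.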

Says Schaefer: “This is essentially achieved by converting such information into 2D projected frames and then compressing them as a set of different video sequences by leveraging conventional video codecs.”
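
A heavily simplified sketch of that idea follows. It is not the V-PCC test model itself, which segments the cloud into patches and packs occupancy, geometry and attribute maps, but it shows the core trick: turning 3D points into 2D images that an ordinary video encoder such as HEVC can then compress frame by frame.

```python
import numpy as np

def project_to_maps(points, colours, resolution=1024):
    """Orthographic projection of a point cloud onto the XY plane.

    points:  (N, 3) array of (x, y, z) coordinates
    colours: (N, 3) array of RGB values, one per point
    Returns a depth map and a colour map; a sequence of such maps can be
    handed, frame by frame, to a conventional 2D video encoder.
    """
    depth_map = np.zeros((resolution, resolution), dtype=np.uint16)
    colour_map = np.zeros((resolution, resolution, 3), dtype=np.uint8)

    # Normalise x, y into pixel coordinates on the projection plane.
    xy = points[:, :2].astype(np.float64)
    span = xy.max(axis=0) - xy.min(axis=0) + 1e-9
    uv = ((xy - xy.min(axis=0)) / span * (resolution - 1)).astype(int)

    for (u, v), z, rgb in zip(uv, points[:, 2], colours):
        # Keep only the nearest point per pixel (one "layer"; V-PCC keeps more).
        if depth_map[v, u] == 0 or z < depth_map[v, u]:
            depth_map[v, u] = int(z)
            colour_map[v, u] = rgb
    return depth_map, colour_map
```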

It is the relative ease of capturing and rendering spatial information, compared to other volumetric video representations, that makes point clouds an increasingly popular way to present immersive volumetric data.

“A point cloud is a collection of points that are not related to each other, that have no order and no local topology,” explains Schaefer. “Mathematically, it can be represented as a set of (x, y, z) coordinates, where x, y, z have finite precision and dynamic range. Each (x, y, z) can have multiple attributes associated with it (a1, a2, a3, …), where the attributes may correspond to colour, reflectance or other properties of the object or scene that would be associated with a point.”

He continues, “Typically, each point in a cloud has the same number of attributes attached to it. Point clouds can be static or dynamic, where the latter changes over time. Dynamic objects are represented by dynamic point clouds and V-PCC is being defined for compressing dynamic point clouds.”
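
Written out as a data structure, Schaefer’s description maps onto something like the following sketch (the attribute names here are illustrative, not taken from the specification):

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Point:
    # Finite-precision coordinates, e.g. 10-bit integers per axis.
    x: int
    y: int
    z: int
    # Optional attributes associated with the point: colour, reflectance, etc.
    colour: Tuple[int, int, int] = (0, 0, 0)
    reflectance: float = 0.0

# A static point cloud is simply an unordered collection of such points,
# with no connectivity or local topology between them ...
StaticCloud = List[Point]

# ... and a dynamic point cloud (the case V-PCC targets) is a sequence of
# clouds, one per instant in time.
DynamicCloud = List[StaticCloud]
```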

Capturing point clouds
Point clouds are generally well suited to 6DoF immersive media applications, as free viewpoint is natively supported and occlusions are avoided. On the capture side, point clouds are usually derived from depth and/or disparity information from multi-view capture.

This includes lightfield systems such as the rigs of multiple GoPro cameras being tested in Google’s research labs.

“In current lightfield camera systems there is always a limitation in the number of cameras or in the number of micro-lenses when considering plenoptic cameras [such as the one developed at Lytro],” says Schaefer. “Whether it is calculated or measured, depth information can be associated with the texture acquired by the camera. As soon as there is texture and depth, the subsampled lightfield can be represented by a point cloud.”
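
As a minimal illustration of that last step – assuming a single pinhole camera with known intrinsics, whereas real multi-view and plenoptic rigs involve calibration, fusion and filtering – a texture-plus-depth image pair can be back-projected into a coloured point cloud like this:

```python
import numpy as np

def depth_to_point_cloud(depth, texture, fx, fy, cx, cy):
    """Back-project a depth map (H, W) and texture image (H, W, 3) into an
    (N, 6) array of x, y, z, r, g, b points, using pinhole camera intrinsics
    (fx, fy: focal lengths in pixels; cx, cy: principal point)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))

    z = depth.astype(np.float64)
    valid = z > 0                    # skip pixels with no depth measurement
    x = (u - cx) * z / fx            # standard pinhole back-projection
    y = (v - cy) * z / fy

    points = np.stack([x[valid], y[valid], z[valid]], axis=-1)
    colours = texture[valid].astype(np.float64)
    return np.concatenate([points, colours], axis=-1)
```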

V-PCC is part of MPEG-I, a broader suite of standards under development, all targeting immersive media. Compression will reduce the data for efficient storage and delivery, which is essential for future applications. MPEG has also started work on the storage of V-PCC in ISOBMFF files, the next step towards interoperability of such immersive media applications.

“In my opinion 6DoF will not be a consumer application [initially] but more a business enterprise application,” says Fautier. “Development will take time. There will also probably be need for cloud computing to assist the heavy computations. So, for me, 6DoF is an application we’d expect to see over fibre and 5G beyond 2020. After which, the sky is the limit.”