Abstract

Producing live 6DoF video requires video capture with multiple cameras, real-time depth estimation, compression, streaming and playback. All of these components are still under development and a ready-made solution is hard to find. To make the right choices during development, it is therefore necessary to predict in advance how system parameters (e.g. camera baseline) and depth estimation algorithms affect image quality.

In this paper, I present a quality evaluation approach that uses ray-traced images of artificial scenes to simulate acquisition with a given camera capture configuration. The images are passed to real-time depth estimation and view-synthesis software. Views are then synthesized for a pre-set viewing zone and the resulting images are compared with the ray-traced images. Modelling errors are isolated from depth estimation errors by comparing the ray-traced references against both views synthesized from ground-truth depth and views synthesized from estimated depth.
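The separation of error sources amounts to evaluating the same virtual viewpoint twice, once with ground-truth depth and once with estimated depth, against the ray-traced reference. The sketch below illustrates this idea in Python; the metric (PSNR) and the function and dictionary names are illustrative assumptions, not the paper's actual software interface.

```python
import numpy as np

def psnr(reference: np.ndarray, test: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio between a ray-traced reference and a synthesized view."""
    mse = np.mean((reference.astype(np.float64) - test.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

def evaluate_viewpoint(reference, synth_from_gt_depth, synth_from_est_depth):
    """Separate view-synthesis (modelling) error from depth-estimation error.

    reference            : ray-traced image at the virtual viewpoint (ground truth)
    synth_from_gt_depth  : view synthesized using ground-truth depth
    synth_from_est_depth : view synthesized using estimated depth
    """
    psnr_gt = psnr(reference, synth_from_gt_depth)    # error from view synthesis alone
    psnr_est = psnr(reference, synth_from_est_depth)  # combined synthesis + depth-estimation error
    return {
        "synthesis_only_psnr": psnr_gt,
        "full_pipeline_psnr": psnr_est,
        "depth_estimation_penalty": psnr_gt - psnr_est,  # quality lost to imperfect depth
    }
```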

Introduction

As more display devices support positional tracking and 3D interaction, multi-camera capture and 6DoF processing become increasingly relevant. Applications include live concerts, live sports and telepresence. The freedom to select one's own viewpoint enriches these applications by increasing the feeling of presence compared with regular video. Looking further ahead, more immersive scenarios can be conceived in which an observer navigates and interacts with a live-captured scene. For broadcast applications, we need real-time depth estimation on the production side and real-time view synthesis at the client device. Both depth estimation and view synthesis introduce errors, and these errors depend on the implementation details of the algorithms. Furthermore, the optimal camera configuration depends on the intended application and the 3D structure of the scene being captured. In the next sections, I introduce a ray-tracing approach to quality evaluation inside a target viewing zone; a sketch of such a zone follows below. The approach is evaluated using our real-time multi-camera setup for live broadcast.
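To make the notion of a target viewing zone concrete, the following Python sketch samples a regular grid of virtual camera positions inside a box-shaped zone. The box shape, grid sampling, dimensions and function name are assumptions for illustration; the paper's actual zone definition may differ.

```python
import itertools
import numpy as np

def viewing_zone_positions(center, extent, steps):
    """Regular grid of virtual camera positions inside a box-shaped viewing zone.

    center : (x, y, z) centre of the zone in scene coordinates (metres)
    extent : (dx, dy, dz) half-size of the zone along each axis
    steps  : number of samples per axis
    """
    axes = [np.linspace(c - e, c + e, steps) for c, e in zip(center, extent)]
    return [np.array(p) for p in itertools.product(*axes)]

# Example: a 40 cm x 20 cm x 30 cm zone in front of the camera rig, sampled 5 x 5 x 5.
positions = viewing_zone_positions(center=(0.0, 1.6, 2.0),
                                   extent=(0.2, 0.1, 0.15),
                                   steps=5)
# Each position would be rendered both by the ray tracer (reference) and by the
# view-synthesis software, after which a metric such as PSNR is averaged over the zone.
```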
