Interview: Paul Debevec, Google

Paul Debevec, Google

In Steven Spielberg’s movie Ready Player One there’s a shot of actor Tye Sheridan putting on virtual reality (VR) headgear that transitions imperceptibly from real to virtual cameras as it moves to an extreme close-up. In Gemini Man, Will Smith’s digital double is among the most realistic yet created for the screen.

Both made use of a Light Stage facial scanning system at Google, just one of a number of breakthrough applications led by Paul Debevec, a senior scientist working in the company’s immersive computing wing.

A pioneer in image-based rendering, Debevec directed the experimental short The Campanile Movie in 1997 using photorealistic animation techniques adopted by the makers of The Matrix two years later, and was named one of the top 100 innovators in the world aged under 35 by MIT in 2002. He has been working with Google since 2015 and is also an adjunct professor at the USC Institute for Creative Technologies in Los Angeles.

IBC365 caught up with Debevec at the VIEW Conference for visual effects in Turin where he presented Google’s latest efforts to capture and process light fields for a more realistic sense of presence in VR.

“Filming in 360-degrees only captures one perspective on how different materials react to light,” he says. “Light fields can give you an extremely high-quality sense of presence by producing motion parallax and extremely realistic textures and lighting.”

“We need to replicate how the world reacts to you as you move your head around and there are clues to this with how light bounces off surfaces in different ways.”

VR at a crossroads
It is not, however, a great time to be in consumer VR. The BBC has just disbanded the team it created to make VR content, Disney and Sky-backed VR tech venture Jaunt was recently sold to Verizon and Google has halted sales of its Daydream View smartphone headsets.

Debevec believes VR is still “on the incline” but admits it was hyped out of proportion.

Google light field through Oculus VR

“So over-hyped that [Google] pulled me and my group out of our environment at the University. For a moment it looked like VR had potential as a new and interesting media and that it would become a platform that, if you were not on it, you would miss the boat. That kind of mindset gets a big tech company to throw people and resources at something.”

He says the main concentration in the tech industry now is on augmented reality (AR) but flags that it’s another instance “where the VPs and execs see it both as an opportunity with great potential and a risk that they’d miss the boat if they don’t get involved.”

There is a quality problem with VR which Debevec is trying to solve.

“Users are presented with a stereo view in any direction. If your head moves, the whole image comes with you. In effect, your whole perceptual system is attached to the world and that causes nausea.”

He says: “If you want to create a great virtual experience that takes advantage of 6 degrees of freedom (6 DoF), we need to record not just two panoramas but an entire volume of space that is able to be explored interactively as you move your head around.”

Light field is the answer. It’s a means of capturing the intensity and direction of light emanating from a scene and using that information to recreate not only the volume of the space but subtle light changes, shadows and reflections.

A very brief history of light field
The idea goes as far back as motion picture’s founding father Eadweard Muybridge who, in 1872, recorded subjects moving sequentially in still images.

A hundred years later, another array of cameras was used to take images of a subject simultaneously, combined into a time-slice and used to create synthetic camera movements.

Deployed first on film in Wing Commander and Lost in Space then, iconically, on The Matrix, virtual camera techniques have become increasingly sophisticated.

“Light field rendering allows us to synthesise new views of the scene anywhere within the spherical volume by sampling and interpolating the rays of light recorded by the cameras on the rig,” he says.

Under Debevec’s direction, Google has built a number of light field camera arrays. These include a modified Odyssey Jump called Oddity which consists of 16 GoPros revolving in an arc and triggered to take photographs synchronously.

“Absolutely the key concept of light field rendering is that once you record all the rays of light coming into that sphere (scene) you can use the pixel values and the RGB values of each image to create images from different perspectives and views where you never actually had a camera,” he explains.

“By sampling or interpolating information from the hundreds of recorded images, you can synthetically create camera moves moving up and down forward and back – every view you might want to view in a VR headset with 6 DoF.”
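As a rough illustration of what "sampling or interpolating" recorded images can mean, the sketch below blends the two captured images whose cameras sit either side of a requested viewpoint. It is a deliberately simplified stand-in, not Google's method: the one-dimensional rig, the function name `interpolate_view` and the flat list-of-pixels image format are all assumptions made for the example, and a real light field renderer would reproject individual rays using scene depth rather than cross-fading whole images.

```python
from bisect import bisect_right


def interpolate_view(camera_positions, camera_images, target_position):
    """Blend the two recorded images whose cameras bracket target_position.

    camera_positions: sorted list of floats (hypothetical 1D rig coordinates).
    camera_images: one image per camera, each a flat list of (r, g, b) tuples.
    """
    if not camera_positions[0] <= target_position <= camera_positions[-1]:
        raise ValueError("target lies outside the sampled range")
    # Index of the first camera strictly to the right of the target,
    # clamped so a target on the last camera still has a valid neighbour.
    hi = min(bisect_right(camera_positions, target_position),
             len(camera_positions) - 1)
    lo = max(hi - 1, 0)
    span = camera_positions[hi] - camera_positions[lo]
    # Weight is 0 at the left camera and 1 at the right camera.
    t = 0.0 if span == 0 else (target_position - camera_positions[lo]) / span
    return [
        tuple((1 - t) * a + t * b for a, b in zip(px_lo, px_hi))
        for px_lo, px_hi in zip(camera_images[lo], camera_images[hi])
    ]


# Three one-pixel "images" from cameras at positions 0, 1 and 2.
images = [[(0, 0, 0)], [(100, 100, 100)], [(200, 0, 0)]]
print(interpolate_view([0.0, 1.0, 2.0], images, 0.5))  # [(50.0, 50.0, 50.0)]
```

A plain cross-fade like this would ghost on anything close to the camera, which is one reason the quality of the interpolation step matters so much in practice.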

Google light field technology on show

Test shoots included one aboard NASA’s Space Shuttle Discovery at the Smithsonian Institution’s National Air and Space Museum.

Google focused on static scenes first, partly so it could work with relatively inexpensive camera rigs and also to perfect techniques required to create the best image quality.

When light field camera maker Lytro folded last year with Google in pole position to acquire its assets, it was Debevec who decided not to pursue development.

Rather than camera arrays, Lytro had built single body video cameras with dozens of micro-lenses including a cinema camera that was the size of a small car.

“That should be in a museum,” Debevec says. “The main drawback of Lytro’s system was that its spatial resolution was decimated by the lens array. If they had an 11-megapixel sensor the output resolution would only shoot 1k x 1k images.”
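The trade-off in that example can be put as simple arithmetic. The snippet below takes the figures from the quote and assumes "1k" means 1,000 pixels per side; the function name is hypothetical. The ratio is the number of sensor pixels spent on each output pixel, which in a micro-lens design go towards recording the direction of light rather than spatial detail.

```python
def sensor_pixels_per_output_pixel(sensor_pixels, out_width, out_height):
    """Ratio of captured sensor pixels to pixels in the final image."""
    return sensor_pixels / (out_width * out_height)


# Figures from the quote: an 11-megapixel sensor yielding a 1k x 1k image
# (assuming 1k = 1,000 pixels).
print(sensor_pixels_per_output_pixel(11_000_000, 1000, 1000))  # 11.0
```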

Light field video experiments
When Google turned to video, it retained the camera array arrangement but needed even higher quality machine learning algorithms to generate interpolations.

This is what Google’s computer vision experts have advanced with a machine learning process they call DeepView.

“DeepView gives quite high quality viewing interpolations using an ML technique,” he explains. “It’s not depth maps plus geometry but volume with RGB alpha output.”

In a first test, it modified the Oddity rig into one called Iliad using 16 GoPros to generate 100 depth points of RGB alpha. With this data, they were able to generate synthetic camera moves around such ephemeral elements as smoke and fire, as well as recreating realistic reflections and specular light formations.

“It’s not completely artefact free but it blew our minds,” Debevec says.

Its latest light field camera array is its largest yet. The Sentinel comprises 47 x 4K action sports cameras capable of capturing a 120 x 90-degree field of view.

One application is as an aid for postproduction effects including camera stabilisation, foreground object removal, synthetic depth of field, and deep compositing.

“Traditional compositing is based around layering RGBA images to visually integrate elements into the same scene, and often requires manual artist intervention to achieve realism especially with volumetric effects such as smoke or splashing water,” he says. “If we use DeepView and a light field camera array to generate multiplane images it offers new creative capabilities that would otherwise be very challenging and time-intensive to achieve.”
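For readers unfamiliar with layered RGBA compositing, the sketch below shows the standard "over" operation applied to a stack of semi-transparent planes for a single pixel, ordered from the farthest plane to the nearest. It is a minimal illustration of how a stack of RGBA layers can be flattened into one colour, not DeepView's implementation; the function name, the 0 to 1 value range and the use of straight (non-premultiplied) alpha are assumptions for the example.

```python
def composite_over(planes):
    """Flatten RGBA samples for one pixel, ordered back to front.

    planes: list of (r, g, b, a) tuples, farthest plane first, with colour
    and straight (non-premultiplied) alpha both in the range 0..1.
    """
    r = g = b = 0.0
    for pr, pg, pb, pa in planes:
        # Each nearer plane covers what is behind it in proportion to alpha.
        r = pr * pa + r * (1.0 - pa)
        g = pg * pa + g * (1.0 - pa)
        b = pb * pa + b * (1.0 - pa)
    return (r, g, b)


# An opaque red wall seen through a half-transparent layer of white smoke.
print(composite_over([(1.0, 0.0, 0.0, 1.0), (1.0, 1.0, 1.0, 0.5)]))
# (1.0, 0.5, 0.5)
```

Because soft elements such as smoke are stored as partial alpha on their own planes, they blend with whatever lies behind them without hand-drawn mattes.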

At its offices in Playa Vista, Google has also built a larger volumetric light stage capable of scanning the whole human body, not just the face. It’s one of a number of such capture stages springing up around the world. Hammerhead VR operates one based on Microsoft technology in London. Paramount and Intel have built one in LA covering 10,000 sq ft, the world’s biggest, ringed with 100 8K cameras.

At Google, experiments continue with DeepView, including recording light fields of Google staff performing various simple movements, then using machine learning to render them into entirely new scenes, complete with detailed illumination that matches the new environment.

There are problems, though, in building the technology out to capture larger volumes.

“We wish we could take you all around a room in a light field but we’d have to move the camera to different parts of the room then find a way of linking images captured from each position. Just managing the amount of data is still daunting at this point. We’d have to ask machine learning to step in and help us.”

He is sceptical of holographic displays, although he believes the technology will advance.

“Any solution to this needs to have an extremely high pixel density,” Debevec says. “We may have hit the limit of human vision for conventional displays, so is there enough market to create 1000 pixel per inch (PPI) displays let alone 5000 and 10,000 PPI displays that will allow you to use the pixel surplus to output arrays of light in omnidirections?”

Editorially too, Debevec thinks there’s a lot of learning to do for VR to become as compelling an experience as cinema.

“We need to figure out how to tell stories in immersive filmmaking. Once you fill a user’s whole field of view so that they can see everything all the time and you take away the directed ability to zoom in on close-ups, you are giving them a lot of extraneous information.

“It would be like reading a novel where between any line there would be a whole paragraph describing what is going on in the rest of the scene. You would lose the story, become confused. The danger for immersive media is that it doesn’t focus the user’s attention.”