Object-based audio promises immersion, accessibility, personalisation and interactivity, but fears of a format war persist.
The audio channel has been the foundation of sound recording and playback since the days of the phonograph. From those mono beginnings the number of channels has risen to produce stereo, multi-track recordings and surround sound formats such as 5.1.
Engineers and developers made the most of this framework, but there was always a sense that more could be done, not only to recreate realistically how humans hear but also to offer alternative languages and commentaries.
Which is why object-based audio (OBA) has been focusing the minds of recording engineers, software developers and equipment manufacturers over the last five to six years.
OBA pushes the concept of non-linear production further by taking the various audio components of a programme - music, dialogue, sound effects, commentary - and converting them into ‘media objects’. Accompanying metadata is created to dictate where the objects should appear on playback and to give the ability to change the balance between different elements or access specific functions. This allows receiving devices such as set-top boxes (STBs) and smart TVs to re-assemble the objects in the home.
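The split between audio essence and metadata can be sketched as a toy model. The names and structure below are purely illustrative - they do not correspond to any real codec's API - but they show the principle: objects travel separately and the receiver does the final mix, honouring producer defaults unless the listener overrides an element flagged as adjustable.

```python
from dataclasses import dataclass

# Illustrative sketch of the object-audio idea: each programme element
# (dialogue, music, effects, commentary) travels as a separate object,
# and the receiver mixes them according to metadata plus user choices.

@dataclass
class AudioObject:
    name: str              # e.g. "dialogue", "effects"
    samples: list          # placeholder for the audio essence
    default_gain: float    # producer's suggested mix level (linear)
    user_adjustable: bool = False

def render(objects, user_gains=None):
    """Mix objects in the receiver. user_gains overrides the default
    gain only for objects flagged as user-adjustable."""
    user_gains = user_gains or {}
    length = max(len(o.samples) for o in objects)
    mix = [0.0] * length
    for obj in objects:
        gain = obj.default_gain
        if obj.user_adjustable and obj.name in user_gains:
            gain = user_gains[obj.name]
        for i, sample in enumerate(obj.samples):
            mix[i] += gain * sample
    return mix

programme = [
    AudioObject("dialogue", [0.5, 0.5], 1.0, user_adjustable=True),
    AudioObject("effects", [0.2, -0.2], 1.0),
]

default_mix = render(programme)                    # producer's default
boosted = render(programme, {"dialogue": 2.0})     # listener turns dialogue up
```

The same transmitted objects yield different mixes in different homes, which is the basis of the dialogue-enhancement and personalisation features described below.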
The central concept offers four specific features: immersive audio, accessibility, personalisation and interactivity. Immersive or spatial sound is the most widely accepted implementation of OBA. Earlier attempts to emulate human hearing, with a sensation of height as well as width and depth, included Ambisonics, developed in the early 1970s by Michael Gerzon.
Ambisonics is still among the options for immersive audio production but the best-known spatial format today is Dolby Atmos.
Introduced in 2012 initially for cinema, Atmos has a core of channels (7.1.2) with up to 118 objects. It is now available on Blu-ray Disc (BD), features in live TV sports coverage from BT Sport and Sky and has been built into AVRs (audio-video receivers) and soundbars.
Accessibility covers functions such as audio description, itself an early manifestation of object-based media: additional content offered as an option alongside the main video and audio on the receiver.
OBA allows accessibility to be taken further through the personalisation of settings. In this way the balance between dialogue and music/effects in a TV drama can be altered according to the viewer’s taste or needs. Personalisation also extends to where and on what kind of devices a person is watching or listening.
As well as Atmos for immersive audio, Dolby also has AC-4, which covers features such as dialogue enhancement and alternative language and commentary tracks. This is included in the DVB toolbox as part of Next Generation Audio (NGA) for Ultra HD broadcasting.
The second option for DVB NGA is MPEG-H, which offers a mixture of channels, objects and scene-based technologies. A third option is being added this year in the form of DTS/Xperi’s object-based audio codec, DTS:X.
Despite all three systems often being included in major technology specifications, there is still the possibility of a format war.
Efforts have been made since the launch of Atmos and DTS:X to bring developers and manufacturers together to promote the implementation of OBA. These include the Orpheus project, which is funded by the European Commission and counts Fraunhofer IIS, developer of MPEG-H, BBC R&D, the IRT, Bayerischer Rundfunk and Trinnov Audio among its participants. Orpheus was set up to run for 30 months and is due to end this May with a meeting at the IRT.
The EBU is also actively promoting OBA and recently held a Production Technology Seminar on the subject in Geneva. Among those involved was Matthieu Parmentier, R&D Projects Manager at France Télévisions and chair of the EBU FAR (Future Audio and Renderers) group.
Parmentier says some of the groundwork for adoption of OBA is already in place, with object-based codecs and encoders, as well as Dolby Atmos/MPEG-H equipped soundbars, now on the market. “But we have to deal with local adoption of rendering and decisions need to be taken by the premier broadcasters,” he comments.
Parmentier adds that France Télévisions is now looking at OBA, with dialogue enhancement the main focus. He agrees that standards are the biggest problem, particularly in terms of preparing masters.
To find a solution France Télévisions has been working with BBC R&D to develop an interoperability format. The ADM (Audio Definition Model) has the support of Fraunhofer, Dolby and DTS, and has been standardised by the ITU as Recommendation ITU-R BS.2076-1.
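The ADM describes programmes, content and objects as linked XML elements. For illustration only, a heavily trimmed ADM-style fragment (IDs and values invented here, and many mandatory elements omitted) linking a programme to a single dialogue object looks roughly like this:

```xml
<audioFormatExtended>
  <!-- Illustrative, abbreviated sketch; not a complete or valid ADM file -->
  <audioProgramme audioProgrammeID="APR_1001" audioProgrammeName="Drama">
    <audioContentIDRef>ACO_1001</audioContentIDRef>
  </audioProgramme>
  <audioContent audioContentID="ACO_1001" audioContentName="Dialogue">
    <audioObjectIDRef>AO_1001</audioObjectIDRef>
  </audioContent>
  <audioObject audioObjectID="AO_1001" audioObjectName="Dialogue"/>
</audioFormatExtended>
```

Because the model is plain XML carried alongside the audio, tools from different vendors can read and write the same descriptions, which is what makes it attractive as an interoperability format.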
Among the manufacturers already incorporating ADM into their products is Merging Technologies. The company’s head of software engineering, Dominique Brulhart, explains that the Pyramix digital audio workstation is able to handle a “complete multi-language, multi-version and object-based workflow”, with metadata exported to both ADM and MPEG-H.
Avid has gone the Dolby Atmos route for its Pro Tools DAW, with tools for the immersive system included in version 12.8.
“We partnered with Dolby to integrate Atmos workflows into Pro Tools,” comments Connor Sexton, Senior Product Designer at Avid Technology.
“This gives us 7.1.2 for the main stem and also panning and interactivity using the Dolby renderer, which assigns the objects.”
Sexton adds that Pro Tools’ OBA capability will be extended to ADM in the future, while the DAW already offers higher order Ambisonics and virtual reality audio.
Authoring systems for OBA will be key in both the production of audio for emerging TV technologies, such as UHD, and how the information is dealt with in the home.
Two different MPEG-H tool sets have been introduced by Linear Acoustic, part of the Telos Alliance, and Jünger Audio.
Linear Acoustic’s AMS (Authoring and Monitoring System) was developed with Fraunhofer and is able to handle immersive formats, as well as monitoring interactivity and providing quality control functions. The inputs/outputs (I/Os) are AES67, which John Schur, President of Telos’ TV Solutions Group, says can connect to SDI and AES3 using xNode IP interfaces.
Schur acknowledges that OBA is bringing new challenges for audio production, not least with metadata. “It’s a whole other layer of metadata,” he says. “It goes way beyond what is used for something like loudness.” While saying the potential format war is also challenging, Schur is less concerned about this: “It’s easier to support multiple formats with software and there are common user interfaces.”
Peter Pörs, Managing Director of Jünger Audio, which has the MMA (Multichannel Monitoring and Authoring) unit, sees the codec as critical, with pressure on manufacturers to bring in the right technology. “Delivering OBA to the home requires a codec in place that supports the delivery of individual objects and decoder-based mixing,” he says. “For the production side the big challenge will be to accept not having the final mix in hand any more.”
Fraunhofer has produced its own MPEG-H authoring tool, which produces metadata and scene description.
Technology consultant Stefan Meltzer comments that several third party plug-ins are now available to generate metadata for the MPEG-H TV audio system, including Spatial Audio Designer and the DSpatial 3D Audio tool.
“The metadata for the MPEG-H audio system now describe the different elements of the mix, how they should be mixed as the default mix and which elements can be altered by the user and to what extent,” Meltzer says.
“It also includes loudness and DRC [dynamic range compression] data, which allow the adaptation of the audio to the capabilities of the reproduction device, meaning the same bit stream can be used on a home cinema and on a mobile device.”
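The device-adaptation idea Meltzer describes can be sketched in a few lines. The profile values and function below are invented for illustration - they are not taken from the MPEG-H specification - but they show how a single stream's loudness metadata lets each class of decoder pick its own playback target and compression amount.

```python
# Illustrative sketch (not the MPEG-H spec): one bit stream carries the
# programme loudness, and each device class applies its own target level
# and dynamic range compression (DRC) strength.

DEVICE_PROFILES = {
    "home_cinema": {"target_lufs": -24.0, "drc": 0.0},  # full dynamics
    "mobile":      {"target_lufs": -16.0, "drc": 0.6},  # louder, compressed
}

def adapt(programme_lufs, device):
    """Return the gain (dB) and DRC amount a decoder would apply
    for the given device class."""
    profile = DEVICE_PROFILES[device]
    gain_db = profile["target_lufs"] - programme_lufs
    return gain_db, profile["drc"]

# The same -23 LUFS programme, reproduced two ways:
cinema = adapt(-23.0, "home_cinema")  # slight attenuation, no compression
phone = adapt(-23.0, "mobile")        # boosted and heavily compressed
```

The point is that the adaptation happens in the decoder, driven by metadata, rather than requiring separate mixes to be transmitted for each device type.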
OBA is starting to make its way into the broadcast and consumer markets. Korea is the first country to introduce a UHD TV system with MPEG-H, featuring an audio codec supporting channels, objects and Ambisonics. Public consultations on future TV services are underway in France and Spain, while the Nordic region is said to be discussing NGA.
This all looks encouraging for the take-up of a new and different technology, but Matthieu Parmentier adds a note of reality, saying full implementations of MPEG-H Audio and AC-4 in conjunction with UHD are still two to three years down the road.
“ADM will be standardised and we are developing a perceptual testing method - the Multi-Stimulus Ideal Preferred Methodology (MS-IPM) - but we still need a rendering reference,” he concludes. “Another key thing will be console manufacturers creating free ADM metadata for live production.”