Next Generation Audio (NGA) builds on existing audio codecs and architectures to deliver unprecedented personalisation, thanks in part to an object-based approach to audio assets. The MPEG-H Audio codec from Fraunhofer IIS in particular has seen a sharp rise in adoption, reports John Maxwell Hobbs…

Over the past two decades, a significant amount of attention has been paid to the next generation of video technology – primarily focussed on resolution: from SD to HD, HD to 4K, 4K to 8K. With the exception of the brief side-track of 3D TV, these technological advances did not change the essential experience of watching a moving image on a screen.

Yannik Grewe, Fraunhofer

Audio is another story entirely. Beginning with the first stereo television sets in the 1980s, through the rollout of “home cinema” 5.1 speaker systems in the early 2000s, to the ubiquitous surround sound soundbars available today, the technological advances in broadcast audio have not only provided greater quality, but also a more immersive experience.

Audio companies are now engaged in the development of what is being referred to as Next Generation Audio, or NGA. More than simply adding additional channels, NGA is focussed on building upon existing technologies, and combining them with the significant processing power available in smart TVs, smart speakers, and smartphones, allowing the audience to create an audio experience that is personalised to them, wherever they are and whatever device they may be using.

NGA takes a new approach to audio streams. Traditionally, audio is mixed into discrete channels in the studio: two channels to provide the left and right of a stereo mix, and up to six discrete channels for a 5.1 surround mix. NGA instead keeps the audio assets as separate objects and lets the viewer’s equipment create the final mix based on its capabilities.
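The principle can be illustrated with a toy renderer. This is a simplified Python sketch, not the MPEG-H format or API: the object structure and function names are invented, and each “object” is just a mono signal plus level and position metadata that the playback device uses to build its own mix.

```python
import math

# Invented illustration of object-based rendering (not the MPEG-H API):
# each object carries its own samples plus gain and pan metadata, and the
# receiving device, not the studio, produces the final channel mix.

def render_stereo(objects):
    """Down-mix audio objects to 2 channels with constant-power panning."""
    length = max(len(o["samples"]) for o in objects)
    left = [0.0] * length
    right = [0.0] * length
    for o in objects:
        # pan runs from -1.0 (hard left) to +1.0 (hard right)
        angle = (o["pan"] + 1.0) * math.pi / 4.0
        gl = o["gain"] * math.cos(angle)
        gr = o["gain"] * math.sin(angle)
        for i, s in enumerate(o["samples"]):
            left[i] += gl * s
            right[i] += gr * s
    return left, right

dialogue = {"samples": [1.0] * 4, "gain": 1.0, "pan": 0.0}
crowd = {"samples": [1.0] * 4, "gain": 0.5, "pan": -0.8}
left, right = render_stereo([dialogue, crowd])
```

A device with a 5.1 layout would run a different renderer over the same objects; the point is that the mix decision moves from the studio to the playback end.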

Read more: Object-based audio production

MPEG-H Audio: Personalisation

Fraunhofer IIS has seen significant uptake of its NGA technology, MPEG-H Audio. The technology is included in the ATSC, DVB, TTA (South Korean TV) and SBTVD (Brazilian TV) standards and is used as the sole audio system in the world’s first terrestrial UHD TV service, in South Korea. Brazil has also selected it as the mandatory audio system for its next-generation TV 3.0 broadcast service, expected to launch in 2024.

Yannik Grewe is a senior engineer for audio production technologies in the audio and multimedia division of Fraunhofer IIS, and has been involved in the company’s MPEG-H Audio productions, which have included the Eurovision Song Contest, the FIFA World Cup, and Rock in Rio. He described some of those experiences.

“With the Eurovision Song Contest in 2018 in Lisbon, and again in Tel Aviv in 2019, we had a parallel production next to the main OB van,” he said. “Besides the 5.1 immersive sound, we also focused on personalisation: the option for a user to select a preferred language, or to enhance the dialogue for better speech intelligibility. We let the users at home select the preferred level of a commentator, for example.”

More recently, Fraunhofer, together with FIFA, delivered every game of the FIFA World Cup 2022 in Qatar with immersive sound and personalisation. “It was possible for the user to select a preferred commentator or completely switch off the commentator just to get a stadium atmosphere,” said Grewe.

“Something that is very well received in Brazil is the option to enhance the sound of ball kicks. And that’s all enabled by MPEG-H during production so the user at home can select their preferred audio experience.”
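The personalisation options Grewe describes amount to producer-authored presets that scale the gains of individual objects at playback. The sketch below is a hypothetical illustration in Python: the preset names, object names and data layout are invented for this example, not Fraunhofer’s actual metadata format.

```python
# Invented sketch of metadata-driven personalisation (not the real MPEG-H
# metadata format): the producer authors presets; the viewer's device
# multiplies each object's base gain by the chosen preset's factor.

PRESETS = {
    "default":        {"commentary": 1.0, "stadium": 1.0},
    "dialogue_boost": {"commentary": 1.6, "stadium": 0.7},
    "stadium_only":   {"commentary": 0.0, "stadium": 1.2},
}

def apply_preset(objects, preset_name):
    """Return per-object playback gains for the selected preset."""
    preset = PRESETS[preset_name]
    return {name: base_gain * preset.get(name, 1.0)
            for name, base_gain in objects.items()}

objects = {"commentary": 0.8, "stadium": 1.0}
gains = apply_preset(objects, "stadium_only")
print(gains)  # {'commentary': 0.0, 'stadium': 1.2}
```

Switching off the commentator entirely, as in the World Cup broadcasts, is just a preset whose commentary factor is zero; the producer can also clamp the allowed range so personalisation stays within editorial limits.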

MPEG-H Audio: Viewer response

Engineers must be careful that they are not building a solution in search of a problem, and with that in mind, Fraunhofer has worked closely with broadcasters to ensure that MPEG-H is delivering value to the audience.

Grewe pointed to a collaboration with the BBC on a broadcast of the Wimbledon tennis tournament that allowed the audience to change the prominence of the commentators in the mix. “We ran a survey that asked them how much they liked this feature,” he said. “It turned out that half of the people were happy with the mix as it was, and the other half wanted to change the level of the commentary. So, we could say that 50% were happy with the mix and 50% were not, but actually 100% of the people wanted to be able to change it somehow, each in a slightly different way.”

Rupert Brun, Fraunhofer IIS

Rupert Brun, a technical consultant and the former Head of Technology for BBC Radio, works with Fraunhofer IIS and was involved in the test. “What was interesting was that roughly half the people wanted to turn the dialogue up,” he said. “Perhaps they had a hearing difficulty, but quite possibly they were consuming the content on public transport where it’s noisy. Or they were cooking while they were consuming it and weren’t concentrating on the screen. The other half wanted to turn the dialogue down.

“Because they wanted the immersive experience of being at Wimbledon, or they were concentrating on the match, they knew their tennis, they didn’t need commentary. So, what we discovered was that in that instance, the BBC were doing an extremely good job of producing a sound balance that was right in the middle of the range that people wanted, which almost nobody wanted.

“There have been surveys since where it’s invariably been shown that it is this personalisation which people want much, much more than the immersion. Broadcasters receive thousands of complaints about dialogue audibility, and right from the outset we’ve been focused on fixing that one. And wherever it’s been tried, it’s been hugely popular.”

MPEG-H Audio: Adoption

The key to the success of any broadcast technology lies in its adoption by both broadcasters and manufacturers. “Obviously the South Korean manufacturers LG and Samsung were quick to implement MPEG-H, and the Sony 360 Reality Audio proposition is based on it,” said Brun. “Other manufacturers have now followed. But of course, there’s always a little bit of a chicken and egg in that manufacturers don’t want to do stuff if nobody’s broadcasting it, and people don’t want to broadcast stuff if there aren’t consumer devices for people to watch it.”

Brun believes that the widespread use of media apps on general purpose devices can help to speed the adoption of MPEG-H. “We’ve demonstrated that given a reasonably capable smartphone, it’s perfectly possible to implement decoding and playback personalisation entirely in software in an app,” he said.

“You don’t need the device to have hardware chips built in to do MPEG-H; you can do it in software. And I think it’s important to note that all the devices that support MPEG-H will play any MPEG-H content. It’s not as if there’s only support for a subset of it: it’s universal. One of the great things about MPEG-H is that you can produce one version, and the consumer will get the best possible experience regardless of whether they’re using a home cinema system or a mobile phone, because the consumer device will render it into the best possible version for whatever they’re using.”

MPEG-H Audio: Tools

Production teams will need powerful and flexible authoring tools that can be integrated into existing workflows to create MPEG-H enabled content. “We have production tools developed by ourselves but also with partners,” said Grewe. “There is the MPEG-H Authoring Suite, a set of several tools which we are offering from our website as a free download; it includes plugins for all the major audio workstations and platforms. It also comes with a standalone tool to convert existing mixes from other systems into an MPEG-H stream.”

Fraunhofer is ensuring live production is supported by MPEG-H

MPEG-H is currently supported in DaVinci Resolve, and the company announced at this year’s NAMM Show that it will be incorporated into Steinberg’s Nuendo and Avid’s Pro Tools.

Fraunhofer is ensuring that live production is also supported. “We are working with professional hardware manufacturers such as Telos Alliance, who support MPEG-H in their Linear Acoustic AMS Authoring & Monitoring System,” said Grewe. “There is a solution by Jünger Audio called MMA, and one by New Audio Technology called the Spatial Audio Designer.”

Brun also stressed the fact that the technology can easily fit into familiar workflows. “The important thing is that those encoders will do on-the-fly changes in response to the metadata that’s embedded in the stream. We’ve got three different manufacturers making hardware for live and we’ve got all those different software platforms for post produced content,” he said.

“When you’re working in your production centre, if you’re still working with SDI, we can send the whole thing over SDI. We can send, for example, 15 channels of audio, and then on the 16th channel we send something that sounds a bit like timecode when you listen to it, but it’s all of the metadata. It’s very robust. You can put it through mixing desks, and it comes out the other end absolutely fine. And obviously, if you are working with an IP-based workflow, that’s fine. We can do that too.”
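The idea behind that metadata channel, serialising control data into sample values that survive an ordinary audio path, can be sketched as a toy round trip. Real systems use standardised framing (for example, the SMPTE ST 337 family for non-PCM data in audio channels); this Python scheme is invented purely to show the concept.

```python
# Toy illustration of carrying metadata in an audio channel (invented
# scheme, not a real framing standard): each payload byte is scaled into
# a 16-bit sample value, then recovered at the far end.

def metadata_to_samples(payload: bytes):
    """Encode bytes as 16-bit sample values, one byte per sample."""
    return [b << 8 for b in payload]

def samples_to_metadata(samples):
    """Recover the byte payload from the sample stream."""
    return bytes((s >> 8) & 0xFF for s in samples)

meta = b'{"dialogue_gain_db": 6}'
samples = metadata_to_samples(meta)
assert samples_to_metadata(samples) == meta
```

A production framing adds sync words and error protection so the “audio” survives sample-rate conversion and desk routing, which is what makes the channel robust enough to pass through mixing desks intact, as Brun describes.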

Overall, it seems that MPEG-H Audio is rapidly approaching an adoption tipping point, where integration becomes the expected norm rather than the experimental cutting edge. Demand for more immersive and more personalised audio is clearly on the rise, and that trajectory is unlikely to flatten, whether viewed from a broadcast, OTT or pure technology perspective.

Read more Dolby Atmos: Producing with object-based audio