AES67 is a new Audio Engineering Society standard for the interoperability of high-quality audio over IP networks, published just under two years ago (September 2013).

This standard was quickly embraced by all of the main broadcast audio equipment vendors, and compatibility modes were announced for all of the major competing audio networking technologies: Livewire, Q-LAN, Ravenna and Dante.

Outside of broadcast, there has also been a high level of audio industry acceptance.

AES67 specifies the method for carrying uncompressed 24-bit linear audio over layer 3 IP networks.

There are choices of sample rate, packet size, number of channels and bit depth, but a strict interoperability requirement means that all vendors must implement at least one common set of parameter choices.

This requirement is what produces the interoperability between all vendors labelling their equipment AES67.
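
To give a sense of what one such common parameter set implies on the wire, the following Python sketch works out packet size and bit rate for a single stream. The figures used (48 kHz sampling, 24-bit samples, 1 ms of audio per packet, up to eight channels) are the commonly cited AES67 baseline and are assumptions here, not values taken from this paper. The StreamProfile class and its method names are hypothetical, and the header sizes assume RTP over UDP over IPv4 with no header options. The standard itself remains the reference for the exact requirements.

```python
from dataclasses import dataclass

RTP_HEADER_BYTES = 12   # fixed RTP header, no CSRC entries or extensions
UDP_HEADER_BYTES = 8
IPV4_HEADER_BYTES = 20  # IPv4 header without options


@dataclass(frozen=True)
class StreamProfile:
    """Hypothetical container for one set of stream parameter choices."""
    sample_rate_hz: int
    bit_depth: int
    channels: int
    packet_time_us: int  # duration of audio carried per packet, in microseconds

    def samples_per_packet(self) -> int:
        # Samples per channel carried in one packet.
        return self.sample_rate_hz * self.packet_time_us // 1_000_000

    def payload_bytes(self) -> int:
        # Uncompressed linear PCM: samples x channels x bytes per sample.
        return self.samples_per_packet() * self.channels * self.bit_depth // 8

    def ip_packet_bytes(self) -> int:
        # Audio payload plus RTP, UDP and IPv4 headers.
        return (self.payload_bytes() + RTP_HEADER_BYTES
                + UDP_HEADER_BYTES + IPV4_HEADER_BYTES)

    def packets_per_second(self) -> int:
        return 1_000_000 // self.packet_time_us

    def ip_bitrate_bps(self) -> int:
        # Bit rate at the IP layer (excludes Ethernet framing).
        return self.ip_packet_bytes() * 8 * self.packets_per_second()


# Assumed baseline: 48 kHz, 24-bit, 8 channels, 1 ms packets.
baseline = StreamProfile(sample_rate_hz=48_000, bit_depth=24,
                         channels=8, packet_time_us=1_000)

print("samples per packet:", baseline.samples_per_packet())
print("payload bytes:", baseline.payload_bytes())
print("IP packet bytes:", baseline.ip_packet_bytes())
print("IP bit rate (bit/s):", baseline.ip_bitrate_bps())
```

Under these assumptions an eight-channel stream carries 1152 bytes of audio per packet, or 1192 bytes including headers, which fits comfortably within a standard 1500-byte Ethernet payload.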

The technical details of AES67 are readily available. This paper instead examines the design features of AES67 that enable it to be the platform for the convergence of telecom, studio and intercom audio.


At the present time, audio technology is leveraging audio over IP technology at the basic network transport level, but is not taking advantage of all the benefits that are possible.

Today, telephony, studio audio and intercom use network technology much as computers used it in the 1990s.

We have managed to get the number of cables down to one, but audio applications are using many different protocols on that same one wire.

The workflows, user interfaces, and mental models of how we use voice, sound, music, effects, communication, and languages for telephony, for studio and for intercom, are separate.

We think of these as somehow different, separate, and requiring unique equipment and different user interfaces and operating sequences. But these all have that one thing in common: they are audio. They are sound. There is a fundamental commonality.


The Audio Interconnection Situation Today

While radio has already adopted standards-based Audio-over-IP, making it a de facto part of everyday life for literally thousands of users throughout the world, television has been much less willing to step up and embrace what is already an established technology, favoring the development of baseband signal transport and even proprietary Ethernet protocols.

AES67 is a natural evolutionary step for TV to take in order to leverage the many advantages that network-based infrastructure provides.

To gain wide acceptance for AES67, the industry has to take a look at the inefficiencies of the systems in use today and begin to understand how AoIP can replace what has gone before, and how it will enhance operational practices.

The audio system at the heart of any television broadcast facility has often been considered a necessary yet less well-regarded relative of video.

Conversely, the complexity of the audio and communication systems is generally acknowledged to be greater than that of the arguably simpler vision infrastructure.

In part, this is down to the sheer number of connections needed to make audio work compared to video, but in reality, the true complexity of audio and communication has evolved in sympathy with gradually changing production workflows and the increased expectations of the program makers, engineers and technicians who create television.

For many years audio and video signals were kept apart, with separate systems used to acquire sound and vision to mix, edit and ultimately transmit.

The birth of recordable video tape provided a means to store, transport and broadcast pictures and sound via a single unifying format, but linear emission of signals still relied on individual analog video and audio routing and distribution infrastructures.

It took until 1989 for the first standardized version of Serial Digital Interface (SDI), in the form of SMPTE 259M, to introduce the concept of embedded audio which could be transported with the video signal along a single piece of cable. Twenty-six years later, the majority of broadcast television ecosystems are still built around an embedded audio infrastructure.

For the many instances of audio within the system where no video signal is present, such as microphone circuits, monitoring, tie-lines, audio playback and recording devices, cues and communications, many broadcasters still rely on baseband connectivity using analog, AES3 or MADI, with the associated overhead of copper cable, fibers, connectors and patching that these forms of signal entail.