Independent audio routing, or SDI audio breakaway, is a standard part of today’s TV production workflow, and it will need to be implemented in IP as the industry transitions away from SDI.

Fortunately, the existing AES67 standard for audio over IP meets this need, so the industry does not have to reinvent the wheel.

Not only is there already AES67 equipment deployed in the audio industry, but using this standard also enables significant new workflow opportunities.

This paper provides an overview of AES67 and explores how it can be used with SMPTE ST 2022-6 and VSF TR03 uncompressed video to form a complete solution.

AES67 uses the IEEE 1588 PTP timing standard. Combining AES67, IEEE 1588, SMPTE ST 2059 and SMPTE ST 2022-6 as described in the new VSF Technical Recommendation 04 (TR04) provides a way to maintain A/V alignment throughout the production workflow.


In today’s TV production environment, audio can take two forms: embedded audio and independent audio.

Production environments often mix the two – embedded audio on video sources, alongside separate audio for audio-only devices.

As the industry transitions to IP infrastructure, each of these models still has applications, and the industry will need to support both.

With embedded audio, the audio is carried with the video. In SDI, the audio is literally carried in the ANC space of the video.

From a routing, management, timing and audio/video alignment point of view, this is a simple model.

The IP extension of this model is SMPTE ST 2022-6, wherein the audio is embedded in the video and the video with audio is transported via IP.

With independent audio, or SDI audio breakaway, the audio is not routed with the video.

This model allows the audio to be routed and processed independently from the video, for example, routing the audio to and from an audio production console.

While this model increases flexibility, it also increases complexity for routing, management, timing and A/V alignment.

The AES67 standard offers one method of implementing audio breakaway in the IP domain.

AES67 transports audio channels over IP in separate audio streams, which can be separately routed within the IP routing fabric.


Historically, there have been many different standards to choose from when it comes to implementing audio over IP, with some of the best publicized being Ravenna, Dante, LiveWire, Q-LAN, WheatNet-IP and AVB.

While many of these standards are similar, each differs enough that equipment built for one does not necessarily interoperate with equipment built for another.

To improve interoperability within the industry, the Audio Engineering Society (AES) developed AES67, “AES standard for audio applications of networks – High performance streaming audio-over-IP interoperability” (1).

The standard was first published in 2013 and updated in 2015.

At a high level, AES67 places PCM audio samples directly in RTP packets carried over IP.

No headers or overhead are added beyond those of RTP, UDP and IP.

Key components of AES67 are:

  • Synchronisation
  • Transport
  • Encapsulation and Streaming
  • Session description

AES67 Synchronisation

Synchronisation between all the AES67 transmitters and receivers is accomplished using the IEEE 1588-2008 Precision Time Protocol (PTP) (2).

PTP distributes precise time over IP to all the devices in the network.

Using this time, transmitters and receivers generate locked media clocks, which are used for sampling the input audio and generating RTP time stamps.
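
As a rough illustration of how RTP time stamps can be derived from the distributed time, the following Python sketch counts the audio samples elapsed since the PTP epoch and wraps the count to the 32-bit RTP time stamp field. The function name, the nanosecond input and the `offset` parameter are assumptions made for this example and are not taken from the AES67 text; real devices may apply a stream-specific offset.

```python
RTP_TIMESTAMP_MODULUS = 2 ** 32  # the RTP time stamp field is 32 bits wide


def rtp_timestamp(ptp_ns, sample_rate_hz=48000, offset=0):
    """Map a PTP time to an RTP time stamp (illustrative sketch only).

    ptp_ns         -- time since the PTP epoch, in integer nanoseconds
    sample_rate_hz -- media clock rate, equal to the audio sampling rate
    offset         -- hypothetical per-stream offset, assumed to be 0 here
    """
    # The media clock advances by one tick per audio sample.
    samples_since_epoch = ptp_ns * sample_rate_hz // 1_000_000_000
    # RTP time stamps wrap around at 2**32.
    return (samples_since_epoch + offset) % RTP_TIMESTAMP_MODULUS


# One second after the epoch at 48 kHz corresponds to 48000 media clock ticks.
one_second = rtp_timestamp(1_000_000_000)
```

Because every device computes the same media clock from the same PTP time, a receiver can relate a received time stamp to its own clock without any sample rate conversion.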

The flexible PTP standard allows the use of profiles for different industries and applications.

Profiles restrict parameters such as the rate at which time messages are sent.

AES67 allows for the IEEE default profile and an AES67 profile.

Since all the AES67 devices in the network are locked together, audio sample rate conversion (SRC) is not required between the AES67 devices.

SRC may be required for unlocked input signals.

AES67 Transport

Transport describes how the encoded media data is transported across the network.

AES67 is transported using Real-time Transport Protocol (RTP) over User Datagram Protocol (UDP) over IPv4.
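
To make the layering concrete, the sketch below builds the 12-byte fixed RTP header and prepends it to an audio payload; the result would then be handed to an ordinary UDP socket over IPv4. The function name, the SSRC value and the dynamic payload type 96 are assumptions chosen for illustration, and the sketch ignores optional RTP features such as CSRC lists and header extensions.

```python
import struct


def build_rtp_packet(seq, timestamp, ssrc, payload, payload_type=96):
    """Return an RTP packet: 12-byte fixed header followed by the payload."""
    byte0 = 2 << 6               # version 2, no padding, no extension, CSRC count 0
    byte1 = payload_type & 0x7F  # marker bit 0, 7-bit payload type
    header = struct.pack(
        "!BBHII",                # network byte order: 1 + 1 + 2 + 4 + 4 bytes
        byte0,
        byte1,
        seq & 0xFFFF,            # 16-bit sequence number
        timestamp & 0xFFFFFFFF,  # 32-bit media clock time stamp
        ssrc & 0xFFFFFFFF,       # 32-bit synchronisation source identifier
    )
    return header + payload


# Hypothetical values: sequence number 1, time stamp 48, six bytes of silence.
packet = build_rtp_packet(1, 48, 0x12345678, b"\x00" * 6)
```

In a sender, each such packet would be passed to `socket.sendto()` on a UDP socket, with the sequence number incremented by one per packet and the time stamp advanced by the number of samples per channel in the packet.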

AES67 allows for both unicast and multicast; however, multicast is the normal mode for the professional broadcast market.

For multicast audio essence and PTP messages, AES67 devices are required to support IGMPv2 and may support IGMPv3.
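
From the point of view of a receiver application, joining a multicast stream is typically a socket option; the host's IP stack then issues the IGMP membership report on the application's behalf. The following Python sketch shows this under stated assumptions: the group address 239.1.2.3 and port 5004 are hypothetical, and which IGMP version is used is decided by the operating system and network, not by this code.

```python
import socket


def build_mreq(group, interface="0.0.0.0"):
    """Pack the membership request: group address followed by local interface address."""
    return socket.inet_aton(group) + socket.inet_aton(interface)


def join_multicast(group, port, interface="0.0.0.0"):
    """Open a UDP socket and join an IPv4 multicast group (illustrative sketch)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # Requesting membership causes the IP stack to send the IGMP join.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    build_mreq(group, interface))
    return sock


# Hypothetical group address; 0.0.0.0 lets the OS pick the interface.
mreq = build_mreq("239.1.2.3")
```

A receiver would call, for example, `join_multicast("239.1.2.3", 5004)` and then read packets with `recvfrom()`.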

AES67 Encapsulation and Streaming

The payload of an AES67 packet is created by concatenating the PCM audio samples.

AES67 allows for:

  • Sampling: 44.1 kHz, 48 kHz and 96 kHz
  • Bit width: 16-bit (L16) and 24-bit (L24)
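
The concatenation of samples can be sketched as follows for the 24-bit case. The sketch assumes the usual RTP linear PCM conventions, in which each sample is a signed integer in network (big-endian) byte order and the channels of one sample instant are interleaved before moving on to the next instant; the function name and the input layout are hypothetical.

```python
def pack_l24(frames):
    """Concatenate PCM samples into an L24 payload (illustrative sketch).

    frames -- sequence of tuples, one tuple per sample instant, holding one
              signed 24-bit integer per channel
    """
    out = bytearray()
    for frame in frames:
        for sample in frame:
            # 3 bytes per sample, big-endian, two's complement;
            # raises OverflowError if the value does not fit in 24 bits.
            out += sample.to_bytes(3, "big", signed=True)
    return bytes(out)


# One stereo sample instant: left = 1, right = -1.
payload = pack_l24([(1, -1)])
```

For a stereo stream, 48 sample instants therefore produce 48 x 2 x 3 = 288 bytes of payload.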

AES67 packets are sent at a constant time interval.

This interval is called the “packet time”.

AES67 defines a range of packet times; the recommended values are 125 µs, 250 µs, 333 µs, 1 ms and 4 ms.

Other values are permitted.
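
The packet time, sampling rate, bit width and channel count together determine the size of each packet's audio payload. The short Python sketch below works through that arithmetic; the function names are hypothetical, and rounding to the nearest whole sample is an assumption made so that a nominal 333 µs packet time (one third of a millisecond) maps to a whole number of samples.

```python
def samples_per_packet(sample_rate_hz, packet_time_us):
    """Samples per channel carried in one packet."""
    return round(sample_rate_hz * packet_time_us / 1_000_000)


def payload_bytes(sample_rate_hz, packet_time_us, channels, bits_per_sample):
    """Audio payload size in bytes, excluding RTP, UDP and IP headers."""
    bytes_per_sample = bits_per_sample // 8
    return (samples_per_packet(sample_rate_hz, packet_time_us)
            * channels * bytes_per_sample)


# 48 kHz, 1 ms packet time, stereo, 24-bit: 48 samples x 2 channels x 3 bytes.
stereo_1ms = payload_bytes(48000, 1000, 2, 24)
```

Shorter packet times reduce latency at the cost of more packets per second, while longer packet times carry more samples per packet and so limit how many channels fit within a single network frame.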