Using open-source software, Collabora has developed a compression pipeline for face video broadcasting that achieves the same visual quality as H.264 while using a fraction of the bandwidth. On the sender side, the pipeline uses a speech-to-text model to transcribe the audio feed. On the receiver side, a generative text-to-speech model recovers audio from the text, and a lipsyncing model then reconstructs the face to match the generated audio. This enables communication at lower bitrates in remote areas with limited bandwidth, and frees bandwidth for error correction during broadcasting. We’ll present the pipeline and its use case for the broadcasting world.
Marcus Edel, Software Engineer - Collabora
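
The data flow described in the abstract can be sketched roughly as follows. This is only an illustration of the idea, not Collabora's implementation: the function names `send` and `receive`, the `reference_face` input, and the three `fake_*` stand-in models are all assumptions made for the example, and real speech-to-text, text-to-speech, and lipsyncing models would take their place.

```python
from typing import Callable, List, Tuple

# Hypothetical model signatures; real models would replace the stand-ins below.
SpeechToText = Callable[[bytes], str]
TextToSpeech = Callable[[str], bytes]
Lipsync = Callable[[bytes, bytes], List[bytes]]


def send(audio: bytes, speech_to_text: SpeechToText) -> bytes:
    """Sender side: transcribe the audio and transmit only the text."""
    transcript = speech_to_text(audio)
    return transcript.encode("utf-8")


def receive(
    payload: bytes,
    reference_face: bytes,
    text_to_speech: TextToSpeech,
    lipsync: Lipsync,
) -> Tuple[bytes, List[bytes]]:
    """Receiver side: regenerate audio from text, then animate the face."""
    transcript = payload.decode("utf-8")
    audio = text_to_speech(transcript)
    # Assumption: the receiver already holds a reference image of the speaker.
    frames = lipsync(reference_face, audio)
    return audio, frames


# Stand-in models (placeholders only, they do no real inference).
def fake_stt(audio: bytes) -> str:
    return "hello world"


def fake_tts(text: str) -> bytes:
    return text.upper().encode("utf-8")


def fake_lipsync(face: bytes, audio: bytes) -> List[bytes]:
    # One placeholder "frame" per audio byte.
    return [face + bytes([b]) for b in audio]


if __name__ == "__main__":
    raw_audio = b"\x00" * 16000  # placeholder for captured audio
    payload = send(raw_audio, fake_stt)
    audio, frames = receive(payload, b"face", fake_tts, fake_lipsync)
    print(len(raw_audio), len(payload), len(frames))
```

The point of the sketch is that only the transcript crosses the link; audio and video are both regenerated on the receiving end.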