Technical paper: This paper explores novel opportunities that open up beyond cost and time savings in workflow automation.

Abstract

Over the last decade, we have seen a steady demand for content on media platforms. However, producing quality content with attention to timing and relevance at scale is still a challenge. Simultaneously, developments in Artificial Intelligence (AI) research have enabled content generation such as text and voice with human-like quality for the first time.

In this work, we have automated the end-to-end production of a roundup newscast (a summary of current news). To achieve our goal, we have used AI and a life-like virtual character that delivers a news roundup. Our cloud-based system, named Aida, can either stream in real time for traditional media channels and the web or generate videos for Video On Demand (VOD).

In the first stage of our pipeline, we used our in-house data-to-text Natural Language Generation (NLG) technology to describe the weather. We also built a text summarization engine to automatically create short descriptions of full-length news articles. After combining both texts into a script, we used deep learning-based text-to-speech to create a natural-sounding voice with an emotional tone that matches the news content. The character’s lips were synced to the generated audio automatically and in real time by analysing that audio. The resulting render can be customized to target different audiences by selecting news categories and virtual scenes, and even by displaying relevant advertising.
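The text stages described above (weather description, article summarization, script assembly) can be illustrated with a minimal sketch. It is not Aida's implementation: the in-house NLG and summarization engines are not described here, so a fixed sentence template and a naive word-frequency extractive summarizer stand in for them. The `Weather` record, its fields, and all function names are hypothetical.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class Weather:
    # Hypothetical input record; the actual data schema is not specified.
    city: str
    condition: str
    high_c: int
    low_c: int


def describe_weather(w: Weather) -> str:
    """Template-based data-to-text: turn a structured record into a sentence."""
    return (f"In {w.city}, expect {w.condition} conditions with a high of "
            f"{w.high_c} and a low of {w.low_c} degrees Celsius.")


def summarize(article: str, max_sentences: int = 1) -> str:
    """Naive extractive summary (a stand-in, not the paper's engine).

    Scores each sentence by the summed frequency of its words and keeps
    the top-scoring ones. This favors longer sentences.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", article) if s.strip()]
    words = re.findall(r"[a-z']+", article.lower())
    # Crude stop-word filter: ignore words of three letters or fewer.
    freq = Counter(w for w in words if len(w) > 3)

    def score(i: int) -> int:
        return sum(freq[w] for w in re.findall(r"[a-z']+", sentences[i].lower()))

    ranked = sorted(range(len(sentences)), key=lambda i: -score(i))
    keep = sorted(ranked[:max_sentences])  # restore original sentence order
    return " ".join(sentences[i] for i in keep)


def build_script(weather: Weather, articles: List[str]) -> str:
    """Combine the weather description and article summaries into one script."""
    parts = [describe_weather(weather)] + [summarize(a) for a in articles]
    return "\n".join(parts)
```

Calling `build_script(Weather("Lisbon", "sunny", 24, 15), [article_text])` would yield one line of weather text followed by one summary line per article. In the pipeline described above, that text would then be passed to the text-to-speech and lip-sync stages, which this sketch does not cover.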

Introduction

Workflow automation has traditionally been a remedy for repetitive and laborious tasks. From this perspective, automation is seen merely as a time-saving tool. In this work, we explore novel opportunities that open up beyond cost and time savings when we have an automated pipeline. Drawing upon different approaches for media generation, we discuss opportunities and challenges in creating, from scratch, a news roundup program that is fully machine-generated.

Consuming automatically generated content is already pervasive in our daily lives, and it comes in different forms. Virtual assistants like Siri, Alexa, Cortana, and Google Assistant come to mind, as people use them to ask for directions, order food, and perform other tasks. Many predict that this trend will continue as technology matures and fills new roles, making people more reliant on their AI-powered assistants [1]. Surprisingly, for some applications, a fully automated virtual human is preferred to having a human in the loop. That is the case in clinical interviews, where people are more open and relaxed when they believe the virtual human has no human operator overseeing it [2]. Also, in a recent trend, virtual avatars have been gaining visibility and acceptance on social networks, attracting millions of followers and becoming advertising models for famous fashion brands [3].

That has led us to ask whether we could use a virtual human as a newscast anchor. By employing a virtual anchor, we can present a continuous information cycle that does not depend on human availability during business hours. Replacing human anchors with virtual ones can, by itself, cut production costs in half [4]. But perhaps more interestingly, it opens up the possibility of unprecedented content personalization for deep user engagement.

The remainder of the paper is organized as follows. In the next section, we briefly discuss related work and contextualize our approach. Next, we detail how our system works and our design decisions. We then discuss some opportunities and preliminary results that we gathered using our system. Finally, we conclude with some remarks and plans for continued exploration and evaluation of our work.
