AI has dominated the tech conversation in many verticals in 2023. It can generate volumes of copy in an instant, or digital artworks that might take an artist hundreds of hours to make. While use cases within broadcast are still emerging, the field is developing rapidly, writes Andrew Williams.
During IBC2023, speakers from Amazon Web Services (AWS) and Japanese national broadcaster NHK explained how they were developing, and implementing in NHK’s case, AI in-house.
The resulting expert insights offer a glimpse into how companies across the industry may end up using AI within the backbone of their workflow, not just on the periphery or as a research tool.
AI as Music-Engineer
Punyabrota Dasgupta, Principal Solutions Architect at Amazon’s AWS India, explained the development of one of the company’s music-generating AI tools, which was demonstrated by making “happy” and “sad” sequences of background music.
“In movies or television series, or even sometimes for the sensational news that people watch, background music goes a long way,” said Dasgupta.
“But the thing is, we are talking about a very large volume of content, and creating matching background music suited to audiences all over the globe can be a challenging task at times.”
AWS’s vision is to create a tool that will reduce the overheads in making custom content with music tailored for a diverse audience located all over the world.
“Machine learning can help us in synthesising fresh, original, copyright-free music, which can be used according to user tastes and preferences for the different content that we enjoy all over the world,” said Dasgupta.
It can even be used to create many forks based on the same raw content, with soundtracks tailored to specific markets or countries.
“There are multiple elements of personalisation. I might enjoy the same series, whether it is Western content or from Israel, the Arabic world or Korea. However, I would like background music I can relate to more, so how can I do that? That is why we have multiple models.”
But how is this music generated? The tech behind this particular demo is relatively conventional: a long short-term memory (LSTM) network, which is effective at recognising and replicating the patterns found in music. It works like an informed autocomplete, gradually building up sections of music note by note, though to the user the result appears near-instantly.
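Dasgupta did not share implementation details beyond the LSTM architecture, but the "informed autocomplete" loop he describes can be sketched in miniature. In this illustrative stand-in, a simple next-note frequency table plays the role of the trained network; the function names, note encoding and sample phrases are all assumptions for the sake of the sketch, not AWS's actual code.

```python
import random
from collections import Counter, defaultdict

def train_next_note_model(melodies):
    """Stand-in for a trained LSTM: count which note tends to follow
    each note in the sample melodies (a first-order frequency table)."""
    table = defaultdict(Counter)
    for melody in melodies:
        for current, nxt in zip(melody, melody[1:]):
            table[current][nxt] += 1
    return table

def generate(model, seed_note, length, rng):
    """Autocomplete-style generation: repeatedly append the model's
    prediction for the next note, as an LSTM decoder loop would."""
    melody = [seed_note]
    for _ in range(length - 1):
        candidates = model.get(melody[-1])
        if not candidates:
            break  # note never seen in training; end the phrase
        notes, weights = zip(*candidates.items())
        melody.append(rng.choices(notes, weights=weights)[0])
    return melody

# "Happy" training phrases in a major-key flavour (illustrative data).
happy_samples = [["C4", "E4", "G4", "E4", "C4"], ["C4", "E4", "G4", "C5"]]
model = train_next_note_model(happy_samples)
tune = generate(model, "C4", 8, random.Random(42))
print(tune)
```

A real system swaps the frequency table for an LSTM that conditions on the whole preceding sequence and on a mood label, but the generation loop itself is essentially this.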
It starts with feeding the AI a section of music. “We need a set of sample files,” said Dasgupta. “There are two ways which we could have handled this. One is to use copyright free music available, and its notation, available on the internet. That would have been a starting point.”
“However, being a trained musician myself, at least in Indian classical music, I created the notations myself.”
The task of making sure that the music actually complements the content with which it is to be aligned also revolves around AI. While the software is designed to create sections of music to match human emotions, such as happiness, anger or fear, to make this kind of tool work at scale there also needs to be a way to actually extract this information from the source.
“You have a piece of media, which may be a movie or television series episode, a news segment, a promo material, and then we have to build a module based on extracted metadata,” said Dasgupta.
AI can be used to learn about the piece of content through visual analysis of what’s on-screen. For example, a person’s facial features might be examined to estimate the likelihood that they are angry or sad. And that’s just one thread in the possible metadata attached to a video.
“So, first of all, we do [speech-to-text], converting the dialogue,” said Dasgupta.
“What else can we find out about that particular transcription? Maybe the mood, the sentiment, the accent, and whatever else helps us with this. Then there’s the visual analysis, in terms of the actor, the lighting settings, maybe the background scene. Everything adds to it.”
There are other routes too. In AWS’s own demo, a movie synopsis was generated using a large language model (LLM), the kind of content one might have access to in a real-world project. AI was then used to extrapolate the likely themes and emotions in that piece.
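The fusion step Dasgupta outlines, where transcript sentiment, visual analysis and LLM-extracted themes together determine the mood the music generator should target, can be sketched as a weighted vote. Every module name, weight and label below is illustrative, not a description of AWS's actual pipeline.

```python
def fuse_mood_signals(signals, weights):
    """Combine per-module mood scores into a single mood label.

    signals: {module_name: {mood: score between 0 and 1}}
    weights: {module_name: relative importance of that module}
    """
    totals = {}
    for module, scores in signals.items():
        w = weights.get(module, 1.0)
        for mood, score in scores.items():
            totals[mood] = totals.get(mood, 0.0) + w * score
    return max(totals, key=totals.get)

# Hypothetical outputs from the analysis modules described in the talk.
signals = {
    "transcript_sentiment": {"happy": 0.7, "sad": 0.2},
    "facial_analysis":      {"happy": 0.4, "angry": 0.5},
    "llm_theme_extraction": {"happy": 0.6, "fear": 0.1},
}
weights = {"facial_analysis": 1.5}  # e.g. trust on-screen faces a little more
mood = fuse_mood_signals(signals, weights)
print(mood)  # → happy
```

The chosen label is then the conditioning input handed to the music generator, selecting which “happy” or “sad” model produces the score.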
AI as Social Media Manager
AWS’s work in background music generation gives us an idea of how AI may one day work in the average production workflow, but Japanese national broadcaster NHK is already several steps ahead. It already uses AI to repurpose its broadcast content into new forms, maximising its reach without a dramatic increase in workload.
“We have entered an era in which an enormous amount of video content can be accessible not only via the television, but also the internet,” said Momoko Maezawa, Research Engineer at NHK.
The broadcaster uses AI to generate shorter form versions of its broadcasts, which can be posted on social networks. News reports are the primary content used here, but travel programmes are cut down using AI, and it’s even used to pick likely thumbnails for each piece.
“At the broadcasting station, summary videos and programme websites have been important in order to raise awareness of programmes,” said Maezawa. “However, the production of so many videos and programme websites requires high levels of specialisation and a lot of work, just like with editing for broadcast.”
AI is used to dramatically cut down on this workload, creating shorter-form versions of content made through more traditional production. “A summary video can be automatically generated for about two thirds of the programme’s duration,” said Maezawa.
“First, newsroom video is automatically divided into shots. Then we take sample images from each shot and feed the features of those images into the AI. Shots containing [news] anchor images are judged to be intro videos, and sequences between are judged as main story videos.”
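Maezawa did not detail how the shot division works internally, but a common approach to this first step is to compare colour histograms of consecutive frames and start a new shot wherever the difference spikes. The sketch below illustrates that general technique under stated assumptions; the threshold, histogram size and data are invented for the example.

```python
def histogram_distance(h1, h2):
    """L1 distance between two normalised histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def split_into_shots(frame_histograms, threshold=0.5):
    """Return a list of shots, each a list of frame indices.
    A new shot begins wherever consecutive histograms differ sharply."""
    shots = [[0]]
    for i in range(1, len(frame_histograms)):
        if histogram_distance(frame_histograms[i - 1], frame_histograms[i]) > threshold:
            shots.append([])  # hard cut detected
        shots[-1].append(i)
    return shots

# Toy 4-bin brightness histograms: a studio shot, then a cut to field footage.
frames = [
    [0.70, 0.20, 0.10, 0.00],
    [0.68, 0.22, 0.10, 0.00],  # same shot, tiny change between frames
    [0.10, 0.10, 0.30, 0.50],  # hard cut: very different distribution
    [0.12, 0.10, 0.28, 0.50],
]
print(split_into_shots(frames))  # → [[0, 1], [2, 3]]
```

Once the footage is divided this way, each shot's sample images can be classified, as Maezawa describes, into anchor shots and main-story shots.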
Journalistic TV news stories end up being the perfect testing ground for this sort of efficiency-boosting AI, thanks to the relatively formulaic make-up of the content. The AI is trained to identify different types of shot, such as those in the newsroom versus footage at the scene of a story, and treats each accordingly.
It can tell when the camera is “zooming in on key persons and showing topical objects in detail, and special shooting angles of buildings involved in an incident,” said Maezawa.
“Important video segments are extracted from the main story video using image analysis AI that has learned the type and size of the subject, composition and camera movement unique to important news scenes,” she said. It has also been programmed to recognise when key phrases from the news anchor’s intro monologue are repeated in on-site footage, a suggestion that this is likely to be a key part of the visual for a story.
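The key-phrase matching Maezawa mentions can be pictured as a simple overlap score between the anchor's intro and each on-site segment's transcript. NHK's actual method was not described, so the function, stopword list and example text below are purely illustrative.

```python
def keyword_overlap_score(intro_text, segment_text, stopwords):
    """Score a segment by the fraction of the anchor's intro keywords
    that reappear in it (set overlap, with stopwords removed)."""
    intro_terms = {w for w in intro_text.lower().split() if w not in stopwords}
    segment_terms = {w for w in segment_text.lower().split() if w not in stopwords}
    if not intro_terms:
        return 0.0
    return len(intro_terms & segment_terms) / len(intro_terms)

STOPWORDS = {"the", "a", "an", "in", "of", "at", "has", "have", "is"}

intro = "A fire has damaged the central market in Osaka"
segments = [
    "Traffic is heavy at the stadium",
    "Firefighters tackle the fire at the Osaka market",
]
scores = [keyword_overlap_score(intro, s, STOPWORDS) for s in segments]
best = max(range(len(segments)), key=lambda i: scores[i])
print(best)  # → 1: the segment most likely central to the story
```

A production system would use proper morphological analysis (especially for Japanese) rather than whitespace splitting, but the principle of ranking shots by intro-phrase repetition is the same.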
“With this technology it [is] possible to generate a summary video containing professional picture making,” said Maezawa.
At this stage, though, it is not produced entirely without a human touch. NHK has also developed a cloud interface where producers can see the make-up of these AI-edited versions of news stories and travel programmes, move segments around or, crucially, remove sections the broadcaster does not have the rights to post on social media.
NHK is a reminder that while AI can at times seem on the brink of revolutionising how we all work, some companies out there are already employing it in smart, practical ways.
The topic of how AI is advancing media production was discussed at a Technical Papers presentation during IBC2023 in Amsterdam. The talk was hosted by Nick Lodge, Director of Logical Media, and featured NHK Research Engineer Momoko Maezawa and AWS India Principal Solutions Architect Punyabrota Dasgupta.