Smart speakers and their AI controllers are revolutionising how we consume music and spoken word, but are broadcasters and platforms fully exploiting their capabilities for other content, such as video? And how can we trust what the AI offers us?

Envisioning the smart living room: Fire TV Cube 

Smart speakers have been around for a few years now, but their use has become more pervasive in recent times. Indeed, a survey by Value Market Research found that the global smart speaker market is booming, with revenue of $4.5 billion in 2017 anticipated to reach $30 billion by 2024, while the latest Smart Audio Report puts ownership of smart speakers at 24% of the US population (the largest consumer of such devices), with 69% of smart speaker owners using their device daily.

Of course, the hardware isn’t the main event here: smart speakers are just another interface between us and the cloud-dwelling AIs, or intelligent virtual assistants (IVAs), that are now ubiquitous in our phones and computers, with more appearing in cars, doorbells and even robot mops and showerheads.

Listening in
Amazon’s Alexa, Apple’s Siri, Google’s Assistant and Microsoft’s Cortana, not to mention newer players such as Yandex’s Alice, Samsung’s Bixby, Alibaba’s AliGenie and several others, can all offer interactive access to content through a variety of cloud-based applications and services.

LG Smart TV voice search

Though gesture controls are now being explored by some developers, the main way to invoke the IVA is by voice. Smart speakers such as those from Sonos, Apple’s HomePod, and the Amazon Echo and Google Home ranges are alerted by an activation word. Utter a command like ‘show me Coronation Street’ or ‘open Netflix’ after the activation word, and the speaker will jump to attention, activate a smart TV and go to that programme or app. Of course, the IVA might already be on the set-top box or TV itself, in which case the AI can be invoked by pressing a button and speaking into a remote control.

Amazon’s leading streaming media player Fire TV, with more than 40 million active users globally, depends on Alexa for voice recognition in both Amazon Prime Video and third-party integrations; the company reports that it has over 50 partners with expanded in-app voice controls worldwide, which cover more than 80% of total viewing hours. On the Fire TV Cube, which features microphones for hands-free Alexa control, voice interactions are six times higher than on any other Fire TV device, and Amazon reports more than 600 million Alexa utterances since the device’s launch in 2018.

Amazon: Fire TV Cube

According to Amazon, partners that have integrated deeper in-app Alexa voice controls into their apps are seeing increases in engagement. One example is Hulu: since launching in-app Alexa voice controls, Hulu has found that customers who use voice control are more than doubling their hours of usage, which Amazon says means “more time watching the content you want to instead of looking around for it”.

Personal service
Machine learning can log individual viewing patterns and improve performance, while IVAs like Alexa and Google Assistant are able to distinguish between voices. Create a voice profile with Alexa, for example, and an acoustic model of your voice characteristics is stored in the cloud so that the IVA can respond to you by name. The natural language algorithms the AI uses to recognise what we are saying are becoming ever more sophisticated as more people use them. This all enables devices to deliver personalised results and responses, but how far away are we from truly intelligent assistants? What about something more intuitive, and how can voice control make content more discoverable?

Apple’s Siri AI is available across its products: on Apple TV 4K and Apple TV HD, it’s activated from the remote. As well as using it to control playback, you can ask Siri to find a movie or TV show, refine your search by actor, time period or director, then refine it further with commands such as ‘only the good ones’, ‘only comedies’ or ‘just the ones from this year’.

The BBC, in conjunction with Microsoft, explored a similar approach with its internal Voiceprint experiment a few years ago, allowing an individual to sign in to iPlayer using their voice as an acoustic signature. Saying “BBC…show me something funny” brought up a selection of BBC comedy programmes, while “BBC…what’s going on in the world?” switched iPlayer to the BBC News channel.

More recently the BBC has been moving towards making this kind of AI available as a public service, at least on an audio basis, creating Alexa skills for BBC News output and for its CBeebies users, as well as developing a new BBC voice assistant.

BBC Voice executive editor Mukul Devichand 

“As a principle we want to enable a conversation with the BBC,” says Mukul Devichand, executive editor for BBC Voice + AI. “You can see that in everything that we’re designing.

“The BBC Kids skill has been really interesting because that’s a key demographic for us,” he continues. “Children can say open CBeebies, and they can then ask for the stories and songs and play along with their favourite characters. We’re learning the language of how to speak to them, and how they want to speak to us.”

Another example is the BBC News skill. “At the moment, that’s an audio-based service, but there’s no reason in the future, if people start using voice for TV, that this sort of thing can’t have a visual element,” says Devichand. “Fundamentally, we wanted to make sure that when people ask for BBC news on the BBC News skill that they’re getting the ability to interact if they want it, but also that we are able to editorially shape that experience, even though it’s early days for this technology and the sort of things you can say are still quite basic.”

The skill offers some basic interactivity; you can, for example, say ‘next’, to move through the stories, or say ‘show more from the BBC’ on many of the stories.

Devichand says the BBC also wanted to be sure that it was determining the news agenda. “So that if you keep on saying ‘Next’ you don’t suddenly get to a piece of content that was created by any other actor, like a publisher or a foreign website or political party or anything,” he explains.

“If you ask for BBC News, you can trust that we will keep taking you through the news that’s editorially filtered for you; if you ask for ‘more’ on the story, the answer will be impartial and up to our standards. As well as trying to understand what you might want, you’re going to get a BBC authoritative take on that.”

Creating its own voice assistant technology – working title Beeb – is very much part of the BBC’s strategy. “We’re not creating a smart speaker, we’re certainly not wanting to be seen as a rival or an alternative to something like Alexa,” Devichand stresses. “Beeb might be on other smart devices that already exist: on TV, on phones, in smart speakers, wherever. We’re hoping to distribute it as widely as possible.”

Come talk to me
As discussed, the IVAs get smarter with their use of natural language as more people speak to them, and Devichand also thinks the experiences that people design on them are getting more ‘conversational’, but he admits there’s a way to go until you can speak to any assistant and it’s like speaking to a person.

“At the moment [with Beeb] you can ask for any BBC radio station or podcast, or anything from the BBC Sounds catalogue, and we’re able to give it to you,” says Devichand. “But what we haven’t really got to yet is how we can be really conversational about that. That is the difference between saying ‘Play me Fleabag’, and seeing Phoebe Waller-Bridge on a TV programme and then saying to your device, ‘can you play me the series with her in it?’

“How are we going to get from one to the other, if that’s what we think people might start doing? That raises all sorts of questions about technology, but also, who are you asking to do it, which is also an editorial question.

“What we want to do is make sure that where you can speak to us, we can provide as full a service as possible,” he continues. “One that you can really trust on all levels: trust around your data, the privacy aspects, but also the editorial that you’re going to get from it.

“We want to make sure that there are BBC public service values in the answer that we’re able to give you, to make sure that the content that we are able to direct you to has the quality you’d expect from us.”

More widely, Devichand sees smarter voice interaction as having the potential to bring powerful change to the internet, and beyond.

“I’m a programme maker by background,” he says. “Every time the technology platform changes, it has a massive impact on how we tell our stories. For the first time, this technology, even if it’s quite rudimentary right now, means that [the audience] can actually speak back. We need to construct programmes or experiences that are responsive to what you want when you speak back; we need to make sure we actually hear you and what you say, and that we create experiences that respond to that well. [Conversational AI] will bring a big change to our creative process. I want to make sure that we and the British creative industries are at the vanguard of that.”