Voice is becoming an increasingly important part of content discovery, but can it ever replace the remote control? Ann-Marie Corvin investigates.

Voice is becoming increasingly important in the device and entertainment ecosystem, with manufacturers and service providers recognizing its potential in content discovery and engagement.

IHS Markit’s latest consumer research survey indicates that demand is there, finding that one in five consumers across the US, UK, Australian and Indian markets have used voice commands to operate their TV or video devices.

And some broadcasters have started to roll out basic voice functions: Sky’s premium Sky Q service now offers voice control, while the UK’s hybrid VoD platform YouView has been an early adopter of Amazon’s Video Skill API.

According to YouView’s director of product, Sion Wynn Jones, most of the new voice functions, due to roll out shortly following a two-year trial, are still fairly basic, but their advantage over a remote is that they get viewers to content much quicker.

“Search for Line of Duty on Amazon Alexa’s Video Skill kit and it will pop up straight away – it removes friction and time for users and widens accessibility for certain viewers,” he says.

The aim, he adds, is to eventually introduce some basic personalisation into the service by applying voice identity to focus on individual household members’ requests.

“This will allow us to identify different people in a room and suggest shows everyone is happy with,” he says.

However, this contextual conversation and personalisation within the TV domain is still limited, according to Channel 4’s newly appointed chief product officer, Dave Cameron.

“Voice control is great at specific requests with a specific answer but less advanced at more open-ended conversations such as ‘What would be great for me to watch tonight?’”

He adds that TV discovery often happens within a family unit, and points out that children’s voices, for example, can be the hardest to recognise accurately.

Nielsen subsidiary Gracenote has recognized the limitations of voice UI and is building out a new TV data product with around 2,400 keywords that describe movies and TV shows by mood, theme, and scenario.

Using descriptive tags like “greed” and “betrayal,” as well as “dark” and “gripping” for a show like “Game of Thrones,” Gracenote hopes to make both personalized content recommendations and voice search a lot more effective.

Voice-first approach
However, Cameron argues that a new ‘voice first’ approach is required to maximise voice’s navigational potential.

For too long now, he argues, the industry has been treating voice as something that needs to fit within the directional navigation remote control UI.

“This means that often you cannot complete the journeys (or get out of a journey), without the assistance of the navigational remote,” he says.

However, as Nicky Birch, an executive producer at BBC R&D, points out, removing screens and other devices from the voice UI currently limits discoverability.

“How do we tell people content exists when we don’t have a screen, without relying on other screens? This discoverability element of voice is very complicated, and this hasn’t been fully resolved yet,” she says.

Birch suggests that this might be why, after some initial enthusiasm for voice, brands have become more nervous about investing in the space.

Offering a voice-driven experience is different from having one that people can find, but brands do not want to cede the power of “being found” to the platforms if it’s not going to work in their favour.

She thinks it’s possible that some form of search advertising for voice will emerge, or that broadcasters will be able to partner in other ways with voice service operators such as Amazon and Google.

Cameron adds, however, that broadcasters should think carefully about what they want to achieve on these platforms first.

“The trade you may be making is ceding some of your brand equity and allowing a new gatekeeper to control all of the important consumer interface to the home,” he warns.

“This is a particular issue if the gatekeeper seeks to monetise its gatekeeping position or favour its own content.”

NRK’s head of distribution Bjarne Andre Myklebust thinks that a data trade on both sides will be necessary to improve voice services for viewers.

“Will we be able to access the data to better our product? The answer, I think, will be not always. We need to resolve this with third parties and big players because they also need access to our data too, to improve their offerings,” he says.

Lost in translation
Like broadcasters in other territories where English, French, German or Spanish is not the main language, Norwegian broadcaster NRK is also dealing with nuances of language.

According to the public broadcaster’s head of voice and product development Marit Rossnes, Alexa is not yet available in Norwegian so the PSB is using Google Home’s developer platform, which, she claims, is not as mature and can be difficult to work with.

“There have been instances where commands have been misunderstood, which can be frustrating for users, so it does impact on the usefulness of the service,” she says.

Territories such as Eastern Europe and the Balkans are experiencing similar issues, she reports, adding that NRK is currently in discussions with the main voice platforms, as well as with universities in Norway that could provide an independent solution to this issue.

Myklebust remains broadly confident that once speech-to-text and text-to-speech become more mature on these speakers, and once machine learning is more developed, this convergence will “change the face [of] what audio and media is.”

So, is voice ever likely to replace the remote control in the user’s content discoverability journey?

C4’s Cameron says that for this to happen, more products need to be designed with a ‘voice first’ approach.

Myklebust thinks it may replace some remote control functions, but it will depend on the demographics of the user: “Some people might still feel awkward talking to a machine,” he says.

To this end, BBC R&D is about to undertake a major piece of research into the types of voice that audiences respond to best.

“We’re looking at how do people feel about talking to synthesized voice vs a pre-recorded human one; is the user’s reaction different depending on gender? We’re hoping to explore this side in more detail,” reveals Birch.

Birch also echoes her fellow broadcasters in her belief that in terms of voice control, the best is yet to come.

“It’s a really interesting time. The hype is leveling off as well as the reality of what you can do, but there’s lots of potential coming up. There’s some interesting work happening and what you’ve seen or heard so far isn’t it.”

The role of voice in content interactivity
Once the voice UI is more established, NRK’s Myklebust hopes that more interactive applications will follow, in which voice control plays a central role in the content.

“We’re looking at how best we can use the platform as a public service broadcaster,” he says. “We’ve experimented with a quiz format but found it challenging to get the speech interaction to work both ways, but the potential [to interact with a TV show] is there.”

BBC R&D has also been experimenting with voice content, which has so far resulted in two skills being released on Alexa: The Inspection Chamber, a choose-your-own-ending sci-fi experience, and The Unfortunates, a more experimental art-house non-linear Radio 3 play.

R&D’s latest prototype, Birch reveals, is designed to explore how the BBC can utilize lengthier two-way conversations between the voice speaker and the viewer, using live semantic analysis.

The prototype, Five Days Five Dates, is aimed at a younger demographic more interested in game-like experiences. The user becomes the friend of a young woman who has been set up on five dates with five different people.

The user is there to advise the friend - via voice control - on what course of action to take on each date.

“The character might ask ‘what do you think I should do?’ and the user might reply ‘dump him’ and the reply would be ‘dump him, yeah ok, he is a bit weird’. None of this has been productionized yet, but we are confident that we can deliver on the experience,” she says.