Although pay TV services hold a large amount of media content, their user experience is often not on par with Over-The-Top (OTT) Video-on-Demand (VoD) offerings, which cater to current audiences with short snippets and binge-watching capabilities.

To address this, some TV providers offer short-form content by manually “cutting” the linear content into VoD assets. However, this is often neither feasible nor scalable. Moreover, research shows that consumers struggle to discover new content, which leads to frustration.

Machine learning (ML) algorithms, and Deep Learning (DL) in particular, have gained tremendous popularity over the past years because they have performed comparably to, and in some cases better than, human experts in object and speech recognition tasks.

In this paper, we present our platform, which utilises state-of-the-art ML algorithms for the real-time analysis of thousands of hours of multilingual multimedia content, including television and VoD. These analyses enable us to obtain rich metadata from the content of videos and to suggest small chunks of personalised content (“snacks”) to users based on their preferences.


Over the past decade, consumer viewing habits have changed drastically. TV operators and broadcasters, although they produce a lot of original content, are losing ground to OTT VoD service providers, which offer short snippets and binge-watching capabilities, among other features. Moreover, data shows that the younger generations spend half their time consuming VoD content, an increase of more than 100% from 2010 until 2017.

To compete with the OTT VoD offerings, TV service providers attempt to manually cut linear content into smaller snippets and recommend these to viewers (as NPO Start does, for example). This, however, is a costly and time-consuming procedure.

Interestingly enough, despite their differences, both TV providers and OTT VoD service providers face the same problem: viewers still find content discovery very challenging. Recent reports show that viewers spend on average 1 hour per day searching for content, and this number is expected to increase as more content becomes available.

Approximately 70% of viewers will prefer on-demand and catch-up services over linear TV content. Meanwhile, research shows that providing search capabilities and recommendations improves viewer engagement.

These findings suggest that both content segmentation and recommendation must happen in an efficient and automated way to keep up with the large amount of available content and satisfy customer needs.

Deep Learning (DL), a subfield of machine learning (ML) in Artificial Intelligence (AI), has been successfully applied to solve cognitive tasks that were previously thought to be solvable only by human experts, thus gaining tremendous popularity over the past decade.

We have developed a multimedia content analysis platform which leverages ML algorithms to analyse thousands of hours of video and audio content in real time. Some of the algorithms we utilise enable us to convert speech to text, recognise faces, identify objects, and detect text and logos.

The results of these algorithms enable us to understand and search video and audio content and to create rich metadata. In this paper, we describe our developments in rich metadata extraction and investigate new media applications based on this detailed understanding of videos, such as in-content search, content-based recommendations and snackable content. We also present three use cases for our platform.
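
To make the idea of in-content search and snackable content more concrete, the following is a minimal sketch of how time-coded metadata might be queried and turned into short segments. It is an illustration only and does not describe the platform's actual implementation: the `Annotation` record, the `search` and `to_snacks` functions, the `max_gap` parameter and the sample data are all hypothetical names and values chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One piece of time-coded metadata (hypothetical schema)."""
    start: float  # seconds from the beginning of the video
    end: float    # seconds from the beginning of the video
    kind: str     # e.g. "speech", "face", "object", "logo"
    label: str    # recognised text, person name or object class


def search(annotations, query):
    """Return annotations whose label contains the query, in time order."""
    q = query.lower()
    hits = [a for a in annotations if q in a.label.lower()]
    return sorted(hits, key=lambda a: a.start)


def to_snacks(hits, max_gap=5.0):
    """Merge hits that lie within max_gap seconds into (start, end) segments."""
    segments = []
    for a in sorted(hits, key=lambda a: a.start):
        if segments and a.start - segments[-1][1] <= max_gap:
            # Close enough to the previous segment: extend it.
            segments[-1] = (segments[-1][0], max(segments[-1][1], a.end))
        else:
            segments.append((a.start, a.end))
    return segments


# Made-up sample metadata for a single video.
annotations = [
    Annotation(12.0, 15.5, "speech", "the election results are in"),
    Annotation(14.0, 20.0, "face", "Jane Doe"),
    Annotation(18.0, 22.0, "speech", "turnout in the election was high"),
    Annotation(95.0, 99.0, "speech", "back to the election coverage"),
]

hits = search(annotations, "election")
snacks = to_snacks(hits)
print(snacks)
```

In this sketch, hits that fall close together in time are merged into one segment, so a query yields a small number of short clips rather than many isolated timestamps. A real system would likely rank segments and align them with shot or scene boundaries, which is outside the scope of this example.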
