To grow their business and increase their audience, content distributors must understand the viewing habits and interests of content consumers.

This typically requires solving tough computational problems, such as rapidly processing vast amounts of raw data from websites, social media, devices, catalogs, and back-channel sources.

Fortunately, today’s content distributors can take advantage of the scalability, cost-effectiveness, and pay-as-you-go model of the cloud to address these challenges.

In this paper, we show content distributors how to use cloud technologies to build predictive analytic solutions.

We examine architectural patterns for optimizing media delivery, and we discuss how to assess the overall consumer experience based on representative data sources.

Finally, we present concrete implementations of cloud-based machine learning services and show how to use the services to profile audience demand, to cue content recommendations, and to prioritize the delivery of related media.


An abundance of technical advancements has expanded the range of options for media consumers.

Today’s consumers can choose to have 3-D, 4K, HDR, and even 8K content displayed on a variety of sophisticated devices.

Given the public’s appetite for these high-end devices, media creators are constantly under pressure to increase resolution and quality to compete in an ever-expanding war of content choices.

In addition to the changes in display technologies, on-demand content and streaming media delivery have changed the habits of content viewers.

Gone are the days when we used to circle around the TV set at an appointed time for the airing of our favorite TV show.

People now expect to watch the programming they want on their own schedule, which is driving more and more media companies to consider providing their own over-the-top (OTT) service.

These services use the Internet for delivery, which introduces potential quality issues that are beyond the control of the media owner or distributor.

To mitigate these risks, many media distributors invest heavily in solutions that detect playback issues; these solutions require large amounts of computational capacity to process massive, raw datasets to provide real-time course correction.

In this way, distributors can provide more reliable content that caters to the viewing habits of their audience.

This raises the question of how to build a next-generation media delivery platform that not only delivers content reliably, but also ensures that the content is experienced the way its creator intended.

One approach is to have the delivery platform predict events such as network congestion or low-quality streams before they occur, and then guide consumers in the right direction.

Powerful tools toward this goal include predictive modeling, machine learning, and real-time analytics, applied to the data that consumers generate as they interact with content, social media, and multiple screens.

According to Nielsen, social media activity drives higher broadcast TV ratings for 48% of shows (1). Similarly, according to Netflix, over 75% of what people watch is based on Netflix’s recommendations (2).

In terms of audience engagement, there are two basic categories:

  • Content experience – Using predictions and analytics on viewing habits, player network logs, and related datasets to quickly analyze existing issues or predict future ones. The predictions can be used to minimize or even eliminate a poor customer experience.
  • Content relevance – Using predictions and analytics on historical and some real-time datasets to detect and recommend relevant content and to personalize content and ads, thereby improving the content selection experience.
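
As a rough illustration of the content experience category, the sketch below forecasts a session's throughput from player log samples and flags sessions that may soon struggle to sustain their stream. The exponentially weighted moving average is a simple stand-in for a trained predictive model, and the session names, sample values, bitrate, and headroom factor are all hypothetical.

```python
def ewma_forecast(samples_kbps, alpha=0.5):
    """Exponentially weighted moving average of throughput samples (kbps)."""
    estimate = samples_kbps[0]
    for sample in samples_kbps[1:]:
        # Recent samples count more than older ones.
        estimate = alpha * sample + (1 - alpha) * estimate
    return estimate


def at_risk(samples_kbps, stream_bitrate_kbps, headroom=1.2):
    """Flag a session whose forecast throughput is below bitrate * headroom."""
    return ewma_forecast(samples_kbps) < stream_bitrate_kbps * headroom


# Hypothetical per-session throughput measurements taken from player logs.
sessions = {
    "session_1": [8200, 7900, 8100, 8000],  # stable connection
    "session_2": [8000, 6500, 5200, 4100],  # steadily degrading connection
}

flagged = [
    session_id
    for session_id, samples in sessions.items()
    if at_risk(samples, stream_bitrate_kbps=5000)
]
print(flagged)
```

A platform could react to a flagged session by, for example, switching it to a lower-bitrate rendition before playback stalls.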

Audience Engagement Signals

To build a next-generation media delivery platform and deliver a better customer experience, content distributors can capture, fuse, and synthesize background signals (noise) from both batch and real-time data to create models for analysis.

We can leverage both transactional data (from user interactions such as searching, playing, watching/listening, and contacting sales/support) and behavioral activities (such as sharing, tagging, liking, reviewing the content, and so on) to build a prediction model.
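
One way to picture this fusion step is the minimal sketch below, which rolls a stream of raw events up into per-user features that a prediction model could consume. The event names, user IDs, and feature names are illustrative assumptions, not a prescribed schema.

```python
from collections import Counter, defaultdict

# Hypothetical event types, grouped as in the text above.
TRANSACTIONAL = {"search", "play", "support_contact"}
BEHAVIORAL = {"share", "tag", "like", "review"}

# Hypothetical raw event stream: (user_id, event_type).
events = [
    ("user_1", "search"), ("user_1", "play"), ("user_1", "like"),
    ("user_2", "play"), ("user_2", "play"),
    ("user_2", "share"), ("user_2", "review"),
]


def build_features(events):
    """Aggregate raw events into one feature dictionary per user."""
    counts = defaultdict(Counter)
    for user_id, event_type in events:
        counts[user_id][event_type] += 1

    features = {}
    for user_id, per_type in counts.items():
        features[user_id] = {
            # A Counter returns 0 for event types the user never produced.
            "transactional": sum(per_type[e] for e in TRANSACTIONAL),
            "behavioral": sum(per_type[e] for e in BEHAVIORAL),
            "plays": per_type["play"],
        }
    return features


features = build_features(events)
print(features)
```

The same aggregation could run over a nightly batch or over a sliding window of recent events, depending on how fresh the features need to be.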

The analysis can be descriptive (aggregation, retrospective), predictive (statistical, machine learning) or prescriptive (what should we do about it?) based on technology choices.

In the remainder of this paper, we focus on descriptive and predictive analytics, especially in the context of content relevance.

Machine Learning for Predictive Analytics

Machine learning (ML) is a broad area of tools and techniques that can help us use historical data to make better business decisions. ML algorithms help us discover patterns in data and construct predictive models from these patterns, which we can then use to make predictions on new data.

For example, we could use ML to predict whether customers will select a title to view based on data such as their viewing history, what other users in their same demographic have watched, and even who they follow on social media platforms.

We can then use the predictions to identify which customers are most likely to respond to personalized, promotional marketing campaigns.
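
To make the example concrete, here is a minimal sketch that trains a logistic regression model with plain gradient descent and then ranks customers by their predicted likelihood of selecting a title. The three features, the training rows, and the customer IDs are invented for illustration; a production system would more likely use a managed ML service or an established library than hand-written training code.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, labels, epochs=2000, lr=0.5):
    """Fit logistic regression weights with batch gradient descent."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    m = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / m for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / m
    return weights, bias


def predict(weights, bias, x):
    """Predicted probability that a customer selects the title."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Hypothetical features per row:
# (share of viewing history in the title's genre,
#  share of same-demographic users who watched the title,
#  1 if an account the customer follows mentioned the title, else 0)
training_rows = [
    (0.9, 0.8, 1), (0.8, 0.6, 1), (0.7, 0.9, 0),
    (0.2, 0.3, 0), (0.1, 0.2, 0), (0.3, 0.1, 0),
]
watched = [1, 1, 1, 0, 0, 0]  # 1 = the customer selected the title

weights, bias = train_logistic(training_rows, watched)

# Hypothetical customers to score for a promotional campaign.
candidates = {
    "cust_a": (0.85, 0.7, 1),
    "cust_b": (0.15, 0.2, 0),
    "cust_c": (0.5, 0.5, 0),
}
scores = {cid: predict(weights, bias, x) for cid, x in candidates.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

The ranked list places the customers the model considers most likely to select the title first, which is the ordering a campaign would target.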