AI in theory, practice, and media workflows

Although artificial intelligence (AI) as a free-thinking intelligence akin to human intelligence is still the stuff of science fiction, AI already holds great promise for broadcast and media, writes Thomas Bause Mason, SMPTE Director of Standards Development.

There is no HAL 9000 of 2001: A Space Odyssey in our near future. It’s true that common applications of AI today reflect some of HAL’s capabilities: speech recognition, facial recognition, natural language processing, and playing chess. Nevertheless, scientists are nowhere near creating an artificial general intelligence (AGI) such as HAL, capable of handling a wide variety of tasks and of specialising in any of them to improve performance.

While AGI remains many years off, AI is already making a tremendous impact on everyday life for people around the world. In a recent SMPTE Motion Imaging Journal article, Richard Welsh discussed the nature of AI today, addressing concerns about the technology as well as hopes for its potential, particularly for the media and entertainment industry.

“The notion of AI, in general,” Welsh writes, “has raised much concern in some sectors of the scientific and technological community as well as political concerns about the impact on employment and social cohesion.”

Ultimately, however, after discussing AI from several perspectives, Welsh concludes that commercial pressures (namely time and cost), together with a growing number of low-risk applications for AI, will lead to some or all of certain processes within the media industry being handed over to AI.

Welsh sees great potential for AI in the media industry. Indeed, machine learning (ML) and deep learning (DL), two subsets of AI (DL being itself a specialised branch of ML), already bring valuable benefits to industry professionals and consumers alike.

Understanding ML and DL
Both ML and DL are, as Welsh puts it, “synthetic or mathematically defined processes that mimic the biological decision-making process.” ML is the lightweight of the two, using a network of decision-making nodes that are trained to complete a particular task. DL is the heavyweight, taking advantage of layered networks on a much larger scale and using huge live data sets along with statistical analysis to improve accuracy.

ML tasks fall into two basic categories: classification and regression. Although classification networks are often associated with DL, because they perform best with access to massive data sets and a continual flow of new information, ML also operates in this category.

Given data on the characteristics that differentiate elephants, horses, dogs, and mice, an ML algorithm can identify the boundaries between these classes and use those boundaries to identify different animals. ML can adjust when exposed to new data, but it has only its existing classes and data sets with which to make identifications. For this reason, the quality of the data used to train the network strongly influences the algorithm’s accuracy and utility.
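To make the idea of class boundaries concrete, here is a minimal Python sketch of one simple classifier, a nearest-centroid model. It is an illustrative choice, not the method Welsh describes, and the two features (body mass and shoulder height) and all the numbers are made-up assumptions. Because the model only knows its four classes, an animal outside them, such as a cat, is simply assigned to the nearest known class.

```python
import math

# Hypothetical training data: (body mass in kg, shoulder height in m).
# These values are rough illustrations, not a real data set.
TRAINING_DATA = {
    "elephant": [(5000.0, 3.0), (4000.0, 2.7)],
    "horse": [(500.0, 1.6), (450.0, 1.5)],
    "dog": [(30.0, 0.6), (20.0, 0.5)],
    "mouse": [(0.02, 0.03), (0.03, 0.04)],
}


def features(mass_kg, height_m):
    """Log-scale both measurements so small animals are not squashed together."""
    return (math.log10(mass_kg), math.log10(height_m))


def train_centroids(data):
    """Learn one centroid (mean feature vector) per class."""
    centroids = {}
    for label, samples in data.items():
        points = [features(m, h) for m, h in samples]
        centroids[label] = tuple(sum(axis) / len(points) for axis in zip(*points))
    return centroids


def classify(centroids, mass_kg, height_m):
    """Return the class with the nearest centroid.

    The implied boundaries lie halfway between neighbouring centroids, and
    only classes seen during training can ever be returned.
    """
    x = features(mass_kg, height_m)
    return min(centroids, key=lambda label: math.dist(x, centroids[label]))


centroids = train_centroids(TRAINING_DATA)
print(classify(centroids, 600.0, 1.7))  # horse
print(classify(centroids, 4.0, 0.25))   # dog: a cat is forced into a known class
```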

Unlike classification networks, whose outputs are limited to a particular set of values, regression networks deliver continuous outputs that may take any value within a specific range. Used together in large groups, with individual networks addressing separate elements of a larger problem, regression networks can tackle more complex tasks.
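The contrast with classification can be seen in a minimal Python sketch of the simplest regression model, a straight line fitted by ordinary least squares. This is an illustrative stand-in rather than a network of the kind Welsh describes, and the training points and the 0 to 1 output range are arbitrary assumptions. Instead of picking one label from a fixed set, the model returns any value within its range.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model, x, lo=0.0, hi=1.0):
    """Continuous output, clamped to the valid range [lo, hi]."""
    slope, intercept = model
    return min(hi, max(lo, slope * x + intercept))


model = fit_line([0.0, 1.0, 2.0, 3.0], [0.1, 0.3, 0.5, 0.7])
print(predict(model, 1.5))   # about 0.4, a value between the training targets
print(predict(model, 10.0))  # clamped to 1.0
```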

Each of these networks is arranged as a series of nodes that make binary decisions dictating where the resulting value should be processed next. The nodes form decision trees, which are grouped into a “forest” or “jungle” depending on the level of complexity and on the use of weighted feedback to improve the accuracy and utility of the final value or decision. The benefit of using ML in this model is that very little computational power is required at the nodal level, so ML can reasonably be deployed in local software or device hardware.
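The node-and-tree structure can be illustrated with a minimal Python sketch. The trees here are hand-built for illustration rather than trained, and combining them with a weighted average is an assumption standing in for the “weighted feedback” mentioned above; the feature indices, thresholds, and leaf values are all hypothetical. Each node does nothing more than a single comparison, which is why so little computation is needed per node.

```python
from dataclasses import dataclass
from typing import Sequence, Union


@dataclass
class Leaf:
    value: float  # continuous value returned at the end of a branch


@dataclass
class Node:
    feature: int      # which input feature this node tests
    threshold: float  # binary decision: is the feature <= threshold?
    left: "Tree"      # followed when the answer is yes
    right: "Tree"     # followed when the answer is no


Tree = Union[Node, Leaf]


def evaluate(tree: Tree, x: Sequence[float]) -> float:
    # Each node performs one cheap comparison and routes the input onward.
    while isinstance(tree, Node):
        tree = tree.left if x[tree.feature] <= tree.threshold else tree.right
    return tree.value


def forest_predict(trees, weights, x):
    """Combine the trees' outputs as a weighted average."""
    return sum(w * evaluate(t, x) for t, w in zip(trees, weights)) / sum(weights)


# Two hand-built (hypothetical) trees; a real forest would be trained from data.
tree_a = Node(0, 0.5, Leaf(1.0), Leaf(3.0))
tree_b = Node(1, 10.0, Node(0, 0.2, Leaf(0.0), Leaf(2.0)), Leaf(4.0))
print(forest_predict([tree_a, tree_b], [1.0, 1.0], (0.3, 5.0)))  # 1.5
```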

A familiar example of ML is the simple face-recognition algorithm that runs locally on a smartphone to determine the ideal point of focus and the optimal moment (everyone with eyes open, looking at the camera) for the camera to capture the image. In this instance, the ML model is trained to the necessary level of usefulness and deployed, and it remains in that state until it is upgraded. (Training an ML model generally involves providing inputs, tweaking its behaviour to yield different outputs, selecting the output closest to the correct answer, and then iterating.)
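The train-and-iterate cycle described in parentheses above can be sketched in Python as a simple trial-and-select loop (a form of random search) fitting a one-parameter model. This is a deliberate simplification: practical ML training usually adjusts parameters by gradient descent instead, and the model `y = w * x`, the data, and the settings here are hypothetical.

```python
import random


def squared_error(w, data):
    """How far the outputs of the model y = w * x are from the correct answers."""
    return sum((w * x - y) ** 2 for x, y in data)


def train_by_trial(data, iterations=200, candidates=5, step=0.1, seed=0):
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    w = 0.0  # untrained starting behaviour
    for _ in range(iterations):
        # Tweak the current behaviour to produce a handful of variants,
        # keeping the current one so the result can never get worse.
        options = [w] + [w + rng.gauss(0.0, step) for _ in range(candidates)]
        # Select whichever variant's outputs are closest to the correct answers.
        w = min(options, key=lambda c: squared_error(c, data))
    return w


data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # correct answers follow y = 2x
w = train_by_trial(data)
print(w)  # close to 2.0
```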

Because this type of local ML cannot necessarily determine who or what is in an image, applications that require more than generic face recognition need DL. With a DL network and a suitable database of faces and related information, such as names and other personal details gained through a link to social media, it becomes possible to identify people and objects with a high level of confidence. Common applications for DL include search engines, tailored online advertising, and image-recognition systems.
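One common way such identification works is to have a DL network turn each face into a numeric vector (an embedding) and then compare that vector with the stored vectors in the database. The Python sketch below covers only that comparison step. The three-dimensional embeddings, the names, and the 0.9 confidence threshold are made-up assumptions; real embeddings come from a trained network and have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """1.0 for vectors pointing the same way, 0.0 for orthogonal ones."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def identify(query, database, threshold=0.9):
    """Return (name, score) for the best match, or (None, score) if no
    stored face is similar enough to count as a confident identification."""
    best_name, best_score = None, -1.0
    for name, embedding in database.items():
        score = cosine_similarity(query, embedding)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score
    return best_name, best_score


# Hypothetical database of known faces (illustrative vectors only).
DATABASE = {
    "person_a": (1.0, 0.0, 0.0),
    "person_b": (0.0, 1.0, 0.0),
}
print(identify((0.9, 0.1, 0.0), DATABASE))  # matches person_a
print(identify((0.5, 0.5, 0.7), DATABASE))  # no confident match
```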

In practice, ML and DL are often used together. A local ML implementation typically handles simpler tasks and hands more complex problems off to cloud-based DL, which ideally returns results quickly without compromising the user experience. In the SMPTE Journal article, Welsh uses Amazon’s Alexa smart speaker as an example: recognition of the “wake” word is handled by local ML, while the rest of the command is handed off to cloud-based DL for natural language processing and interpretation. Welsh also notes that Amazon has optimised this system so well that many users view their Alexa devices as having a personality.
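The division of labour in this example can be sketched in Python as a simple gate: a cheap local check decides whether anything is passed on, and only then does the expensive interpreter run. Everything here is hypothetical: the functions are invented for illustration, words stand in for audio, and `fake_cloud_interpret` is a placeholder, not Amazon’s actual API.

```python
from typing import Callable, List, Optional

WAKE_WORD = "alexa"  # assumption: a single fixed wake word


def local_wake_check(tokens: List[str]) -> bool:
    """Stand-in for the small on-device model: one cheap check per utterance.
    (A real detector works on audio, not on already-transcribed words.)"""
    return bool(tokens) and tokens[0].lower() == WAKE_WORD


def handle_utterance(
    tokens: List[str], cloud_interpret: Callable[[List[str]], str]
) -> Optional[str]:
    # Nothing is sent onward unless the local check fires.
    if not local_wake_check(tokens):
        return None
    # Only the rest of the command goes to the heavier cloud-side model.
    return cloud_interpret(tokens[1:])


def fake_cloud_interpret(command: List[str]) -> str:
    """Hypothetical placeholder for a cloud natural-language service."""
    return "intent: " + " ".join(command)


print(handle_utterance(["Alexa", "play", "jazz"], fake_cloud_interpret))  # intent: play jazz
print(handle_utterance(["play", "jazz"], fake_cloud_interpret))           # None
```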

AI, the Human Optimisation Space, and Human Impact
Despite the remarkable role that AI plays in many familiar devices today, Welsh argues that AI is a long way from entering the human optimisation space. In the human realm, AI cannot account for the full ecosystem, nor can it consider the broader impact of its actions. While it is possible to build very powerful AIs that attack very narrow problems, it is another matter entirely to develop AIs capable of systematically tackling arbitrary problems. A prerequisite for that would be the creation of an AI capable of finding all optimisations, a development that might eventually evolve into AGI.

While AGI in science fiction is often portrayed as dangerous, the far less capable AI of everyday life today also raises safety concerns. Concerning the perceived dangers of AI, Welsh writes, “We are already surrounded by everyday objects and tools that are quite capable of harming or killing people by accident or by design and do so with great frequency at the hands of humans.” Using driverless cars as an example, he points out that even though properly trained and capable machines should dramatically reduce road deaths and injuries, accidents will still happen. He notes that assigning “responsibility,” and perhaps liability, for such accidents becomes particularly difficult when an algorithm is at the heart of decision-making that directly affects human life. Even if adverse outcomes are fewer in number, they can nevertheless be perceived as outweighing the many benefits of using AI.

Fortunately for those of us working in the media industry, typical applications of AI call for a lower level of trust and raise fewer significant ethical questions, and human intervention or oversight remains a must. While AI can perform the bulk of tedious, time-consuming tasks such as censorship editing and automatic rotoscoping of pre-release content, humans will maintain “final control” over the end product. Through this model, Welsh concludes, AI can and will do great things for the media industry.

Thomas Bause Mason is director of standards development at SMPTE.