Video now dominates ICT networks and systems, representing over 64% of global IP traffic and over half of all storage within enterprises and data centers.

However, video today cannot be searched in the same way as alphanumeric data, which presents an intractable ‘big data’ problem.

Current video search relies on resource-intensive human annotation, stored in a database as alphanumeric data.

This paper describes a new technology, developed by BAFTA (British Academy of Film and Television Arts) and UCL (University College London), that addresses this problem.

The technology extracts a compact video signature that represents the significant features of a video. This signature can then be searched and used for a wide range of applications, such as similarity detection, de-duplication of files, piracy detection, and semantic classification.

The video signatures are rich in information yet highly compact, at approximately 5 megabytes per running hour of video.
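
To put the stated signature size in perspective, the short Python sketch below converts “approximately 5 megabytes per running hour” into an average data rate and an archive storage figure. The 5 Mbit/s comparison bitrate and the 10,000-hour archive are illustrative assumptions, not figures from this paper, and megabytes are taken as decimal (10^6 bytes).

```python
# Worked arithmetic for the stated signature size.
# Assumption: "megabyte" means 10**6 bytes (decimal SI units).
SIGNATURE_BYTES_PER_HOUR = 5 * 10**6  # "approximately 5 megabytes per running hour"
SECONDS_PER_HOUR = 3600


def signature_bitrate_bps(bytes_per_hour: float = SIGNATURE_BYTES_PER_HOUR) -> float:
    """Average data rate of the signature, in bits per second."""
    return bytes_per_hour * 8 / SECONDS_PER_HOUR


def compression_ratio(video_bitrate_bps: float) -> float:
    """How many times smaller the signature is than a video of the given bitrate."""
    return video_bitrate_bps / signature_bitrate_bps()


def signature_storage_bytes(hours_of_video: int) -> int:
    """Total signature storage for an archive of the given duration."""
    return hours_of_video * SIGNATURE_BYTES_PER_HOUR


if __name__ == "__main__":
    print(f"Signature rate: {signature_bitrate_bps() / 1000:.1f} kbit/s")
    # Hypothetical comparison: a 5 Mbit/s HD stream (not a figure from the paper).
    print(f"Ratio vs 5 Mbit/s video: {compression_ratio(5e6):.0f}x")
    # Hypothetical 10,000-hour archive.
    print(f"10,000 hours: {signature_storage_bytes(10_000) / 1e9:.0f} GB")
```

On these assumptions the signature averages roughly 11 kbit/s, about 450 times smaller than the assumed 5 Mbit/s video, and a 10,000-hour archive needs about 50 GB of signatures.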

This enables video to be searched at the speed of data, making video a first-class citizen of ICT networks and systems.


Video feature extraction is not new; since the 1990s, technologists and media industry stakeholders have shown a long-standing interest in ‘automated content analysis’.

Early research projects such as “Informedia” (Carnegie Mellon University, 1994-1999)³ sought to extract meaning from video.

The primary focus of the MPEG-7 standard was to provide a “Multimedia Content Description Interface”: a mechanism for storing and communicating features of moving-image essence once those features are known.

Although significant interest has been generated and substantial investments made, few products or services capable of commercial adoption have emerged.

The ability to perform automated discovery and search across large bodies of still images and video has once again become a significant research topic, driven by the abundance of media held within Internet-delivered services and by media’s dominance of data center storage and consumer Internet traffic.

This paper discusses a research initiative, currently underway, that aims both to (a) expand the state of the art in video feature extraction and (b) prioritise requirements unique to the professional media industry.

Video Clarity is an ongoing collaboration between BAFTA Research, a business unit of the British Academy of Film and Television Arts (BAFTA), and The Media Institute of University College London (UCL)⁴, with generous support from Innovate UK, the UK’s innovation agency.

Video Clarity aims to fully enfranchise video as a searchable data source, allowing it to be searched at the speed of data. The research focuses on visual similarity detection, which has many applications, including but not limited to:

  • De-duplication of files held within data centers and file systems

  • Piracy detection for similar or identical content

  • Matching media segments across disparate videos, for example matching stock footage sources with edited titles, or raw camera rushes with editorial output

  • Validation of unique IDs and anti-tampering assurance
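
The applications above all reduce to comparing compact signatures rather than pixels. The paper does not describe the Video Clarity signature format, so the following Python sketch assumes a hypothetical one (a list of 64-bit perceptual hashes, one per sampled frame) and shows how near-duplicate files could be flagged by Hamming distance. The names `hamming`, `similarity` and `find_duplicates`, the toy data, and both thresholds are illustrative only.

```python
from itertools import combinations

# Hypothetical signature format (NOT the Video Clarity format, which the paper
# does not specify): one 64-bit perceptual hash per sampled frame.
Signature = list[int]


def hamming(a: int, b: int) -> int:
    """Number of bits that differ between two frame hashes."""
    return bin(a ^ b).count("1")


def similarity(sig_a: Signature, sig_b: Signature, max_bits: int = 10) -> float:
    """Fraction of frames whose hashes differ by at most `max_bits` bits.

    Frames are compared position by position from the start of each signature.
    Dividing by the longer length penalises files of different duration, which
    suits de-duplication; segment matching would also search over time offsets.
    """
    longest = max(len(sig_a), len(sig_b))
    if longest == 0:
        return 0.0
    close = sum(1 for a, b in zip(sig_a, sig_b) if hamming(a, b) <= max_bits)
    return close / longest


def find_duplicates(library: dict[str, Signature], threshold: float = 0.9) -> list[tuple[str, str]]:
    """Return (name, name) pairs whose signatures are near-identical."""
    return [
        (x, y)
        for x, y in combinations(sorted(library), 2)
        if similarity(library[x], library[y]) >= threshold
    ]


if __name__ == "__main__":
    # Toy signatures; real frame hashes would use all 64 bits.
    library = {
        "master.mov": [0x0F0F, 0xFFFF, 0x1234],
        "copy.mp4": [0x0F0E, 0xFFFF, 0x1235],   # 1-bit differences: a near-duplicate
        "other.mp4": [0xF0F0, 0x0000, 0xABCD],  # unrelated content
    }
    print(find_duplicates(library))  # [('copy.mp4', 'master.mov')]
```

Because each comparison touches only a few integers per frame, a check of this kind is far cheaper than decoding and comparing the video itself; matching edited segments against their sources would additionally require searching over time offsets, which this sketch omits.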

These applications are discussed in more detail after a brief overview of the technology.


To provide meaningful search, we must first ask: “What can a computer learn automatically from video?” There are four generic sources of meaning, discussed briefly below.

Hint extraction

Modern video files contain hidden ‘hint’ metadata, generated for use by networks when streaming.

This metadata can be used as a source of low-level motion features. For example, the image on the right below shows significant movement detected solely by reading the ‘hint’ information, without reference to the source imagery.
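
As a rough illustration of turning hint-derived motion data into a low-level feature, the Python sketch below assumes the hints have already been parsed into per-frame lists of (dx, dy) motion vectors; that input format, the helper names and the threshold of 2.0 (in the vectors’ own units) are hypothetical, and parsing the file itself is not shown. It scores each frame by its mean motion-vector magnitude and reports the frames above the threshold as containing significant movement.

```python
import math
from collections.abc import Sequence

# Hypothetical input format: for each frame, the (dx, dy) motion vectors already
# parsed from the file's 'hint' data. Parsing the container or bitstream is
# assumed to happen upstream and is not shown here.
MotionVector = tuple[float, float]


def frame_motion_energy(vectors: Sequence[MotionVector]) -> float:
    """Mean motion-vector magnitude for one frame (0.0 when there are no vectors)."""
    if not vectors:
        return 0.0
    return sum(math.hypot(dx, dy) for dx, dy in vectors) / len(vectors)


def significant_motion_frames(
    frames: Sequence[Sequence[MotionVector]], threshold: float = 2.0
) -> list[int]:
    """Indices of frames whose mean motion magnitude exceeds `threshold`."""
    return [i for i, vecs in enumerate(frames) if frame_motion_energy(vecs) > threshold]


if __name__ == "__main__":
    frames = [
        [(0.0, 0.0), (0.5, 0.0)],  # near-static: mean 0.25
        [(3.0, 4.0), (6.0, 8.0)],  # fast pan: mean 7.5
        [],                        # no hint vectors for this frame
        [(1.0, 1.0), (2.0, 2.0)],  # moderate motion: mean ~2.12
    ]
    print(significant_motion_frames(frames))  # [1, 3]
```

A single mean magnitude is the simplest possible summary; richer features could also retain the spatial layout and direction of the vectors, while still working only from the hint data.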

Early in the Video Clarity project, we established that hint extraction was over 400 times faster than video decoding, even before any further optimisation or processing.