ABSTRACT

Over the years a number of software products for content management, archiving, program planning, and quality assurance have been developed.

In addition to the metadata handling that takes place in such systems, content creators also struggle with the physical storage of the content.

While large enterprise solutions seem to be well developed and tested, there is still room for human error, be it in the integration layer, workflow writing, or error handling.

Achieving and verifying system-wide metadata and content integrity may be hard in loosely integrated environments where we cannot track every change in every system.

There may also be some integrity requirements that are difficult to validate without having access to all the data in the organization.

This paper presents an approach for overcoming such a problem by moving validation of integrity to a layer above the rest of the ecosystem.

The presented approach is general in nature, independent, and system-agnostic.

INTRODUCTION

In a broadcast organization, content is generally created with the intent to publish and/or archive it in one way or another.

Content creation usually involves several steps, from planning to publishing and archiving, where each step may involve separate software systems and store data in multiple locations.

There is no one main source for metadata. On the contrary, each subsystem usually contributes only part of the metadata.

At some point in time, a copy of the essence (the physical representation of the content, such as a video file) is added to the mix.

All those systems may or may not communicate with one another.

We need to ensure that media objects, with their metadata and essence, are properly archived, aired, and published on the web, and are available for repurposing.

Once an essence or a piece of metadata is created, we need to be sure that it is not tampered with or corrupted throughout its life cycle, and that it is propagated to every subsystem according to our business rules, which dictate how, when, and by whom it can be modified.
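As an illustration of this requirement only, the following minimal Python sketch compares the records that several subsystems hold for one media object and reports every field on which they disagree. It is not the validator described in this paper: the subsystem names (`planning`, `archive`, `web`), the field names, and the helper functions are hypothetical, and essence copies are represented by SHA-256 checksums of their bytes.

```python
import hashlib
from collections import defaultdict


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of an essence's bytes."""
    return hashlib.sha256(data).hexdigest()


def find_mismatches(records: dict[str, dict[str, str]]) -> dict[str, dict[str, str]]:
    """Return the fields whose values differ between subsystems.

    `records` maps a (hypothetical) subsystem name to the field/value
    pairs that subsystem holds for one media object. A field is only
    compared among the subsystems that actually hold it.
    """
    seen: dict[str, dict[str, str]] = defaultdict(dict)
    for subsystem, record in records.items():
        for field, value in record.items():
            seen[field][subsystem] = value
    return {
        field: by_subsystem
        for field, by_subsystem in seen.items()
        if len(set(by_subsystem.values())) > 1
    }


# Hypothetical records for one media object in three subsystems.
original = b"video-bytes"
records = {
    "planning": {"title": "Evening News"},
    "archive": {"title": "Evening News", "essence_sha256": sha256_of(original)},
    "web": {"title": "Evening news", "essence_sha256": sha256_of(b"video-bytez")},
}

mismatches = find_mismatches(records)
for field, by_subsystem in sorted(mismatches.items()):
    print(field, by_subsystem)
```

In this made-up example both the title and the essence checksum are reported, because the `web` copy differs from the others. A real check would also have to take the business rules into account, since some differences between subsystems may be intentional.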

The work done in this paper is based on the real-life situation at Estonian Public Broadcasting (ERR). Therefore, we first briefly describe the ERR media management ecosystem.

We present a working prototype which has been used for actual validation, with real results, in a production environment.

It is simple to ensure data integrity within a single software system, whereas the situation becomes more complicated when we need to ensure the integrity of media objects across loosely coupled subsystems with diverse data structures, communication protocols, and storage units.

The remainder of the paper is organized as follows. First, we give a brief overview of related work. Second, we describe the ERR ecosystem. Then we list the requirements for the integrity validator (IV), followed by an overview of the proposed IV's architecture, including a detailed description of the implemented requirements.

In the results and discussion section, we describe the preliminary results, discuss the options for implementing the prototype into the production system, and describe other possible approaches to our task. Finally, we give a short conclusion and outline possible future work.

RELATED WORK

A number of system monitoring software products are available, such as Icinga (1) and Zabbix (2), to name just a few.

These products are designed to excel at monitoring the health and performance of devices, servers, and other resources.

However, they are not suitable for monitoring the integrity of metadata, essence, and business processes without extensive work on creating add-on modules which would satisfy all our requirements.

As one of the goals of our approach was to be system-agnostic, we had no desire to tie our system to one specific product; as a result, our solution is built as an independent system.

Ensuring data integrity in databases and storage systems is an extensively studied subject, but the problem addressed in this paper bears more resemblance to the obstacles found in distributed data storage systems.

Research has been done on rule-based consistency in data grids, for instance by Rajasekar et al. (3).

In our environment, we are concerned not only with the integrity of a single metadata value that propagates to multiple subsystems, or of a single version of an essence that should be stored and preserved in multiple locations.

Our view of integrity also includes transformations, derivations and dependencies of the essences and metadata.

The financial auditing world has adopted the concept of continuous auditing over the last decades, as IT systems have evolved to permit near real-time access to single transactions.

Continuous auditing has steered the auditing process towards handling errors more proactively, so that it works more as deterrence and avoidance than as correction of mistakes, as discussed in Rezaee et al. (4).

Similar concepts apply to broadcast production, where our objective is to discover integrity violations before the point where the damage would be irreversible.
