ABSTRACT

With the sheer scale of digital information now available, many journalists have started using data to tell compelling stories. Data journalism is in fact becoming a fundamental practice for increasing the trustworthiness of information, producing fully digital products that can be exploited on several platforms, and improving the user experience.

Nevertheless, owing to the complexity of the data ecosystem, this practice is a challenge for any company involved in news production. Editorial staff, who are in charge of making sense of data to create newsworthy stories, need to be properly supported by targeted methodologies and efficient technological solutions.

To this end, a prototype workflow model is presented here. Because the integration of proprietary and state-of-the-art tools is considered strategic, an experimental implementation of a toolbox and of an integrated platform is also described.

INTRODUCTION

In the modern Big Data age, the volume of information delivered by the media is growing exponentially and becoming overwhelming. Relying on social networks and search engine optimisation more than on traditional networks, news travels around the world in a few seconds.

As a consequence, it is now commonplace to assume that information is quickly and easily accessible to everyone. Nevertheless, relevant information (e.g. statistical data, relationships between the people playing a role in a story, etc.) is often delivered by news reporters only in implicit form.

Most of the time, this additional information is delivered as a short text item and/or a piece of video content (e.g., a chart) without any specific and stable structure or format. It is therefore far from being machine-readable, reusable and exploitable in the long term. This means that, once the news has been delivered, such information can hardly be retrieved and/or extracted for further investigation or insight.

For this reason, over the last few years, technologies for professional Big Data analytics have become a strategic necessity for making information resources available to professional journalists and media producers more effectively and efficiently.

The challenge lies in the ability to collect, connect, analyse and present, in an organic and semantics-driven way, heterogeneous content streams that are accessible through different sources (such as digital TV, the Internet, news agencies, social networks and media archives) and published through different media modalities (such as audio, speech, text and video).
