Abstract

For any individual user, the amount of available content is exploding. Recommendations have become an integral part of digital business, helping people to find content and services, while at the same time enabling carefully targeted advertising.

A major challenge in recommendation systems is that they are either domain-specific or need a substantial amount of data. This favours global data-driven platforms operated by a few organisations, notably the GAFA group (Google, Amazon, Facebook and Apple).

The technology presented in this paper enables a recommendation engine network in which all parties own and administer their own data, challenging today's centralized models. The approach is based on exchangeable anonymous tokens. This enables a de-centralized recommendation architecture in which different recommendation engines can be located at the edges of networks and linked together while respecting the ownership of data.

This paper introduces architectural models for the technology and a conceptual view of an ecosystem based on them.

Introduction

In general, recommendations are used to estimate a user’s response to new items based on historical information stored in the system, and to suggest novel and original items for which the predicted response for that user is high, as defined by Desrosiers and Karypis. These items can be pieces of content, services or goods. Advertising is therefore a self-evident application area for recommendations.

Recommenders are commonly classified into two basic categories: content-based and collaborative. Content-based recommenders represent items with a set of attributes and use these attributes to find the most relevant content for a particular user. As an example, Agatha Christie is known for writing detective stories. If a user has been reading her novels, other detective novels are recommended to that user.
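The content-based idea can be illustrated with a minimal sketch. The catalogue, the attribute labels and the function names below are hypothetical and chosen only for illustration, and Jaccard similarity is just one simple way to compare attribute sets; none of this is taken from a particular system.

```python
from typing import Dict, List, Set

# Hypothetical catalogue: each item is described by a set of attributes.
CATALOGUE: Dict[str, Set[str]] = {
    "Murder on the Orient Express": {"detective", "christie"},
    "The Hound of the Baskervilles": {"detective", "doyle"},
    "Pride and Prejudice": {"romance", "austen"},
}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Share of attributes two sets have in common (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_by_content(history: List[str], top_n: int = 1) -> List[str]:
    """Rank unread items by attribute overlap with the user's reading history."""
    # The user profile is the union of the attributes of everything read so far.
    profile: Set[str] = set().union(*(CATALOGUE[title] for title in history))
    candidates = [title for title in CATALOGUE if title not in history]
    ranked = sorted(
        candidates,
        key=lambda title: jaccard(profile, CATALOGUE[title]),
        reverse=True,
    )
    return ranked[:top_n]


print(recommend_by_content(["Murder on the Orient Express"]))
```

Note that the sketch only works because someone has labelled every item with attributes such as "detective"; this is the domain-specific knowledge that content-based recommenders depend on.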

Collaborative recommendations, on the other hand, learn from the behaviour of users as a whole, without any need to define properties of individual items. For instance, if users A and B have behaved similarly in the past, and A has found item X preferable, this item is likely to be recommended to B as well. Being solely behavioural, collaborative recommendations can easily span different domains, unlike content-based recommendations, which are confined to a single domain because they require domain-specific knowledge (such as the genres of individual novelists in the example).
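The example with users A and B can be sketched as a small user-based collaborative filter. The interaction log, the item names and the similarity measure below are assumptions made for illustration; this is a conventional textbook variant, not the token-based method presented in this paper.

```python
from typing import Dict, List, Set

# Hypothetical interaction log: user -> set of items that user preferred.
LIKES: Dict[str, Set[str]] = {
    "A": {"book1", "book2", "X"},
    "B": {"book1", "book2"},
    "C": {"book9"},
}


def similarity(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap between two users' histories (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_collaborative(user: str, top_n: int = 1) -> List[str]:
    """Suggest items liked by users whose past behaviour resembles this user's."""
    own = LIKES[user]
    scores: Dict[str, float] = {}
    for other, items in LIKES.items():
        if other == user:
            continue
        sim = similarity(own, items)
        if sim == 0.0:
            continue  # users with no shared history contribute nothing
        for item in items - own:
            # An item scores higher the more similar its supporters are.
            scores[item] = scores.get(item, 0.0) + sim
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda item: (-scores[item], item))
    return ranked[:top_n]


print(recommend_collaborative("B"))
```

No item attributes appear anywhere in the sketch, which is why the same code would work across domains. It also shows the privacy cost discussed next: the full behavioural history of every user has to be stored in one place.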

However, when it comes to privacy, traditional collaborative recommenders pose challenges: since the recommendations are based on the historical behaviour of large user groups, that history has to be recorded. Several studies have addressed this problem. Canny introduced ‘talliers’, which compute public aggregates on behalf of communities of users. This approach requires individuals to trust the talliers, which act as intermediaries. In another approach, Yakut and Polat addressed a case in which multiple vendors (typically companies) share, at least partially, the same user pool, and the vendors are responsible for not sharing any personal information about their customers. In this approach, users and items are arbitrarily interleaved into different partitions, and no vendor learns anything about the individual behaviour and items held by another vendor. While the method can be considered privacy-protecting, also from the vendors’ perspective, the implementation is centralized and all parties must trust whoever operates it.

A different approach has been taken by Ollikainen et al., who introduced a de-centralized collaborative recommendation technology that is primarily designed to protect end-users’ privacy. Unlike in any other collaborative recommendation method, in this approach user data is aggregated as a collection of random values, ‘tokens’, which under certain conditions can be exchanged without exposing users’ identities or their preferences.

Since the method makes fundamentally no distinction between user tokens and item tokens, it protects item-related and user-related data equally well. This enables sharing business-related data, making co-operation between competitors possible. The technology has been in public use since 2014 in the libraries of the Helsinki Metropolitan area. Available online, it currently has 600,000 patrons in its databases and actively covers 300,000 book titles. This service runs on a single virtual server, implementing a centralized model, while the method itself is topology-agnostic: it can equally enable distributed, even edge-computed architectures. These models are discussed later in this paper.

This paper is organized as follows: The following chapter presents the principle of the method and the basis for privacy, followed by a chapter presenting different architecture models. These models introduce how token collections and recommendation engines (‘recommenders’) may be arranged. The paper is summarized and ecosystems are discussed in the last chapter.
