Speech as an interaction mechanism for television control is perceived as fast and easy. While speech and speech-to-text are standard, commonly used mechanisms, the use of voice and voice-based information, such as recognition of the user's emotional state, is underexplored. To understand the potential of voice-aware and mood-improving services, a mixed-method approach was adopted, comprising a web-based study with 130 participants from the US and a user experience study in Austria and France (n=20).

Regarding acceptance, results indicate that technology interest is a key predictor of acceptance of such systems, with acceptance rates of up to 80%. The main areas for innovation are services for people who feel stressed, users who would like to have a fun evening, and situations where people have difficulty falling asleep.


Speech interaction is becoming increasingly available and popular in smart living rooms due to the rise of digital voice assistants such as Alexa or Google Assistant. A speech request carries verbal information (the command) and, through the tone and frequency of the user's individual voice, cues from which different emotional states can be inferred. Emotion recognition can be applied in different fields; existing applications include call centres, for the training of customer support, and emotion-aware infotainment systems in cars.

Today, smart speakers make use of what the user is saying, but not (yet) how the user is talking to the machine (the user's voice print). This ability could be implemented soon in any standard smart speaker, thanks to the availability of data, efficient machine learning techniques and the recent progress in emotion recognition technology.

The main goal of this research was to investigate user acceptance of emotionally aware systems and services that enhance mood. Particularly given a reported gulf between users’ expectations and experiences regarding those “artificial roommates”, insights into users’ expectations as well as their experiences with existing products are necessary contributions to product development. As part of such contributions, the environment (or context), personality (neuroticism) and gender have been highlighted as relevant for evaluating user experience with emotion-aware systems. Additionally, the general dimensions of user experience for interactive television are important; in particular, the emotion dimension might benefit from three factors that reflect rewarding feelings concerning the user’s entertainment experience: 1) fun, 2) thrill, and 3) empathic sadness. All of those factors were taken into account in the study design and in this report.

The central questions for this report were: What should be done with the recognised emotion, not only in the context of watching TV but also beyond it, taking into account (future) smart environments in the home? What will be the potential benefits for the user if the virtual assistant knows the emotional context of a request? Do users value emotionally aware services differently when they are in a different mood, e.g. sad or happy?

In the short term, the virtual assistant may provide suggestions for additional actions to carry out (e.g. movie or music recommendations). In the longer term, the virtual assistant could assess the user’s state of mind at a given time and her mood in the long run, and even change the functionalities available in the smart home to make the user’s life, for example, more convenient or more relaxing (e.g. mood-related adaptation of lights and music playlists).

The remainder of this article presents the research goal and the selected methods, with descriptions of the study set-up and prototypical system used, results and a discussion.
