Authors: Kiyoshi Kurihara, Atsushi Imai, Hideki Sumiyoshi, Yuko Yamanouchi, Nobumasa Seiyama, Toshihiro Shimizu, Shoei Sato, Ichiro Yamada, Tadashi Kumano, Reiko Tako, Taro Miyazaki, Manon Ichiki, Tohru Takagi, Susumu Oshima and Koji Nishida


NHK has developed a means of automatically generating auxiliary audio descriptions from metadata for use in live TV sports programs.

Audio description services are important for helping visually impaired persons enjoy TV programs, but they are currently available for only a handful of programs. Creating audio descriptions requires substantial studio resources and personnel, and producing them during live broadcasts is especially difficult.

The method described in this paper has the potential to overcome these obstacles. The system that we constructed for the Rio Olympic and Paralympic Games consists of commentary text generation and text-to-speech (TTS) processes.

The commentary text generation process produces commentary appropriate to the situation for each piece of event data the system receives, and the TTS process converts that commentary into natural speech. We ran the system during the Rio Olympic and Paralympic Games, and it provided both captions and audio descriptions for over 2,000 sporting contests.
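The two-stage flow described above (event data in, commentary text out, then speech) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Event` fields, the template strings, and the `synthesize` stub are hypothetical and are not taken from the paper, which does not specify here how commentary is generated or which TTS engine is used. Template filling stands in for the commentary generation step only because it is the simplest approach to show.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    """Hypothetical event record; the real metadata schema is not given."""
    sport: str
    kind: str          # e.g. "score" or "result" (assumed event types)
    athlete: str
    country: str
    value: str = ""


# Assumed templates, one per event type. Illustrative only.
TEMPLATES = {
    "score": "{athlete} of {country} scores. The score is now {value}.",
    "result": "{athlete} of {country} wins the {sport} event.",
}


def generate_commentary(event: Event) -> Optional[str]:
    """Return commentary text for an event, or None if no template applies."""
    template = TEMPLATES.get(event.kind)
    if template is None:
        return None
    return template.format(
        sport=event.sport,
        athlete=event.athlete,
        country=event.country,
        value=event.value,
    )


def synthesize(text: str) -> dict:
    """Placeholder for the TTS stage; a real system would return audio."""
    return {"text": text, "audio": None}


def process(events):
    """Run each incoming event through both stages, skipping unknown types."""
    outputs = []
    for event in events:
        text = generate_commentary(event)
        if text is not None:
            outputs.append(synthesize(text))
    return outputs
```

In this sketch the same generated text could feed both a caption display and the speech stage, which mirrors the caption and audio outputs mentioned above. Events with no matching template are skipped rather than voiced.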


We present an automatic system for live generation of audio descriptions for TV sports programs. Our research on automatic generation of audio descriptions has two main goals. One is to provide efficient and effective automatic program commentaries that are useful for sighted persons. The other is to provide audio descriptions that can help visually impaired persons get more out of TV.

Although they have been recognised as a helpful program service, audio descriptions are currently provided for only 10% of the programs broadcast in Japan. In most cases, audio descriptions can only be attached to content during post-production in TV studios and control rooms, and the process entails the effort of sports announcers, directors, and technical staff.

The expense, in terms of studio and personnel costs, and the difficulty of adding live audio descriptions to programming have so far constrained the growth of audio description broadcasting. Automatic generation of audio descriptions, on the other hand, has the potential to solve these problems and improve the penetration of the service in broadcasting.

As a first step, we constructed a prototype system for automatically generating audio descriptions from event data gathered at the Rio de Janeiro Olympic and Paralympic Games in 2016. Our aim was to produce test programs by automatically attaching commentary to the video footage, without adding any commentary manually, and to examine the issues that arose.
