Technical paper: This paper introduces a celebrity face recognition AI for video metadata generation.


In this paper, we introduce a celebrity face recognition AI for video metadata generation. Face recognition performance has improved significantly thanks to deep learning. We implemented a face recognition AI trained on a custom dataset, composed mostly of Korean celebrity faces, that was designed for content analysis at KBS.

The laborious dataset labelling process was streamlined by using MTCNN face detection and face clustering. An Inception-ResNet v1 model was used, and test set accuracy was measured with respect to the number of training iterations. We compared our model with a commercial cloud-based celebrity recognition AI, whose celebrity database is thought to have about 26% in common with ours. In the experiment, our model achieved higher precision.
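The comparison above is reported in terms of precision. As a minimal sketch of how such a figure could be computed, the snippet below assumes that each test face has one ground-truth identity and that a recognizer may decline to answer (represented here as `None`). The function name, the handling of declined answers, and the sample labels are illustrative assumptions, not the evaluation protocol used in the paper.

```python
from typing import Optional, Sequence


def precision(predicted: Sequence[Optional[str]], actual: Sequence[str]) -> float:
    """Fraction of returned identities that match the ground truth.

    Assumption: a prediction of None means the recognizer named nobody,
    so it counts neither as a true positive nor as a false positive.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    true_positives = sum(
        1 for p, a in zip(predicted, actual) if p is not None and p == a
    )
    returned = sum(1 for p in predicted if p is not None)
    return true_positives / returned if returned else 0.0


# Hypothetical labels, for illustration only.
preds = ["A", "B", None, "C"]
truth = ["A", "C", "D", "C"]
score = precision(preds, truth)  # 2 correct out of 3 returned identities
print(f"precision = {score:.3f}")
```

Under this definition, a recognizer that answers only when confident can score high precision while missing many faces, so precision is usually read alongside recall.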


Artificial intelligence (AI) is attracting growing attention in the media industry, especially for video analysis and metadata generation. Among the kinds of metadata that AI can generate, object labels and background information are relatively easy to acquire, since open datasets and pre-trained models are available. On the other hand, it is difficult to implement an AI engine that generates ‘face’ (or ‘person’) metadata, because datasets and pre-trained models fit for the purpose are lacking. Furthermore, face datasets need to be constructed locally, i.e. usually for each country.

As a media corporation, we decided to build a face dataset of about 3.6 million images of 6,690 subjects, focused especially on Korean celebrities. The celebrities were chosen from our content management system (CMS), ranked by the number of times they appear in our content. To construct the dataset efficiently and speed up the image labelling process, we used AI-based automation such as face detection and clustering.
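To illustrate why clustering speeds up labelling, the sketch below groups face embeddings so that an annotator can label one cluster at a time instead of one image at a time. It is a simple greedy, threshold-based scheme over cosine similarity, chosen for brevity; the clustering algorithm actually used is not specified here, and the function names, the `threshold` value, and the toy two-dimensional embeddings are all assumptions.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def greedy_cluster(
    embeddings: Sequence[Sequence[float]], threshold: float = 0.8
) -> List[int]:
    """Assign each embedding to the first cluster whose representative
    (its first member) is at least `threshold` similar; otherwise start
    a new cluster. Returns one cluster index per embedding."""
    representatives: List[Sequence[float]] = []
    labels: List[int] = []
    for emb in embeddings:
        for idx, rep in enumerate(representatives):
            if cosine(emb, rep) >= threshold:
                labels.append(idx)
                break
        else:
            # No existing cluster is close enough: open a new one.
            representatives.append(emb)
            labels.append(len(representatives) - 1)
    return labels


# Toy embeddings standing in for the output of a face embedding model.
faces = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.95]]
cluster_ids = greedy_cluster(faces)
print(cluster_ids)
```

In a real pipeline the embeddings would come from crops produced by a face detector, and an annotator would confirm or correct each cluster rather than trust it blindly.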

We trained the model on our dataset, using Inception-ResNet v1 (5) as the backbone network and softmax as the loss function. The proposed model is compared with a commercial celebrity recognition API provided as a cloud service. Experiments on our dataset show that the proposed model performs better.
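For readers unfamiliar with the softmax loss, the following is a minimal sketch of the standard softmax cross-entropy for a single sample: the backbone outputs one logit per subject, and the loss is the negative log-probability of the correct subject. It is written in plain Python for clarity and is not the training code; the function names and example logits are assumptions.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert logits to probabilities, subtracting the max for stability."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_cross_entropy(logits: Sequence[float], target: int) -> float:
    """Negative log-probability of the target class (log-sum-exp form)."""
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[target]


# Hypothetical logits for a 4-subject classifier; the true subject is index 2.
example_logits = [1.0, 0.5, 3.0, -1.0]
probs = softmax(example_logits)
loss = softmax_cross_entropy(example_logits, target=2)
print(f"loss = {loss:.4f}")
```

With 6,690 subjects the logit vector would have 6,690 entries, and the loss would be averaged over a mini-batch.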
