Audio-Visual Gaze Control

This project introduces a novel neural network-based reinforcement learning approach for robot gaze control. Our approach enables a robot to learn and adapt its gaze control strategy for human-robot interaction without external sensors and without human supervision. The robot learns to focus its attention on groups of people from its own audio-visual experiences, independently of the number of people, their positions, and their physical appearances. In particular, we use a recurrent neural network architecture in combination with Q-learning to find an optimal action-selection policy. We pre-train the network in a simulated environment that mimics realistic scenarios involving speaking and silent participants, thus avoiding the need for tedious sessions of a robot interacting with people. Our experimental evaluation suggests that the proposed method is robust to parameter estimation, i.e. the parameter values yielded by the method do not have a decisive impact on performance. The best results are obtained when audio and visual information are used jointly. Experiments with the Nao robot indicate that our framework is a step towards the autonomous learning of socially acceptable gaze behavior.

  • Deep Reinforcement Learning for Audio-Visual Gaze Control, Stéphane Lathuilière, Benoît Massé, Pablo Mesejo, Radu Horaud, 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
  • Neural network based reinforcement learning for audio–visual gaze control in human–robot interaction, Stéphane Lathuilière, Benoît Massé, Pablo Mesejo, Radu Horaud, 2019 Pattern Recognition Letters
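
To make the combination of a recurrent network and Q-learning described in the abstract more concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation. The action set, the observation size, the network shape, the reward value, and every name in the code (ACTIONS, TinyRecurrentQNet, epsilon_greedy, td_target) are assumptions chosen for illustration. The sketch only shows how a recurrent network can map a short sequence of audio-visual observations to one Q-value per gaze action, how an action can be picked from those values, and how the standard Q-learning target is formed. Training (gradient updates) is omitted.

```python
import numpy as np

# Hypothetical discrete gaze actions; the abstract does not list the action set.
ACTIONS = ["stay", "left", "right", "up", "down"]


class TinyRecurrentQNet:
    """Elman-style RNN mapping an observation sequence to Q-values (illustrative only)."""

    def __init__(self, obs_dim, hidden_dim, n_actions, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.normal(0.0, 0.1, (hidden_dim, obs_dim))
        self.W_h = rng.normal(0.0, 0.1, (hidden_dim, hidden_dim))
        self.W_out = rng.normal(0.0, 0.1, (n_actions, hidden_dim))

    def q_values(self, obs_seq):
        # Fold the sequence into a hidden state, then read out one Q-value per action.
        h = np.zeros(self.W_h.shape[0])
        for obs in obs_seq:
            h = np.tanh(self.W_in @ obs + self.W_h @ h)
        return self.W_out @ h


def epsilon_greedy(q, epsilon, rng):
    # Explore with probability epsilon, otherwise take the greedy action.
    if rng.random() < epsilon:
        return int(rng.integers(len(q)))
    return int(np.argmax(q))


def td_target(reward, next_q, gamma=0.9, terminal=False):
    # Standard Q-learning target: r + gamma * max_a' Q(s', a').
    if terminal:
        return float(reward)
    return float(reward) + gamma * float(np.max(next_q))


rng = np.random.default_rng(0)
net = TinyRecurrentQNet(obs_dim=8, hidden_dim=16, n_actions=len(ACTIONS))
obs_seq = rng.random((4, 8))  # four time steps of made-up audio-visual features
q = net.q_values(obs_seq)
action = epsilon_greedy(q, epsilon=0.1, rng=rng)
target = td_target(reward=1.0, next_q=q)  # reward value is a placeholder
print(ACTIONS[action], target)
```

In a full system the network weights would be adjusted so that the Q-value of the chosen action moves towards the target. The abstract indicates that this is first done in a simulated environment, before the robot interacts with people.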