Learning-based methods for appearance-based gaze estimation achieve state-of-the-art performance in challenging real-world settings, but they require large amounts of labelled training data. Learning-by-synthesis has been proposed as a promising solution to this problem, but current methods are limited in speed, in appearance variability, and in the distribution of head poses and gaze angles they can synthesize.
We present UnityEyes, a novel method to rapidly synthesize large amounts of variable eye-region images as training data. Our method combines a new generative 3D model of the human eye region with a real-time rendering framework. The model is based on high-resolution 3D face scans and uses real-time approximations for complex eyeball materials and structures, as well as anatomically inspired procedural geometry methods for eyelid animation. We show that these synthesized images can be used to estimate gaze in difficult in-the-wild scenarios, even for extreme gaze angles or when the pupil is fully occluded. We also demonstrate competitive gaze estimation results on a benchmark in-the-wild dataset, despite using only a light-weight nearest-neighbor algorithm. We are making our UnityEyes synthesis framework available online for the benefit of the research community.
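The light-weight nearest-neighbor approach mentioned above can be illustrated with a minimal sketch of generic k-nearest-neighbor gaze regression over a set of synthesized, labelled samples. This is not the authors' implementation. The abstract does not specify the image features, distance metric, or value of k, so the following are all assumptions made for illustration: the numeric feature vectors (standing in for, e.g., flattened and normalised eye-image pixels), the (pitch, yaw) label format in degrees, the Euclidean metric, the default k, and the hypothetical helper names `euclidean` and `knn_gaze`.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical types (assumptions, not from the paper): a feature vector
# extracted from an eye image, and a gaze label as (pitch, yaw) in degrees.
Feature = Sequence[float]
Gaze = Tuple[float, float]


def euclidean(a: Feature, b: Feature) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_gaze(train: List[Tuple[Feature, Gaze]], query: Feature, k: int = 3) -> Gaze:
    """Estimate gaze as the mean (pitch, yaw) of the k nearest training samples.

    `train` holds (feature, gaze) pairs, e.g. computed from synthesized images.
    """
    if not train:
        raise ValueError("training set is empty")
    k = min(k, len(train))
    # Rank every synthetic sample by its distance to the query features.
    nearest = sorted(train, key=lambda sample: euclidean(sample[0], query))[:k]
    # Plain (unweighted) average of the neighbours' gaze angles.
    pitch = sum(gaze[0] for _, gaze in nearest) / k
    yaw = sum(gaze[1] for _, gaze in nearest) / k
    return (pitch, yaw)


if __name__ == "__main__":
    # Toy training set: 2-D features with made-up gaze labels.
    train = [
        ([0.0, 0.0], (0.0, 0.0)),
        ([1.0, 0.0], (10.0, 0.0)),
        ([0.0, 1.0], (0.0, 10.0)),
        ([5.0, 5.0], (30.0, 30.0)),
    ]
    print(knn_gaze(train, [0.1, 0.1], k=3))
```

Averaging the neighbors' angles unweighted is the simplest choice; weighting each neighbor by inverse distance is a common variant, and a large synthetic training set would normally call for an approximate nearest-neighbor index rather than a full sort.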
For more information on how to use the UnityEyes application, please see the tutorial page.
@inproceedings{wood2016_etra,
  title = {Learning an Appearance-Based Gaze Estimator from One Million Synthesised Images},
  author = {Wood, Erroll and Baltru{\v{s}}aitis, Tadas and Morency, Louis-Philippe and Robinson, Peter and Bulling, Andreas},
  booktitle = {Proceedings of the Ninth Biennial ACM Symposium on Eye Tracking Research \& Applications},
  pages = {131--138},
  year = {2016}
}