Multistage Model for Robust Face Alignment Using Deep Neural Networks

Huabin Wang¹, Rui Cheng¹, Jian Zhou¹, Liang Tao¹, Hon Keung Kwan²

¹Anhui Provincial Key Laboratory of Multimodal Cognitive Computation, School of Computer Science and Technology, Anhui University, Hefei, China

²Department of Electrical and Computer Engineering, University of Windsor, Windsor, ON N9B 3P4, Canada

Summary

The ability to generalize to unconstrained conditions, such as severe occlusions and large pose variations, remains a challenging goal in face alignment. In this demo, we present a multistage model based on deep neural networks that takes advantage of spatial transformer networks, hourglass networks and exemplar-based shape constraints. First, a spatial transformer generative adversarial network, consisting of convolutional layers and residual units, is used to address initialization problems caused by face detectors, such as rotation and scale variations, and thereby obtain improved face bounding boxes for face alignment. Then, a stacked hourglass network is employed to obtain preliminary landmark locations together with their corresponding scores. In addition, an exemplar-based shape dictionary is designed to re-estimate landmarks with low scores from those with high scores. By incorporating these face shape constraints, landmarks misaligned by occlusions or cluttered backgrounds can be considerably corrected. Extensive experiments on challenging benchmark datasets demonstrate the superior performance of the proposed method over other state-of-the-art methods.
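The first stage relies on the standard spatial transformer mechanism: an affine matrix defines, for every pixel of the output crop, where to sample in the input image, and bilinear interpolation produces the output. The sketch below illustrates only that grid-generation and sampling step for a rotation plus isotropic scale. It is not the authors' network: in the paper the transformation is predicted by the spatial transformer generative adversarial network, whereas here the angle, scale and shift are passed in by hand, and the helper names (`similarity_theta`, `affine_grid`, `bilinear_sample`) are hypothetical.

```python
import math

def similarity_theta(angle_rad, scale, tx=0.0, ty=0.0):
    """2x3 affine matrix for rotation + isotropic scale + shift.

    Hypothetical parameterization: in a spatial transformer these values
    would be regressed by the localization network, not set by hand.
    """
    c, s = math.cos(angle_rad) * scale, math.sin(angle_rad) * scale
    return [[c, -s, tx],
            [s, c, ty]]

def affine_grid(theta, out_h, out_w):
    """Map each output pixel (normalized to [-1, 1]) to a source location."""
    grid = []
    for i in range(out_h):
        y_t = -1.0 + 2.0 * i / (out_h - 1) if out_h > 1 else 0.0
        row = []
        for j in range(out_w):
            x_t = -1.0 + 2.0 * j / (out_w - 1) if out_w > 1 else 0.0
            x_s = theta[0][0] * x_t + theta[0][1] * y_t + theta[0][2]
            y_s = theta[1][0] * x_t + theta[1][1] * y_t + theta[1][2]
            row.append((x_s, y_s))
        grid.append(row)
    return grid

def bilinear_sample(img, grid):
    """Bilinearly sample img (list of rows) at normalized grid points; zeros outside."""
    h, w = len(img), len(img[0])

    def pixel(r, c):
        return img[r][c] if 0 <= r < h and 0 <= c < w else 0.0

    out = []
    for row in grid:
        out_row = []
        for x_s, y_s in row:
            # normalized [-1, 1] -> pixel coordinates [0, w-1] and [0, h-1]
            x = (x_s + 1.0) * (w - 1) / 2.0
            y = (y_s + 1.0) * (h - 1) / 2.0
            x0, y0 = math.floor(x), math.floor(y)
            dx, dy = x - x0, y - y0
            out_row.append(pixel(y0, x0) * (1 - dx) * (1 - dy)
                           + pixel(y0, x0 + 1) * dx * (1 - dy)
                           + pixel(y0 + 1, x0) * (1 - dx) * dy
                           + pixel(y0 + 1, x0 + 1) * dx * dy)
        out.append(out_row)
    return out
```

For example, `bilinear_sample(img, affine_grid(similarity_theta(math.pi, 1.0), 3, 3))` returns a 3x3 image rotated by 180 degrees, which is the kind of in-plane rotation the first stage is meant to undo before the hourglass network sees the face.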

Highlights

1) A spatial transformer generative adversarial network is proposed to produce improved initial face images for face alignment.
2) Based on the intensity of the heatmaps produced by a two-stage hourglass network, a scoring scheme is designed to measure the quality of the predicted landmark locations; it estimates the occlusion level of each landmark and distinguishes aligned landmarks from misaligned ones.
3) An exemplar-based shape dictionary is employed to impose geometric constraints. The landmarks with high scores are used to search for similar shapes in the dictionary, and the landmarks with low scores are refined by reconstructing them from these similar shapes.
4) Experimental results on several benchmark datasets (300-W, COFW and WFLW) show that the proposed multistage model outperforms most recent face alignment methods, especially on faces in difficult scenarios such as large poses, lighting variations and occlusions.
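Highlights 2 and 3 can be illustrated together with a minimal sketch: each heatmap yields a landmark at its peak, the peak intensity serves as that landmark's score, a threshold splits landmarks into reliable and unreliable sets, and the unreliable ones are re-estimated from the dictionary exemplars that best match the reliable ones. This is a simplified stand-in, not the paper's exact procedure. The threshold value, the translation-only alignment, the plain averaging of the `k` nearest exemplars and all function names are assumptions made for illustration.

```python
import math

def heatmap_peak(heatmap):
    """Return ((x, y), score): argmax location of one heatmap and its peak intensity."""
    best, loc = -math.inf, (0, 0)
    for y, row in enumerate(heatmap):
        for x, value in enumerate(row):
            if value > best:
                best, loc = value, (x, y)
    return loc, best

def score_landmarks(heatmaps, threshold):
    """Predict one landmark per heatmap and split indices into high/low-score sets."""
    peaks = [heatmap_peak(h) for h in heatmaps]
    shape = [loc for loc, _ in peaks]
    scores = [s for _, s in peaks]
    high = [i for i, s in enumerate(scores) if s >= threshold]
    low = [i for i, s in enumerate(scores) if s < threshold]
    return shape, scores, high, low

def _centroid(points, idx):
    return (sum(points[i][0] for i in idx) / len(idx),
            sum(points[i][1] for i in idx) / len(idx))

def reconstruct_low(shape, high, low, dictionary, k=2):
    """Re-estimate low-score landmarks from the k exemplars closest on high-score ones.

    Simplification (assumption): exemplars are aligned to the prediction by
    translation only (matching centroids of the high-score landmarks), and each
    low-score landmark is replaced by the plain average of the k neighbours.
    """
    if not high:
        return list(shape)  # nothing reliable to search with
    cx, cy = _centroid(shape, high)
    aligned = []
    for ex in dictionary:
        ex_x, ex_y = _centroid(ex, high)
        dx, dy = cx - ex_x, cy - ex_y
        aligned.append([(x + dx, y + dy) for x, y in ex])

    def distance(ex):
        # squared error measured only on the reliable (high-score) landmarks
        return sum((ex[i][0] - shape[i][0]) ** 2 + (ex[i][1] - shape[i][1]) ** 2
                   for i in high)

    nearest = sorted(aligned, key=distance)[:k]
    refined = list(shape)
    for i in low:
        refined[i] = (sum(ex[i][0] for ex in nearest) / len(nearest),
                      sum(ex[i][1] for ex in nearest) / len(nearest))
    return refined
```

With three toy 4x4 heatmaps whose peaks are 0.9, 0.8 and 0.1 and a threshold of 0.5, the third landmark is flagged as low-score and replaced by the mean of its positions in the two exemplars that best fit the first two landmarks. A fuller implementation would likely use a similarity (or affine) alignment and a weighted reconstruction rather than a plain average.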

Overview of the proposed multistage model (MSM)

Posting date: 4 February 2020.
Cite as: Huabin Wang, Rui Cheng, Jian Zhou, Liang Tao, and Hon Keung Kwan, "Multistage model for robust face alignment using deep neural networks," arXiv preprint arXiv:2002.01075.

Results

300-W dataset

COFW dataset

WFLW dataset

Images collected from members of this laboratory