oai:arXiv.org:2407.05357
Computer Science
2024
10/23/2024
Deep learning has been impressively successful in the last decade in predicting human head poses from monocular images.
However, for in-the-wild inputs, the research community relies predominantly on a single semisynthetic training set, 300W-LP, with few alternatives.
This paper focuses on gradually extending and improving the data to further explore the performance achievable with augmentation and synthesis strategies.
Modeling-wise, a novel multitask head/loss design that includes uncertainty estimation is proposed.
Overall, the resulting models are small, efficient, and suitable for full 6 DoF pose estimation, and they exhibit very competitive accuracy.
;Comment: CVPR version.
Added evaluation on BIWI.
Plenty of writing changes.
Welter, Michael, 2024, On the power of data augmentation for head pose estimation