One of the key challenges of visual perception is to extract abstract models of 3D objects and object categories from visual measurements, which are affected by complex nuisance factors such as viewpoint, occlusion, motion, and deformation. In this talk, I will introduce two approaches that, given a large number of images of an object and no other supervision, can factorize image deformations and appearance. I will demonstrate the applicability of these methods to articulated and deformable objects, such as human faces and bodies, by learning embeddings from random synthetic transformations or optical flow correspondences, all without any manual supervision. The talk will cover three recent papers:
[1] Thewlis, J., Bilen, H., & Vedaldi, A. (2017). Unsupervised learning of object landmarks by factorized spatial embeddings. In International Conference on Computer Vision (ICCV).
[2] Thewlis, J., Bilen, H., & Vedaldi, A. (2017). Unsupervised learning of object frames by dense equivariant image labelling. In Neural Information Processing Systems (NIPS).
[3] Jakab, T., Gupta, A., Bilen, H., & Vedaldi, A. (2018). Conditional image generation for learning the structure of visual objects. In Neural Information Processing Systems (NIPS).