
Completed
Vision, Language and Visual Retrieval
Multimodal methods connecting images and language: large-scale visual retrieval, semantic art understanding, and image caption generation.
Image and video understanding, geometry, and recognition.

Learning to predict depth from single images without ground-truth supervision, with a focus on dynamic scenes and challenging conditions.

Recovering accurate 3D shape from collections of images, using multi-view stereo, volumetric graph-cuts, and probabilistic depth-map fusion.