Vision, Language and Visual Retrieval

Completed 2017–2020

Multimodal methods connecting images and language: large-scale visual retrieval, semantic art understanding, and image caption generation.

This project sits at the intersection of computer vision and natural language, building multimodal systems that connect images with text and that retrieve visual content at scale. The work spans semantic understanding, large-scale retrieval, and caption generation.

Representative directions include semantic art understanding, where the SemArt dataset and Text2Art challenge move art analysis beyond style classification towards relating paintings to their textual commentary; asymmetric spatio-temporal embeddings for image-to-video retrieval, which learn features that match a still-image query against video collections; retrieval of fashion products from film and television footage; and deep models for automatically captioning news images, with applications ranging from multimedia management to accessibility for visually impaired users.

Collaborators

Noa García, Aston University
Vishwash Batra, University of Warwick
Yulan He, Systems Analytics Research Institute (SARI)
Aparajita Haldar, University of Warwick
Hakan Ferhatosmanoğlu, University of Warwick
Tanaya Guha, University of Warwick

Related publications

Variational Recurrent Sequence-to-Sequence Retrieval for Stepwise Illustration. Vishwash Batra, Aparajita Haldar, Yulan He, Hakan Ferhatosmanoğlu, George Vogiatzis and Tanaya Guha. Lecture Notes in Computer Science, 2020.
A Deep Learning Approach to Automatic Caption Generation for News Images. Vishwash Batra, Yulan He and George Vogiatzis. Aston Publications Explorer (Aston University), 2019.
How to Read Paintings: Semantic Art Understanding with Multi-modal Retrieval. Noa García and George Vogiatzis. Lecture Notes in Computer Science, 2019.
Learning non-metric visual similarity for image retrieval. Noa García and George Vogiatzis. Image and Vision Computing, 2019.
Asymmetric Spatio-Temporal Embeddings for Large-Scale Image-to-Video Retrieval. Noa García and George Vogiatzis. Aston Publications Explorer (Aston University), 2018.
How to Read Paintings: Semantic Art Understanding with Multi-Modal Retrieval. Noa García and George Vogiatzis. arXiv (Cornell University), 2018.
Neural Caption Generation for News Images. Vishwash Batra, Yulan He and George Vogiatzis. 2018.
Dress Like a Star: Retrieving Fashion Products from Videos. Noa García and George Vogiatzis. 2017.
