|
|
I am a Computer Vision researcher whose main interests are 3D vision and its applications in Computer Graphics. I am currently employed as a lecturer at Aston University in Birmingham, UK. Prior to that I was a Senior Research Scientist at Toshiba Research Laboratory in Cambridge, UK, where I was a member of the Computer Vision Group. Even earlier, I was a PhD student in the Machine Intelligence Laboratory in the Department of Engineering, University of Cambridge. Many years ago, as an undergraduate, I studied Mathematics and Computer Science at Imperial College, London. I also currently hold a junior research fellowship at Wolfson College, Cambridge. My research is focused on algorithms for obtaining 3D models of objects from visual data such as digital photographs or video. One motivation for this research is that visual data is very easy to acquire, typically much easier than data from other 3D sensors. Most importantly, however, 3D models that have been obtained from visual cues are, by construction, visually faithful to the real-world objects they are modelling. Therefore, these techniques offer our best chance of bridging the gap between the real world and the virtual worlds of Computer Graphics, eventually leading to photorealistic 3D content in both films and computer games. |
|
A Generative Model for Online Depth Fusion
In this project we looked at the problem of fusing depth-map measurements probabilistically. The results show our method outperforming competitors in some regimes, especially when the measurements contain heavy noise or many outliers. However, the key merit of the approach is the principled variational Bayesian framework, which shows great promise and paves the way for more complex models. More on this soon! The paper describes a probabilistic, online depth-map fusion framework whose generative model for the sensor measurement process accurately incorporates both long-range visibility constraints and a spatially varying, probabilistic outlier model. In addition, we propose an inference algorithm that updates the state variables of this model in linear time each frame. Our detailed evaluation compares our approach against several others, demonstrating and explaining the improvements that this model offers as well as highlighting a problem with all current methods: systemic bias. Check out the supplementary video showing the system in action here.
|
|
|
Video-based, Real-Time Multi View Stereo
We investigate the problem of obtaining a dense reconstruction in real-time from a live video stream. In recent years, multi-view stereo (MVS) has received considerable attention and a number of methods have been proposed. However, most methods assume a relatively sparse set of still images as input and unlimited computation time. Video-based MVS has received less attention, despite the fact that video sequences offer significant benefits in terms of the usability of MVS systems. In this paper we propose a novel video-based MVS algorithm that is suitable for real-time, interactive 3D modeling with a hand-held camera. The key idea is a per-pixel, probabilistic depth estimation scheme that updates posterior depth distributions with every new frame. The current implementation is capable of updating 15 million distributions per second. We evaluate the proposed method against the state-of-the-art real-time MVS method and show an improvement in accuracy. Check out some videos of the system in action here.
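The per-pixel posterior update described above can be illustrated with a minimal sketch. It assumes a discretised depth grid and a Gaussian-plus-uniform measurement likelihood (inliers scatter around the true depth, outliers fall anywhere in the depth range). The function names, the grid, and the parameters `sigma` and `inlier_prob` are hypothetical choices made for illustration, and this is not the representation used in the paper.

```python
import math


def gaussian(x, mean, sigma):
    """Density of a 1D Gaussian evaluated at x."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def update_depth_posterior(posterior, depths, measurement, sigma=0.05, inlier_prob=0.7):
    """One Bayesian update of a discretised depth distribution for a single pixel.

    Assumed likelihood: with probability inlier_prob the measurement is the true
    depth plus Gaussian noise; otherwise it is uniform over the depth range.
    """
    uniform = 1.0 / (depths[-1] - depths[0])
    updated = [
        p * (inlier_prob * gaussian(measurement, d, sigma) + (1.0 - inlier_prob) * uniform)
        for p, d in zip(posterior, depths)
    ]
    total = sum(updated)
    return [u / total for u in updated]  # renormalise so the posterior sums to 1


# Depth candidates from 1.00 to 2.00 in steps of 0.01, starting from a flat prior.
depths = [1.0 + 0.01 * i for i in range(101)]
posterior = [1.0 / len(depths)] * len(depths)

# One new depth measurement per video frame; 1.95 plays the role of an outlier.
for z in [1.52, 1.49, 1.95, 1.50, 1.51]:
    posterior = update_depth_posterior(posterior, depths, z)

best_depth = depths[max(range(len(depths)), key=posterior.__getitem__)]
```

Because the outlier term keeps every depth candidate alive, a single bad measurement lowers confidence without moving the mode away from the consistent measurements.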
|
|
|
Automatic Object Segmentation from Calibrated Images
This paper addresses the problem of automatically obtaining the object/background segmentation of a rigid 3D object observed in a set of images that have been calibrated for camera pose and intrinsics. Such segmentations can be used to obtain a shape representation of a potentially texture-less object by computing a visual hull. We propose an automatic approach where the object to be segmented is identified by the pose of the cameras instead of user input such as 2D bounding rectangles or brush-strokes. The key to our method is a pairwise MRF framework that combines (a) foreground/background appearance models, (b) epipolar constraints and (c) weak stereo correspondence into a single segmentation cost function that can be efficiently minimised with Graph-cuts. The segmentation thus obtained is further improved using silhouette coherency and then used to update the foreground/background appearance models, which are fed into the next Graph-cut computation. These two steps are iterated until the segmentation converges. Our method can automatically provide a 3D surface representation even in texture-less scenes where MVS methods might fail. Furthermore, it improves performance in images where the object is not readily separable from the background in colour space, an area that previous segmentation approaches have found challenging. This paper was awarded the BBC Best Paper Prize!
|
|
|
Self-calibrated, multi-spectral photometric stereo for 3d face capture
This paper addresses the problem of obtaining detailed 3D reconstructions of human faces in real-time and with inexpensive hardware. We present an algorithm based on a monocular multi-spectral photometric-stereo setup. This system is known to capture highly detailed deforming 3D surfaces at high frame rates without any expensive hardware or synchronized light stage. However, the main challenge of such a setup is the calibration stage, which depends on the lighting setup and how the lights interact with the specific material being captured, in this case human faces. For this purpose we develop a self-calibration technique where the person being captured is asked to perform a rigid motion in front of the camera while maintaining a neutral expression. Rigidity constraints are then used to compute the head's motion with a structure-from-motion algorithm. Once the motion is obtained, a multi-view stereo algorithm reconstructs a coarse 3D model of the face. This coarse model is then used to estimate the lighting parameters with a robust estimator, which allows for detailed real-time 3D capture of faces. The calibration procedure is validated with two real sequences. The previous 3DPVT'10 version can be found here. The capture system is identical to the one we presented in ECCV'08. The main difference is the calibration method for photometric stereo, which was inspired by our earlier CVPR'06 work. Check out the supplementary video here.
|
|
|
Using Multiple Hypotheses to Improve Depth-Maps for Multi-View Stereo
This paper proposes an improvement to a large class of Multi-View Stereo algorithms that fuse stereo depth maps. We show that if individual depth-maps are filtered for outliers prior to the fusion stage, good performance can be maintained in sparse data-sets. Our strategy is to collect a list of good hypotheses for the depth of each pixel. We then choose the optimal depth for each pixel by enforcing consistency between neighbouring pixels in a depth-map. A crucial element of the filtering stage is the introduction of a possible unknown depth hypothesis for each pixel, which is selected by the algorithm when no consistent depth can be chosen. This pre-processing of the depth-maps allows the global fusion stage to operate on fewer outliers and consequently improves performance when data is sparse. |
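To illustrate the role of the unknown hypothesis, here is a minimal sketch that selects one label per pixel along a single scanline using dynamic programming. It is a 1D simplification for brevity, not the optimisation used in the paper, and the function name `select_depths` together with the cost parameters `unknown_cost`, `smooth_weight` and `jump_cost` are hypothetical.

```python
def select_depths(hypotheses, unknown_cost=1.0, smooth_weight=5.0, jump_cost=0.5):
    """Pick one depth per pixel along a scanline, or None for the 'unknown' label.

    hypotheses[i] is a list of (depth, data_cost) pairs for pixel i.
    All costs here are illustrative assumptions.
    """
    # Every pixel gets an extra 'unknown' label with a fixed data cost.
    labels = [list(h) + [(None, unknown_cost)] for h in hypotheses]

    def pairwise(d_prev, d_cur):
        # Constant penalty when either neighbour is unknown, else depth difference.
        if d_prev is None or d_cur is None:
            return jump_cost
        return smooth_weight * abs(d_prev - d_cur)

    # Forward pass: best accumulated cost for each label of the current pixel.
    cost = [c for _, c in labels[0]]
    back = []
    for i in range(1, len(labels)):
        new_cost, pointers = [], []
        for d_cur, c_cur in labels[i]:
            best_j = min(
                range(len(labels[i - 1])),
                key=lambda j: cost[j] + pairwise(labels[i - 1][j][0], d_cur),
            )
            new_cost.append(cost[best_j] + pairwise(labels[i - 1][best_j][0], d_cur) + c_cur)
            pointers.append(best_j)
        cost = new_cost
        back.append(pointers)

    # Backward pass: trace the cheapest labelling.
    k = min(range(len(cost)), key=cost.__getitem__)
    chosen = [k]
    for pointers in reversed(back):
        k = pointers[k]
        chosen.append(k)
    chosen.reverse()
    return [labels[i][k][0] for i, k in enumerate(chosen)]


scanline = [
    [(2.00, 0.1), (3.5, 0.3)],
    [(2.02, 0.2), (1.0, 0.1)],
    [(5.0, 0.9), (0.5, 0.9)],  # neither hypothesis agrees with the neighbours
    [(2.05, 0.1)],
]
result = select_depths(scanline)
```

In this toy input the third pixel has no hypothesis consistent with its neighbours, so the cheapest labelling marks it as unknown instead of forcing an outlier depth into the fusion stage.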
|
|
Shadows in Three-Source Photometric Stereo
Shadows present a significant challenge for Photometric Stereo methods. When four or more images are available, local surface orientation is overdetermined and the shadowed pixels can be discarded. In this paper we look at the challenging case when only three images under three different illuminations are available. In this case, when one of the three pixel intensity constraints is missing due to shadow, a 1 dof ambiguity per pixel arises. We show that using integrability one can resolve this ambiguity and use the remaining two constraints to reconstruct the geometry in the shadow regions. As the problem becomes ill-posed in the presence of noise, we describe a regularization scheme that improves the numerical performance of the algorithm while preserving data. |
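The one-degree-of-freedom ambiguity can be written out as follows. This is a sketch under a standard Lambertian model; the symbols (light vectors l_i, scaled normal b, parameter t, gradients p and q) are notation assumed here for illustration and may differ from the paper's.

```latex
% Lambertian image formation with scaled normal b = \rho n (albedo times unit normal):
\[
  I_i = \mathbf{l}_i^{\top} \mathbf{b}, \qquad i = 1, 2, 3 .
\]
% If the pixel is shadowed under light 3, only two linear constraints remain,
% so b is determined up to a line (assuming l_1 and l_2 are linearly independent):
\[
  \mathbf{l}_1^{\top} \mathbf{b} = I_1, \quad
  \mathbf{l}_2^{\top} \mathbf{b} = I_2
  \;\Longrightarrow\;
  \mathbf{b}(t) = \mathbf{b}_0 + t \, (\mathbf{l}_1 \times \mathbf{l}_2),
  \qquad t \in \mathbb{R},
\]
% where b_0 is any particular solution of the two constraints.
% Integrability: with surface gradients p = -b_x / b_z and q = -b_y / b_z,
\[
  \frac{\partial p}{\partial y} = \frac{\partial q}{\partial x},
\]
% which couples the unknown t across neighbouring shadowed pixels.
```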
|
|
Non-rigid Photometric Stereo with Colored Lights
When three lights illuminate a surface from three different angles and with three different colors, there is a one-to-one mapping between the RGB color measured by a camera and the surface orientation. If we illuminate a complex object under this setup, we can invert the mapping to get surface orientations from an RGB image, then integrate those to get a depth-map. In this paper, this idea, previously used only with static objects, is applied to the reconstruction of a deforming object, such as a moving cloth. We capture color videos of complex motions of fabrics, from which we extract sequences of depth maps. We propose a simple scheme with which these depth maps can be registered to a canonical pose and this allows complex applications such as texture mapping or avatar skinning. A video showing the system in action can be found in the following links: short avi, longer version and from YouTube part 1 and part 2. |
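The inversion of the colour-to-orientation mapping can be sketched as below, assuming the mapping is linear, `rgb = M @ n`, with a known 3x3 calibration matrix `M`. The matrix values, the function names and the use of Cramer's rule are illustrative assumptions, not the calibration or code from the paper.

```python
import math


def det3(a):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))


def solve3(m, c):
    """Solve the 3x3 linear system m @ x = c with Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("mapping is not invertible (lights may be coplanar)")
    solution = []
    for k in range(3):
        # Replace column k of m with the right-hand side c.
        mk = [[c[i] if j == k else m[i][j] for j in range(3)] for i in range(3)]
        solution.append(det3(mk) / d)
    return solution


def normal_from_rgb(m, rgb):
    """Invert the assumed linear mapping and normalise to a unit normal."""
    b = solve3(m, rgb)
    norm = math.sqrt(sum(v * v for v in b))
    return [v / norm for v in b]


# Hypothetical calibration: row i is the light direction seen by colour channel i.
M = [[0.5, 0.0, 0.866],
     [-0.25, 0.433, 0.866],
     [-0.25, -0.433, 0.866]]

true_normal = [0.0, 0.6, 0.8]
rgb = [sum(M[i][j] * true_normal[j] for j in range(3)) for i in range(3)]
recovered = normal_from_rgb(M, rgb)
```

Applying `normal_from_rgb` to every pixel of a frame gives a normal map, which would then be integrated to obtain the depth-map mentioned above.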
|
|
Multi-view Photometric Stereo
This is an extension and consolidation of our CVPR 2006 work on multi-view photometric stereo. The main difference in this paper is that significant albedo variations in the surface of the reconstructed object can be tolerated. In the case where albedo variation is present on the object, we can usually obtain reconstructions with classic multi-view dense stereo. We show, however, that our work produces results of much higher geometric detail than multi-view stereo by exploiting the change in illumination. An earlier version of this work appeared in Lighting-up geometry: accurate 3D modelling of museum artifacts with a torch and a camera
|
|
|
Automatic 3D Object Segmentation in Multiple Views using Volumetric Graph-Cuts
Many multi-view stereo methods are faced with the problem of segmenting 20-100 calibrated images of a 3D object. These segmentations are used to create a visual hull, which is a first approximation to the object's geometry. In this paper we propose a simple technique for automatically segmenting these images. Our idea is based on two observations: (1) in each image the camera will usually fixate on the object of interest and (2) the segmentations are not independent because of the silhouette coherence constraint. We use (1) to initialise an object color model. We then perform a series of simultaneous segmentations using (2). In each iteration we update the color model based on previous results. The process converges to the correct segmentations after just a few iterations. |
|
|
Multi-view Stereo via Volumetric Graph-cuts and Occlusion Robust Photo-Consistency
Here we revisit our CVPR 2005 work and develop a much improved formulation. The object surface is defined as a partition of 3D space into 'inside' and 'outside' regions. The cost functional, which we optimise using Graph-cuts, is a combination of a simple ballooning force and an occlusion-independent Normalised Cross Correlation cost. The advantages of our approach are the following: (1) Objects of arbitrary topology can be fully represented and computed as a single surface with no self-intersections. (2) The representation and geometric regularisation are image and viewpoint independent. (3) Global optimisation is computationally tractable, using existing max-flow algorithms. |
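For reference, plain Normalised Cross Correlation between two image patches can be computed as in the sketch below. It shows only the basic score on flattened intensity lists; the occlusion handling described above is not included, and the function name `ncc` is a hypothetical choice.

```python
import math


def ncc(patch_a, patch_b):
    """Normalised Cross Correlation of two equally sized, flattened patches.

    Returns a value in [-1, 1]; 0.0 is returned when either patch is constant,
    since the score is undefined there (a convention chosen for this sketch).
    """
    n = len(patch_a)
    if n == 0 or n != len(patch_b):
        raise ValueError("patches must be non-empty and of equal size")
    mean_a = sum(patch_a) / n
    mean_b = sum(patch_b) / n
    dev_a = [a - mean_a for a in patch_a]
    dev_b = [b - mean_b for b in patch_b]
    denom = math.sqrt(sum(a * a for a in dev_a) * sum(b * b for b in dev_b))
    if denom == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(dev_a, dev_b)) / denom


patch = [10.0, 20.0, 30.0, 40.0]
brighter = [2.0 * v + 5.0 for v in patch]  # same pattern under a gain and offset change
score_same = ncc(patch, brighter)
score_flipped = ncc(patch, patch[::-1])
```

Because the mean is subtracted and the result is divided by the patch norms, the score is unchanged by gain and offset differences between views, which is one reason NCC is a common photo-consistency measure.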
|
|
Probabilistic visibility for multi-view stereo
In this work we explore how the photo-consistency criterion can be used to obtain a likelihood for a given 3D location being 'empty'. We observe that if a 3D point is considered as photo-consistent from a certain camera, then all 3D locations between that point and the camera are likely to be empty. In fact the degree of likelihood for 'emptiness' is related to the degree of photo-consistency. We formalise this observation probabilistically and show how it can be used to reconstruct difficult concavities in objects. |
|
|
Reconstruction in the Round Using Photometric Normals and Silhouettes
In this work we have obtained full 3D reconstructions of single-albedo, near-Lambertian objects such as white porcelain from 36 views under changing but unknown lighting (a single distant light source is assumed). This work is the first to generalise uncalibrated photometric stereo to the multi-view setting. For a more detailed look at some other models we reconstructed using this technique, look here. All you will need is a Java-enabled browser. |
|
|
Using Frontier Points to Recover Shape, Reflectance and Illumination
Frontier Points are a robust geometrical feature extracted from the silhouettes. They are points on the surface of the object with a known 3D location and known local surface orientation. In this paper we have shown how they can be used to recover information about the surface reflectance of the object as well as illumination. |
|
Multi-view stereo via Volumetric Graph-cuts
The object surface is defined as a boundary separating the Visual Hull surface from an inner surface at a constant offset from and inside the Visual Hull. The volume between these two surfaces is discretized into voxels and for each voxel we compute a photo-consistency cost. Using Graph-Cuts and a specially defined weighted graph, we compute the surface that optimally separates voxels inside and outside the scene. |
|
|
Reconstructing Relief Surfaces
The object surface is defined as a height field on top of a coarse approximation of the scene surface (typically the visual hull). The height field is formulated as a Markov Random Field incorporating photo-consistency and surface smoothness constraints. The resulting cost function is optimized using Loopy Belief Propagation. |
|
2002-2006, PhD in Computer Vision from Trinity College, Cambridge
1997-2001, MSci in Mathematics and Computer Science from Imperial College, London
My CV (pdf)
Tel: +44 (0) 121 204 3452
E-mail : g.vogiatzis [at] aston ac uk
Room MB213c,
Dept of Computer Science,
School of Engineering and Applied Science,
Aston University
Birmingham, B4 7ET