Understanding a scene from a single image, while effortless for
humans, remains a difficult computational challenge. Most of
today's approaches reduce the problem to one of 2D pattern
classification, where rectangular image patches are independently
compared to stored templates to produce isolated image labels. But
the real world is very much three-dimensional, with occlusion,
perspective effects, 3D shapes, lighting, and interactions between
objects within the 3D space of the scene. To make progress, we must
account for the three-dimensional nature of the real world. However,
computing quantitative 3D information from a single image may not be
feasible, or even desirable.
Instead, we advocate qualitative geometric reasoning in
terms of 3D spatial relationships between scene components, without
an explicit, quantitative 3D reconstruction. Our eventual aim is to
unify two disjoint schools of thought. For the traditional
"Geometry" school, an image is a 2D projection of a collection of
physical surfaces residing in 3-space, and the goal is to recover
the 3D structure of these surfaces using tools from projective
geometry. For the Pattern Recognition or "Appearance" school, an
image is just a 2D array of patterns and the goal is to recognize
familiar patterns as known objects using statistical
techniques. Both schools look at exactly the same data, but see
completely different things! We believe that the use of qualitative
geometric reasoning can unify these two views into a single
framework, where appearance and geometry co-exist and rely on each
other to produce a coherent interpretation of an image.
In this talk, I will discuss the issues involved in framing the 3D problem qualitatively, and present some of our recent results in this area.
Alexei (Alyosha) Efros is an associate professor at the Robotics Institute and the Computer Science Department at Carnegie Mellon University. His research is in the area of computer vision and computer graphics, especially at the intersection of the two. He is particularly interested in using data-driven techniques to tackle problems that are very hard to model parametrically but where large quantities of data are readily available. Alyosha received his PhD in 2003 from UC Berkeley and spent the following year as a post-doctoral fellow in Oxford, England. Alyosha is a recipient of the CVPR Best Paper Award (2006), the NSF CAREER award (2006), the Sloan Fellowship (2008), the Guggenheim Fellowship (2008), the Okawa Grant (2008), and the Finmeccanica Career Development Chair (2010).
Creating visually rich 3D images has always been one of the major challenges of 3D graphics. Traditional methods rely heavily on manual modelling, texturing and lighting, requiring time-consuming and expensive effort from trained artists/modellers. We examine three ways to simplify this process: the use of images, generative models and simulation techniques. We first ask how much 3D information is really needed, and how much can be extracted from images for 3D graphics. We discuss our attempts to answer this question, which involve volumetric approximations for trees, and rapid modeling and rendering using fast texture synthesis. Another key question is how to provide generative models allowing the easy creation of variety in 3D environments. We present one solution we recently developed involving procedural texturing techniques. Finally, we address the hard problem of relighting photographs which already contain a given lighting condition, using simulation-based approaches. We will conclude with an investigation of future research directions.
Dr. George Drettakis (PhD, University of Toronto, 1994) leads the research group REVES (Rendering for Virtual Environments with Sound) at INRIA Sophia-Antipolis. He has published extensively in computer graphics and 3D sound, in particular on global illumination, visibility, relighting and interactive visual and audio rendering, at major international conferences and in journals. His current research interests include image-based techniques for rendering and relighting, procedural and parametric texture synthesis, and interaction and perception for graphics. He has been involved in several EU-funded projects and coordinated the EU FET Open CROSSMOD project (2005-2008) on how crossmodal audiovisual perception can improve virtual environments. He has participated in program committees of many international conferences, and co-chaired the program committees of EGWR '98, Eurographics 2002 and 2008. He is papers chair for ACM SIGGRAPH Asia 2010 and an associate editor of IEEE TVCG. He won the Eurographics Outstanding Technical Achievement Award in 2007.
The explosion of imagery available on the Internet has opened up a host of new applications in computer vision, image-based modeling, and image-based rendering. It is now possible to automatically reconstruct 3D models of heavily photographed scenes and objects such as tourist locations, and to recognize these in novel images such as cell phone queries. In this talk, I survey some of our work in this field, starting with our Photo Tourism image-based modeling and navigation system, and then discussing the complexity issues (and solutions) engendered by the huge scale of these datasets. I also discuss our work in both interactive and automated 3D modeling, with a particular emphasis on architectural reconstruction.
Richard Szeliski is a Principal Researcher at Microsoft Research, where he leads the Interactive Visual Media Group. He is also an Affiliate Professor at the University of Washington, and is a Fellow of the ACM and IEEE. Dr. Szeliski pioneered the field of Bayesian methods for computer vision, as well as image-based modeling, image-based rendering, and computational photography, which lie at the intersection of computer vision and computer graphics. His most recent research on Photo Tourism and Photosynth is an exciting example of the promise of large-scale image-based rendering.
Dr. Szeliski received his Ph.D. degree in Computer Science from Carnegie Mellon University, Pittsburgh, in 1988 and joined Microsoft Research in 1995. Prior to Microsoft, he worked at Bell-Northern Research, Schlumberger Palo Alto Research, the Artificial Intelligence Center of SRI International, and the Cambridge Research Lab of Digital Equipment Corporation. He has published over 150 research papers in computer vision, computer graphics, medical imaging, neural nets, and numerical analysis, as well as the book Bayesian Modeling of Uncertainty in Low-Level Vision. He was a Program Committee Chair for ICCV'2001 and the 1999 Vision Algorithms Workshop, served as an Associate Editor of the IEEE Transactions on Pattern Analysis and Machine Intelligence and on the Editorial Board of the International Journal of Computer Vision, and is a Founding Editor of Foundations and Trends in Computer Graphics and Vision.