IEEE CVPR 2011
Note: Award talks and user-uploaded content are freely accessible. Other oral sessions are available only to those who registered for the main conference or for the webcast/video proceedings. To register for the video proceedings, visit the CVPR 2011 website and follow the virtual-registration link.
CVPR Award Papers
Recognition Using Visual Phrases
In this paper we introduce visual phrases, complex visual composites like “a person riding a horse”. Visual phrases often display significantly reduced visual complexity compared to their component objects, because the appearance of those objects can change profoundly when they participate in relations. We introduce a dataset suitable for phrasal recognition that uses familiar PASCAL object categories, and demonstrate significant experimental gains resulting from exploiting visual phrases. We show that a visual phrase detector significantly outperforms a baseline which detects component objects and reasons about relations, even though visual phrase training sets tend to be smaller than those for objects. We argue that any multi-class detection system must decode detector outputs to produce final results; this is usually done with nonmaximum suppression. We describe a novel decoding procedure that can account accurately for local context without solving difficult inference problems. We show this decoding procedure outperforms the state of the art. Finally, we show that decoding a combination of phrasal and object detectors produces real improvements in detector results.
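For readers unfamiliar with the decoding baseline mentioned in this abstract, the following is a minimal sketch of greedy nonmaximum suppression. It is not the paper's novel decoding procedure. The function names, the `(x1, y1, x2, y2)` box format, and the 0.5 overlap threshold are assumptions chosen for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(detections, iou_threshold=0.5):
    """Keep the highest-scoring detections, dropping overlapping weaker ones.

    detections: list of (score, box) pairs (hypothetical format).
    Returns the kept pairs, ordered by descending score.
    """
    kept = []
    for score, box in sorted(detections, key=lambda d: d[0], reverse=True):
        # Keep a box only if it does not overlap any stronger kept box too much.
        if all(iou(box, kept_box) <= iou_threshold for _, kept_box in kept):
            kept.append((score, box))
    return kept


# Two heavily overlapping boxes and one separate box (made-up values).
detections = [
    (0.9, (0, 0, 10, 10)),
    (0.8, (1, 1, 11, 11)),
    (0.7, (20, 20, 30, 30)),
]
kept = greedy_nms(detections)
```

Because each box is judged only by its overlap with stronger boxes, this baseline ignores relations between detections of different classes, which is the kind of local context the abstract says the new decoding procedure accounts for.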
Separating Reflective and Fluorescent Components of An Image
Color plays a vitally important role in the world we live in. It surrounds us everywhere we go. Achromatic life, restricted to black, white and grey, is extremely dull. Color fascinates artists, for it adds enormously to aesthetic appreciation, directly invoking thoughts, emotions and feelings. Color fascinates scientists. For decades, scientists in color imaging, printing and digital photography have striven to satisfy increasing demands for accuracy in color reproduction. Fluorescence is a very common phenomenon observed in many objects such as gems and corals, writing paper, clothes, and even laundry detergent. Traditional color imaging algorithms exclude fluorescence by assuming that all objects have only an ordinary reflective component. The first part of the thesis shows that the color appearance of an object with both reflective and fluorescent components can be represented as a linear combination of the two components. A linear model allows us to separate the two components using independent component analysis (ICA). We can then apply different algorithms to each component, and combine the results to form images with more accurate color. Displaying color images accurately is as important as reproducing color images accurately. The second part of the thesis presents a new, practical model for displaying color images on self-luminous displays such as LCD monitors. It shows that the model accounts for the human visual system's mixed adaptation condition and produces results comparable to many existing algorithms.
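To illustrate why a linear mixing model makes ICA applicable, here is a toy sketch that mixes two independent signals and then separates them blindly. It is not the authors' pipeline: the signals are synthetic random values rather than image data, the mixing coefficients are made up, and the separation uses a simple kurtosis-maximizing rotation search (one elementary form of ICA) that only handles two sources. All names are hypothetical.

```python
import math
import random


def whiten_2d(x, y):
    """Center two signals, then rotate and scale them to be uncorrelated with unit variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    x = [v - mx for v in x]
    y = [v - my for v in y]
    a = sum(v * v for v in x) / n
    c = sum(v * v for v in y) / n
    b = sum(p * q for p, q in zip(x, y)) / n
    theta = 0.5 * math.atan2(2 * b, a - c)  # principal-axis angle of the covariance
    half_diff = math.hypot((a - c) / 2, b)
    lam1, lam2 = (a + c) / 2 + half_diff, (a + c) / 2 - half_diff
    ct, st = math.cos(theta), math.sin(theta)
    u = [(ct * p + st * q) / math.sqrt(lam1) for p, q in zip(x, y)]
    v = [(-st * p + ct * q) / math.sqrt(lam2) for p, q in zip(x, y)]
    return u, v


def excess_kurtosis(s):
    """Excess kurtosis of a zero-mean signal (0 for a Gaussian)."""
    n = len(s)
    m2 = sum(v * v for v in s) / n
    m4 = sum(v * v * v * v for v in s) / n
    return m4 / (m2 * m2) - 3.0


def ica_2d(x, y, steps=180):
    """Separate two mixtures by finding the rotation that maximizes non-Gaussianity."""
    u, v = whiten_2d(x, y)
    best = None
    for k in range(steps):
        phi = (math.pi / 2) * k / steps
        c, s = math.cos(phi), math.sin(phi)
        s1 = [c * p + s * q for p, q in zip(u, v)]
        s2 = [-s * p + c * q for p, q in zip(u, v)]
        score = abs(excess_kurtosis(s1)) + abs(excess_kurtosis(s2))
        if best is None or score > best[0]:
            best = (score, s1, s2)
    return best[1], best[2]


def correlation(p, q):
    """Pearson correlation of two equal-length signals."""
    n = len(p)
    mp, mq = sum(p) / n, sum(q) / n
    cov = sum((a - mp) * (b - mq) for a, b in zip(p, q))
    return cov / math.sqrt(sum((a - mp) ** 2 for a in p) * sum((b - mq) ** 2 for b in q))


# Synthetic stand-ins for the two components (assumption: independent, non-Gaussian).
rng = random.Random(0)
reflective = [rng.uniform(-1, 1) for _ in range(2000)]
fluorescent = [rng.uniform(-1, 1) for _ in range(2000)]

# Two observations, each a linear combination of both components (made-up weights).
obs_a = [0.8 * r + 0.3 * f for r, f in zip(reflective, fluorescent)]
obs_b = [0.4 * r + 0.9 * f for r, f in zip(reflective, fluorescent)]

est1, est2 = ica_2d(obs_a, obs_b)
```

ICA recovers sources only up to order, sign, and scale, so a check of the result should compare absolute correlations between each estimate and each original signal rather than raw values.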
Discrete-Continuous Optimization for Large-scale Structure from Motion
Recent work in structure from motion (SfM) has successfully built 3D models from large unstructured collections of images downloaded from the Internet. Most approaches use incremental algorithms that solve progressively larger bundle adjustment problems. These incremental techniques scale poorly as the number of images grows, and can drift or fall into bad local minima. We present an alternative formulation for SfM based on finding a coarse initial solution using a hybrid discrete-continuous optimization, and then improving that solution using bundle adjustment. The initial optimization step uses a discrete Markov random field (MRF) formulation, coupled with a continuous Levenberg-Marquardt refinement. The formulation naturally incorporates various sources of information about both the cameras and the points, including noisy geotags and vanishing point estimates. We test our method on several large-scale photo collections, including one with measured camera positions, and show that it can produce models that are similar to or better than those produced with incremental bundle adjustment, but more robustly and in a fraction of the time.
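The continuous refinement named in this abstract, Levenberg-Marquardt, is a damped Gauss-Newton method for nonlinear least squares. The sketch below shows the idea on a toy two-parameter curve fit, not on an SfM problem. The model `y = a * exp(b * t)`, the function name, the starting point, and the damping schedule (multiply or divide by 10) are all assumptions made for illustration.

```python
import math


def fit_exponential_lm(ts, ys, a, b, iters=200, lam=1e-3):
    """Fit y = a * exp(b * t) by Levenberg-Marquardt, starting from (a, b)."""

    def cost(a, b):
        total = 0.0
        for t, y in zip(ts, ys):
            r = a * math.exp(b * t) - y
            total += r * r
        return total

    current = cost(a, b)
    for _ in range(iters):
        # Accumulate the normal equations J^T J and J^T r for the two parameters.
        h11 = h12 = h22 = g1 = g2 = 0.0
        for t, y in zip(ts, ys):
            e = math.exp(b * t)
            r = a * e - y
            ja, jb = e, a * t * e  # partial derivatives of the residual
            h11 += ja * ja
            h12 += ja * jb
            h22 += jb * jb
            g1 += ja * r
            g2 += jb * r
        # Damp the diagonal: large lam behaves like gradient descent,
        # small lam behaves like Gauss-Newton.
        d11, d22 = h11 * (1 + lam), h22 * (1 + lam)
        det = d11 * d22 - h12 * h12
        if det == 0.0:
            break
        da = (-g1 * d22 + g2 * h12) / det
        db = (-g2 * d11 + g1 * h12) / det
        trial = cost(a + da, b + db)
        if trial < current:
            # Accept the step and trust the quadratic model more next time.
            a, b, current = a + da, b + db, trial
            lam = max(lam / 10, 1e-12)
            if abs(da) + abs(db) < 1e-12:
                break
        else:
            # Reject the step and increase damping.
            lam *= 10
            if lam > 1e12:
                break
    return a, b


# Noise-free samples from a = 2.0, b = 0.5 (made-up ground truth).
ts = [0.25 * i for i in range(9)]
ys = [2.0 * math.exp(0.5 * t) for t in ts]
a_fit, b_fit = fit_exponential_lm(ts, ys, a=1.0, b=0.0)
```

Like any local method, this refinement converges to the minimum nearest its starting point, which is why the abstract pairs it with a discrete MRF step that supplies a coarse initial solution.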
Real-time Human Pose Recognition in Parts from Single Depth Images
We propose a new method to quickly and accurately predict 3D positions of body joints from a single depth image, using no temporal information. We take an object recognition approach, designing an intermediate body parts representation that maps the difficult pose estimation problem into a simpler per-pixel classification problem. Our large and highly varied training dataset allows the classifier to estimate body parts invariant to pose, body shape, clothing, etc. Finally we generate confidence-scored 3D proposals of several body joints by reprojecting the classification result and finding local modes. The system runs at 200 frames per second on consumer hardware. Our evaluation shows high accuracy on both synthetic and real test sets, and investigates the effect of several training parameters. We achieve state of the art accuracy in our comparison with related work and demonstrate improved generalization over exact whole-skeleton nearest neighbor matching.
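The final step of this abstract, finding local modes among reprojected 3D points, can be illustrated with weighted mean shift. The abstract does not say which mode-finding technique is used, so mean shift with a Gaussian kernel is an assumption here, as are the function name, the bandwidth, and the sample points.

```python
import math


def mean_shift_mode(points, weights, start, bandwidth=0.05, iters=100, tol=1e-9):
    """Climb from `start` to a nearby local mode of a weighted point density.

    points: list of (x, y, z) tuples, e.g. pixels of one body part reprojected to 3D.
    weights: per-point confidence, e.g. the classifier's probability for that part.
    """
    mode = tuple(start)
    for _ in range(iters):
        num = [0.0] * len(mode)
        den = 0.0
        for p, w in zip(points, weights):
            d2 = sum((pi - mi) ** 2 for pi, mi in zip(p, mode))
            k = w * math.exp(-d2 / (2 * bandwidth ** 2))  # Gaussian kernel weight
            den += k
            for i, pi in enumerate(p):
                num[i] += k * pi
        if den == 0.0:
            break
        new_mode = tuple(v / den for v in num)
        shift = math.sqrt(sum((n - m) ** 2 for n, m in zip(new_mode, mode)))
        mode = new_mode
        if shift < tol:
            break
    return mode


# Two well-separated clusters of 3D points (made-up coordinates, in metres).
cluster_a = [(0.0, 0.0, 2.0), (0.02, 0.0, 2.0), (-0.02, 0.0, 2.0),
             (0.0, 0.02, 2.0), (0.0, -0.02, 2.0)]
cluster_b = [(0.5, 0.5, 2.2), (0.52, 0.5, 2.2), (0.48, 0.5, 2.2),
             (0.5, 0.52, 2.2), (0.5, 0.48, 2.2)]
points = cluster_a + cluster_b
weights = [1.0] * len(points)

mode_a = mean_shift_mode(points, weights, start=(0.05, 0.05, 2.0))
mode_b = mean_shift_mode(points, weights, start=(0.45, 0.45, 2.2))
```

Starting the search from several points yields several modes, and the kernel-weighted sum at each mode could serve as a confidence score for that joint proposal. That scoring choice is also an assumption, not something stated in the abstract.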