Parts-based object detection using multiple views

David Robert Higgs

Physical copy available from RIT's Wallace Library at TA1650 .H44 2005

Abstract

One of the most important problems in image understanding is robust object detection. Small changes in object appearance due to illumination, viewpoint, and occlusion can drastically affect the performance of many object detection methods. Non-rigid objects can be even more difficult to detect reliably.

The unique contribution of this thesis was to extend the approach of parts-based object detection to include support for multiple viewing angles. Bayesian networks were used to integrate the part detections of each view in a flexible manner, so that the experimental performance of each part detector could be incorporated into the decision. The detectors were implemented as neural networks trained with the bootstrapping method of repeated backpropagation, in which false positives are added to the training set as negative examples. The Bayesian networks were trained with a separate dataset to gauge the performance of each part detector. The final decision of the object detection system was made with a logical OR operation.
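The fusion step described above can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes the simplest possible network structure (naive Bayes, with part detectors independent given whether the view is present), assumes that the logical OR is taken over the per-view decisions, and uses hypothetical names (`view_posterior`, `detect_face`) and made-up detector rates.

```python
from math import prod

def view_posterior(fired, tpr, fpr, prior=0.5):
    """Posterior probability that a view is present, given binary part-detector outputs.

    Assumption: naive Bayes structure (detectors independent given the view).
    fired: one bool per part detector (did it fire?)
    tpr / fpr: measured true- and false-positive rates of each detector
    """
    like_present = prod(t if f else 1 - t for f, t in zip(fired, tpr))
    like_absent = prod(p if f else 1 - p for f, p in zip(fired, fpr))
    numerator = like_present * prior
    return numerator / (numerator + like_absent * (1 - prior))

def detect_face(views, threshold=0.5):
    """Logical OR over views: a face is reported if any view's posterior passes."""
    return any(view_posterior(**v) > threshold for v in views)

# Hypothetical rates for four part detectors per view (illustrative values only).
rates = {"tpr": [0.9, 0.9, 0.9, 0.9], "fpr": [0.1, 0.1, 0.1, 0.1]}
frontal = {"fired": [False, False, False, False], **rates}
side = {"fired": [True, True, False, True], **rates}

print(detect_face([frontal, side]))
```

Weighting each detector by its measured rates means that a missed part from a reliable detector counts more heavily against a view than a miss from an unreliable one, which is the sense in which detector performance enters the decision.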

The domain of human face detection was used to demonstrate the power of this approach. The FERET human face database was selected to provide both training and testing images; a frontal and a side view were chosen from the available poses. Part detectors were trained on four features from each view: the right and left eyes, the nose, and the mouth. The individual part detection rates ranged from 85% to 95% against testing images. Cross-validation was used to test the system as a whole, giving average view detection rates of 96.7% and 97.2% for the frontal and side views, respectively, and an overall face detection rate of 96.9% among true-positive images. A 5.7% false-positive rate was demonstrated against background clutter images. These results compare favorably with existing methods while providing the additional benefit of face detection at different view angles.