This paper presents a multi-camera system that performs face detection and pose estimation in real-time and may be used for intelligent computing within a visual sensor network for surveillance or humancomputer interaction. The system consists of a Scene View Camera (SVC), which operates at a fixed zoom level, and an Object View Camera (OVC), which continuously adjusts its zoom level to match objects of interest. The SVC is set to survey the whole filed of view. Once a region has been identified by the SVC as a potential object of interest, e.g. a face, the OVC zooms in to locate specific features. In this system, face candidate regions are selected based on skin color and face detection is accomplished using a Support Vector Machine classifier. The locations of the eyes and mouth are detected inside the face region using neural network feature detectors. Pose estimation is performed based on a geometrical model, where the head is modeled as a spherical object that rotates upon the vertical axis. The triangle formed by the mouth and eyes defines a vertical plane that intersects the head sphere. By projecting the eyes-mouth triangle onto a two dimensional viewing plane, equations were obtained that describe the change in its angles as the yaw pose angle increases. These equations are then combined and used for efficient pose estimation. The system achieves real-time performance for live video input. Testing results assessing system performance are presented for both still images and video.
Date of creation, presentation, or exhibit
Department, Program, or Center
Computer Engineering (KGCOE)
Savakis, Andreas; Erhard, Matthew; Schimmel, James; and Hnatow, Justin, "A Multi-camera system for a real time pose estimation" (2007). Accessed from
RIT – Main Campus