Expert object recognition in video

Matt McEuen

Note: imported from RIT’s Digital Media Library running on DSpace to RIT Scholar Works in October 2013.

Abstract

A recent computer vision technique for object classification in still images is the biologically-inspired Expert Object Recognition (EOR). This thesis adapts and extends the EOR approach for use with segmented video data. Properties of this data, such as segmentation masks and the visibility of an object over multiple frames, are exploited to decrease human supervision and increase accuracy. Several types of runtime learning are facilitated: class-level learning, in which object types that are not included in the training set are given artificial classes; viewpoint-level learning, in which novel views of training objects are associated with existing classes; and instance-level learning of images that are somewhat similar to training images. The architecture of EOR, consisting of feature extraction, clustering, and cluster-specific principal component analysis, is retained. However, the K-means clustering algorithm used in EOR is replaced in this system by an augmented version of Fuzzy K-means. This algorithm is run incrementally over the lifetime of the system, and automatically determines an appropriate number of partitions based on the data in memory and on a system parameter. In addition, the edge and line-based feature extraction of EOR is replaced with a global application of principal component analysis, which increases accuracy when used with segmented video data. Classification output for the system consists of a multi-class hypothesis for each tracked object, from which a single-class "hard" hypothesis may be determined. The system, named VEOR (video expert object recognition), is designed for and tested with noisy, automatically segmented real-world data, consisting of both videos and still images of vehicle (car, pickup truck, and van) profiles.
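For readers unfamiliar with the clustering step, the following is a minimal sketch of standard Fuzzy K-means (also called fuzzy c-means) in Python with NumPy. It is not the augmented, incremental variant developed in the thesis: the number of clusters is fixed in advance, and the function name `fuzzy_k_means`, the fuzzifier `m`, and the toy data are assumptions made for illustration only. The last lines show how a soft membership vector can be reduced to a single label with an argmax, loosely analogous to deriving a "hard" hypothesis from a multi-class one.

```python
import numpy as np


def fuzzy_k_means(X, k, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means (illustrative; not the thesis's augmented version).

    X: (n_samples, n_features) array. k: fixed number of clusters.
    m: fuzzifier (> 1). Returns (centers, memberships), where memberships
    has shape (n_samples, k) and each row sums to 1.
    """
    rng = np.random.default_rng(seed)
    # Random initial memberships, each row normalised to sum to 1.
    U = rng.random((X.shape[0], k))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # Cluster centers are means weighted by memberships raised to m.
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Euclidean distance from every sample to every center.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.maximum(d, 1e-12)  # guard against division by zero
        # Membership update: u_ij is proportional to d_ij^(-2/(m-1)).
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U


# Toy data (hypothetical): two well-separated 2-D blobs of 20 points each.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)), rng.normal(5.0, 0.1, (20, 2))])
centers, U = fuzzy_k_means(X, k=2)

# Reduce each soft membership vector to a single "hard" label.
hard_labels = U.argmax(axis=1)
```

Each row of `U` is a soft assignment over clusters, so a sample near a cluster boundary keeps non-trivial membership in more than one cluster, which is the property that distinguishes this from ordinary K-means.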