The goal of scene classification is to automatically assign a scene image to a semantic category (e.g. "building" or "river") by analyzing the visual content of the image. This is a challenging problem because scene images are variable and ambiguous, and because they are captured under a wide range of illumination and scale conditions. It is nonetheless a fundamental problem in computer vision: by providing contextual information, scene classification can guide other processes such as image browsing, content-based image retrieval and object recognition. This thesis implements two scene classification systems, one based on Spatial Pyramid Matching (SPM) and the other on Hierarchical Dirichlet Processes (HDP). Both approaches build on the widely used "bag-of-words" representation, which is a histogram of quantized visual features. SPM represents an image as a "spatial pyramid", produced by computing histograms of local features at multiple levels with different resolutions. Spatial pyramid matching is then used to estimate the overall perceptual similarity between images, and this similarity can be used as a support vector machine (SVM) kernel. In the second approach, HDP is used to model the "bag-of-words" representations of images: each image is described as a mixture of latent themes, and each theme is described as a mixture of words. The number of themes is inferred automatically from the data. The themes are shared by images not only within one scene category but also across all categories. Both systems are tested on three popular datasets from the field and their performance is compared. In addition, the two approaches are combined, which improves performance over either system alone.
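The spatial pyramid described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation. It assumes local features have already been quantized into visual word indices, that feature coordinates are normalized to the unit square, and that levels are weighted as in the commonly cited spatial pyramid formulation (coarser levels count less). The function names `spatial_pyramid` and `pyramid_match`, and all parameters, are hypothetical and chosen for illustration only.

```python
import numpy as np

def spatial_pyramid(xy, words, vocab_size, levels=2):
    """Build a weighted spatial pyramid histogram (illustrative sketch).

    xy:     (n, 2) feature coordinates, assumed normalized to [0, 1]
    words:  (n,) visual word index of each feature, in [0, vocab_size)
    levels: finest pyramid level L; level l splits the image into 2^l x 2^l cells
    """
    xy = np.asarray(xy, dtype=float)
    words = np.asarray(words, dtype=int)
    parts = []
    for level in range(levels + 1):
        cells = 2 ** level
        # Grid cell of each feature at this level (clamp points lying on the far edge).
        cx = np.minimum((xy[:, 0] * cells).astype(int), cells - 1)
        cy = np.minimum((xy[:, 1] * cells).astype(int), cells - 1)
        # One bag-of-words histogram per cell, flattened into a single vector.
        flat = (cy * cells + cx) * vocab_size + words
        hist = np.bincount(flat, minlength=cells * cells * vocab_size).astype(float)
        # Assumed weighting: 1/2^L for level 0, 1/2^(L-l+1) for level l >= 1.
        if level == 0:
            weight = 1.0 / 2 ** levels
        else:
            weight = 1.0 / 2 ** (levels - level + 1)
        parts.append(weight * hist)
    return np.concatenate(parts)

def pyramid_match(p, q):
    """Histogram intersection of two weighted pyramids (a similarity score)."""
    return float(np.minimum(p, q).sum())

# Two toy "images" with two features each and a 3-word vocabulary.
img_a = spatial_pyramid([[0.1, 0.1], [0.9, 0.9]], [0, 1], vocab_size=3)
img_b = spatial_pyramid([[0.9, 0.1], [0.9, 0.9]], [0, 1], vocab_size=3)
print(pyramid_match(img_a, img_a), pyramid_match(img_a, img_b))
```

In this toy case the two images contain the same visual words, so they match fully at the coarsest level, but one feature sits in a different cell at the finer levels, which lowers the score. Evaluating `pyramid_match` for every pair of images gives a kernel matrix, which is the kind of input an SVM with a precomputed kernel would accept.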
Computer Science (MS)
Department, Program, or Center
Computer Science (GCCIS)
Yin, Haohui, "Scene classification using spatial pyramid matching and hierarchical Dirichlet processes" (2010). Thesis. Rochester Institute of Technology. Accessed from
RIT – Main Campus