Image classification and object recognition are among the most prominent problems in computer vision. Recognizing objects regardless of pose and occlusion is difficult and demands substantial compute resources. Recent technological advances have made great strides toward solving this problem; in particular, deep learning has revolutionized the field in the last few years.
Classifying large datasets, such as the popular ImageNet dataset, requires a network with millions of weights. Learning each of these weights with backpropagation requires a compute-intensive training phase over many training samples. Current systems have proven adept at classifying 1000 classes, but it is not clear whether computers will be able to differentiate and classify the more than 40,000 classes that humans can. The goal of this thesis is to train computers to attain human-like performance on large-class datasets. Specifically, we introduce two types of hierarchical architectures, Late Fusion and Early Fusion, and use them to classify datasets with up to 1000 object classes while simultaneously reducing both the number of computations and the training time. These hierarchical architectures maintain discriminative relationships among the networks within each layer as well as an abstraction relationship from one layer to the next. The resulting framework reduces the individual network sizes, and thus the total number of parameters that must be learned; fewer parameters in turn means shorter training time.
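The late-fusion idea can be sketched in plain NumPy. This is only an illustrative toy, not the thesis's actual architecture: the layer sizes, the four coarse class groups, and the gating network are all hypothetical choices made here for the example. It shows the core mechanism the abstract describes: a small gating net predicts a coarse group, small expert "mini-nets" each cover only a slice of the classes, and their posteriors are fused into one distribution over all 1000 classes, so no single network needs millions of output weights.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_params(n_in, n_hidden, n_out):
    """Randomly initialized two-layer net (untrained; sizes are hypothetical)."""
    return (rng.standard_normal((n_in, n_hidden)) * 0.1, np.zeros(n_hidden),
            rng.standard_normal((n_hidden, n_out)) * 0.1, np.zeros(n_out))

def mini_net(params, x):
    """Forward pass of one small network: ReLU hidden layer, softmax output."""
    W1, b1, W2, b2 = params
    h = np.maximum(0.0, x @ W1 + b1)
    logits = h @ W2 + b2
    e = np.exp(logits - logits.max())        # numerically stable softmax
    return e / e.sum()

n_in, n_hidden = 32, 16
# Hypothetical partition of 1000 classes into 4 coarse groups of 250.
groups = [(0, 250), (250, 500), (500, 750), (750, 1000)]

experts = [make_params(n_in, n_hidden, hi - lo) for lo, hi in groups]
gate = make_params(n_in, n_hidden, len(groups))   # predicts P(group | x)

def late_fusion_predict(x):
    """Fuse expert posteriors: P(class | x) = P(group | x) * P(class | group, x)."""
    g = mini_net(gate, x)
    scores = np.zeros(1000)
    for w, (lo, hi), expert in zip(g, groups, experts):
        scores[lo:hi] = w * mini_net(expert, x)
    return scores

x = rng.standard_normal(n_in)
scores = late_fusion_predict(x)
print(scores.shape)          # (1000,)
print(round(scores.sum(), 6))  # 1.0 — fused scores form a valid distribution
```

Because each expert only outputs its own 250-way softmax, each mini-net is far smaller than a single flat 1000-way classifier, which is the parameter-reduction effect the abstract claims.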
Library of Congress Subject Headings
Classification--Data processing; Machine learning; Image processing--Digital techniques; Optical pattern recognition
Computer Engineering (MS)
Department, Program, or Center
Computer Engineering (KGCOE)
Nooka, Sai Prasad, "Fusion of Mini-Deep Nets" (2016). Thesis. Rochester Institute of Technology. Accessed from
RIT – Main Campus