Abstract

Deep convolutional neural networks (CNNs) are effective and widely used across computer vision tasks, especially image classification. Conventionally, they consist of a series of convolutional and pooling layers followed by one or more fully connected (FC) layers that produce the final classification output. This design descends from the traditional machine learning pipeline used before the widespread adoption of deep CNNs, in which engineered feature extractors were followed by a classifier. While this design has been successful, in models trained on datasets with a large number of categories the fully connected layers often account for a large fraction of the network's parameters. For applications with memory constraints, such as mobile devices and embedded platforms, this is not ideal. Recently, a family of architectures that replace the learned fully connected output layer with a fixed layer has been proposed as a way to achieve better efficiency. This research examines this idea, extends it further, and demonstrates that fixed classifiers offer no additional benefit over simply removing the output layer along with its parameters. It also reveals that the typical approach of having a fully connected final output layer is inefficient in terms of parameter count. This work shows that it is possible to remove the fully connected output layers entirely, reducing model size by up to 75% in some scenarios, at only a small cost in classification accuracy. In most cases, the method achieves performance comparable to a traditionally learned fully connected output layer on the ImageNet-1K, CIFAR-100, Stanford Cars-196, and Oxford Flowers-102 datasets, despite having no fully connected output layer at all. In addition, the method provides feature visualizations of deep CNNs at no additional cost.
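
The following is a minimal sketch, in PyTorch, of the general idea of classifying without a learned fully connected output layer: the backbone's final convolution is configured to emit one feature map per class, and global average pooling turns those maps directly into logits. The class name `GAPClassifier`, the toy backbone, and the exact layer configuration are illustrative assumptions for this page, not the thesis's own code.

```python
import torch
import torch.nn as nn

class GAPClassifier(nn.Module):
    """Illustrative head with no fully connected output layer (assumed design,
    not the thesis's implementation). The backbone's last convolution is
    expected to output `num_classes` feature maps; global average pooling
    over each map yields the class logits, so the output stage itself
    contributes zero learned parameters."""

    def __init__(self, backbone: nn.Module):
        super().__init__()
        self.backbone = backbone             # last conv layer emits num_classes channels
        self.pool = nn.AdaptiveAvgPool2d(1)  # global average pooling

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = self.backbone(x)             # (N, num_classes, H, W)
        return self.pool(feats).flatten(1)   # (N, num_classes) logits


# Usage example with a toy backbone (hypothetical, for shape illustration only):
backbone = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 100, kernel_size=3, padding=1), nn.ReLU(),  # 100 classes, e.g. CIFAR-100
)
model = GAPClassifier(backbone)
logits = model(torch.randn(8, 3, 32, 32))    # shape: (8, 100)
```

Because each class corresponds to a spatial feature map before pooling, those per-class maps can also serve as coarse feature visualizations, which is the "no additional cost" visualization the abstract refers to.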

Library of Congress Subject Headings

Computer vision; Neural networks (Computer science); Convolutions (Mathematics); Image analysis; Classification--Data processing; Image processing--Digital techniques

Publication Date

4-27-2020

Document Type

Thesis

Student Type

Graduate

Degree Name

Imaging Science (MS)

Department, Program, or Center

Chester F. Carlson Center for Imaging Science (COS)

Advisor

Christopher Kanan

Advisor/Committee Member

Guoyu Lu

Advisor/Committee Member

Nathan Cahill

Campus

RIT – Main Campus

Plan Codes

IMGS-MS
