Abstract

Deep neural networks train millions of parameters to achieve state-of-the-art performance on a wide array of applications. However, searching for a global minimum with gradient descent leads to lengthy training times and high computational resource requirements. To alleviate these concerns, this work explores the idea of fixed-random weights in deep neural networks. Critically, the goal is to maintain performance akin to that of fully trained models.

Metrics such as floating-point operations per second and memory size are compared for fixed-random and fully trained models. Additional analysis of downsized models that mimic the number of trained parameters of the fixed-random models shows that fixed-random weights enable slightly higher performance. As a fixed-random convolutional model, ResNet achieves ∼57% image classification accuracy on CIFAR-10. In contrast, a DenseNet architecture with only fixed-random filters in the convolutional layers achieves ∼88% accuracy on the same task. DenseNet’s fully trained model achieves ∼96% accuracy, which highlights the importance of architectural choice for a high-performing model.

To further understand the role of architectures, random projection networks trained using a least squares approximation learning rule are studied. In these networks, deep random projection layers and skip connections are exploited, as they are shown to boost overall network performance. In several of the image classification experiments conducted, additional layers and skip connectivity consistently outperform a baseline random projection network by 1% to 3%. To reduce the complexity of the models in general, a tensor decomposition technique known as the Tensor-Train decomposition is leveraged. Compressing the fully connected hidden layer leads to at least a ∼40x decrease in memory size at a slight cost in resource utilization. This study provides a better understanding of how random filters and weights can be utilized to obtain lighter models.
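
As an illustration of the random projection idea described in the abstract, the following is a minimal sketch and not the thesis's implementation. It assumes a single hidden layer with fixed Gaussian weights, a tanh activation, and a ridge-regularized least squares readout; the function names, layer size, ridge term, and toy data are all hypothetical choices made for this example.

```python
import numpy as np

def fit_random_projection(X, Y, hidden=256, ridge=1e-3, seed=0):
    """Fit only the readout of a one-hidden-layer random projection network.

    The hidden weights W and biases b are drawn once and never trained
    (assumed Gaussian here). The readout beta is the solution of a
    ridge-regularized least squares problem (the ridge term is an assumption).
    """
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], hidden)) / np.sqrt(X.shape[1])
    b = rng.standard_normal(hidden)
    H = np.tanh(X @ W + b)                      # fixed random features
    A = H.T @ H + ridge * np.eye(hidden)        # regularized normal equations
    beta = np.linalg.solve(A, H.T @ Y)          # the only trained parameters
    return W, b, beta

def predict(X, W, b, beta):
    """Project inputs through the fixed layer, then apply the trained readout."""
    return np.tanh(X @ W + b) @ beta

def demo():
    """Training accuracy on a toy XOR-like problem (hypothetical data)."""
    rng = np.random.default_rng(1)
    X = rng.standard_normal((400, 2))
    labels = (X[:, 0] * X[:, 1] > 0).astype(int)
    Y = np.eye(2)[labels]                       # one-hot targets
    W, b, beta = fit_random_projection(X, Y)
    return (predict(X, W, b, beta).argmax(axis=1) == labels).mean()
```

Calling `demo()` returns the training accuracy on the toy data. Only `beta` is learned, so the number of trained parameters is the hidden width times the number of classes, regardless of the input dimension.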

Publication Date

6-2019

Document Type

Thesis

Student Type

Graduate

Degree Name

Computer Engineering (MS)

Department, Program, or Center

Computer Engineering (KGCOE)

Advisor

Dhireesha Kudithipudi

Advisor/Committee Member

Cory Merkel

Advisor/Committee Member

Raymond Ptucha

Comments

This thesis has been embargoed. The full text will be available on or around 8/26/2020.

Campus

RIT – Main Campus

Available for download on Tuesday, August 25, 2020
