Computer vision has improved substantially in recent years, largely owing to the efficacy of deep learning. Most, if not all, of these advances rely on supervised learning. However, building a labelled dataset of sufficient volume can be prohibitively expensive. Transfer learning addresses this problem: a predictive model is first trained on a related pretext task (most commonly with supervised learning) and then fine-tuned on the target task. This reduces the volume of data required for the target task, but the approach is still bound by the volume of labelled data available for the pre-training task. Unsupervised transfer learning removes the need for labelled data during pre-training, which would allow any image to be used as a data point for a computer vision task. Autoencoders are a class of unsupervised learning models. An autoencoder trained on image reconstruction can be fine-tuned on a target task, such as image classification or semantic segmentation, to perform unsupervised transfer learning. This thesis seeks to test the efficacy of autoencoders (image reconstruction) as a pre-training task against other unsupervised tasks. It aims to employ state-of-the-art findings on autoencoders and to apply modifications on top of them to maximize unsupervised transfer learning performance. When tested on image classification and semantic segmentation, the approach shows a 3% to 26% performance improvement, depending on the model architecture.
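The two-stage idea described above (reconstruction pre-training on unlabelled data, then fine-tuning on a small labelled set) can be sketched in a few lines. This is only an illustrative toy under stated assumptions, not the architecture or training setup used in the thesis: it uses a linear autoencoder written in plain Python, synthetic 4-dimensional points in place of images, and a logistic-regression head for a binary label. All function names (`make_toy_data`, `pretrain`, `finetune`, `accuracy`) and all hyperparameters are hypothetical.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def make_toy_data(n, seed=0):
    """Hypothetical stand-in for images: 4-D points lying on a 2-D subspace."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        a, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
        data.append(([a, b, a + b, a - b], 1 if a > 0 else 0))
    return data


def reconstruction_loss(enc, dec, xs):
    """Mean squared reconstruction error of decoder(encoder(x))."""
    total = 0.0
    for x in xs:
        x_hat = matvec(dec, matvec(enc, x))
        total += sum((p - q) ** 2 for p, q in zip(x_hat, x))
    return total / len(xs)


def pretrain(xs, hidden=2, epochs=100, lr=0.02, seed=0):
    """Stage 1: train a linear autoencoder on unlabelled inputs with SGD."""
    rng = random.Random(seed)
    dim = len(xs[0])
    enc = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(hidden)]
    dec = [[rng.uniform(-0.5, 0.5) for _ in range(hidden)] for _ in range(dim)]
    history = [reconstruction_loss(enc, dec, xs)]  # loss before any update
    for _ in range(epochs):
        for x in xs:
            z = matvec(enc, x)
            err = [p - q for p, q in zip(matvec(dec, z), x)]  # x_hat - x
            # Backpropagate through the decoder before changing its weights.
            grad_z = [sum(dec[i][j] * err[i] for i in range(dim))
                      for j in range(hidden)]
            for i in range(dim):
                for j in range(hidden):
                    dec[i][j] -= lr * err[i] * z[j]
            for j in range(hidden):
                for k in range(dim):
                    enc[j][k] -= lr * grad_z[j] * x[k]
        history.append(reconstruction_loss(enc, dec, xs))
    return enc, dec, history


def sigmoid(t):
    t = max(-30.0, min(30.0, t))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-t))


def finetune(enc, labelled, epochs=200, head_lr=0.1, enc_lr=0.01):
    """Stage 2: discard the decoder, add a logistic head, train on labels."""
    enc = [row[:] for row in enc]  # copy so the pre-trained weights survive
    w, b = [0.0] * len(enc), 0.0
    for _ in range(epochs):
        for x, y in labelled:
            z = matvec(enc, x)
            g = sigmoid(sum(wj * zj for wj, zj in zip(w, z)) + b) - y
            grad_z = [g * wj for wj in w]
            w = [wj - head_lr * g * zj for wj, zj in zip(w, z)]
            b -= head_lr * g
            for j, row in enumerate(enc):
                for k in range(len(row)):
                    row[k] -= enc_lr * grad_z[j] * x[k]
    return enc, w, b


def accuracy(enc, w, b, labelled):
    """Fraction of examples whose thresholded prediction matches the label."""
    hits = 0
    for x, y in labelled:
        z = matvec(enc, x)
        pred = 1 if sum(wj * zj for wj, zj in zip(w, z)) + b > 0 else 0
        hits += pred == y
    return hits / len(labelled)


if __name__ == "__main__":
    pool = make_toy_data(60, seed=1)
    enc, dec, history = pretrain([x for x, _ in pool])  # labels unused here
    enc_ft, w, b = finetune(enc, pool[:20])             # small labelled subset
    print(history[0], history[-1], accuracy(enc_ft, w, b, pool[20:]))
```

In this sketch the encoder is updated with a smaller learning rate than the new head during fine-tuning. That is a common convention for pre-trained layers, assumed here for illustration rather than taken from the thesis.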
Reddy, Siddhant, "Unsupervised Transfer Learning with Autoencoders" (2022). Thesis. Rochester Institute of Technology. Accessed from
RIT – Main Campus