Domain adaptation (DA) techniques aim to overcome the domain shift between a source domain used for training and a target domain used for testing and deployment. Most DA methods assume that the entire target domain is accessible during adaptation. In contrast, we use efficient architectures in the continual, data-constrained DA paradigm, where unlabeled target-domain data is received continually in batches. In recent years, Vision Transformers have emerged as an alternative to traditional Convolutional Neural Networks (CNNs) as the feature-extraction backbone for image classification and other computer vision tasks. Within DA, these attention-based architectures have proven more powerful than their convolutional counterparts; however, their larger model size incurs a greater computational overhead. We design a novel framework, Continual Domain Adaptation through Knowledge Distillation (CAKE), that uses knowledge distillation (KD) to transfer the knowledge of the more complex Vision Transformer to a CNN. By doing so, CNN-based adaptation obtains performance similar to transformer-based adaptation while reducing the computational overhead. Within this framework, we selectively store samples from the incoming batches in a buffer for selective replay, and we mix buffered samples with incoming samples to incrementally update and adapt the model. We show that distilling to a smaller network after adapting a larger model allows the smaller network to achieve better accuracy than if it had adapted to the target domain directly. We also demonstrate that CAKE outperforms state-of-the-art unsupervised DA methods without full access to the target domain or any access to the source domain.
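The two ingredients the abstract describes, a teacher-to-student distillation loss and a fixed-size buffer whose samples are mixed with each incoming batch, could be sketched roughly as below. This is a minimal illustration, not the thesis's actual implementation: the temperature value, the reservoir-sampling buffer policy, and all class and function names here are assumptions for the sake of the example.

```python
import random
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-softened softmax over a 1-D logit vector."""
    z = np.asarray(logits, dtype=float) / T
    z -= z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def kd_loss(teacher_logits, student_logits, T=2.0):
    """Distillation loss: KL(teacher || student) on T-softened
    distributions, scaled by T^2 (Hinton-style soft targets)."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float(T * T * np.sum(p * (np.log(p) - np.log(q))))

class ReplayBuffer:
    """Fixed-capacity buffer filled by reservoir sampling over the
    stream of target-domain batches (an illustrative selection policy)."""
    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.data = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, sample):
        self.seen += 1
        if len(self.data) < self.capacity:
            self.data.append(sample)
        else:
            # Replace a stored sample with probability capacity/seen.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.data[j] = sample

    def mix(self, incoming, k):
        """Return the incoming batch mixed with up to k replayed samples."""
        k = min(k, len(self.data))
        return list(incoming) + self.rng.sample(self.data, k)
```

In this sketch, the larger (teacher) model would first adapt to the mixed batches, after which `kd_loss` transfers its softened predictions to the smaller CNN student.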
Library of Congress Subject Headings
Transfer learning (Machine learning); Neural networks (Computer science); Convolutions (Mathematics)
Computer Engineering (MS)
Department, Program, or Center
Computer Engineering (KGCOE)
Thomas, Georgi, "Continual Domain Adaptation through Knowledge Distillation" (2023). Thesis. Rochester Institute of Technology. Accessed from
RIT – Main Campus