The use of deep learning algorithms has led to significant progress in automatic speech recognition (ASR). ASR models may require over a thousand hours of speech data to recognize speech accurately. Case studies have indicated that factors such as noise, acoustically distorting conditions, and voice quality affect speech recognition performance. In this research, we investigate the impact of noise on ASR and explore novel methods for developing noise-robust ASR models using a Tamil-language dataset with limited resources. We use the speech dataset provided by SpeechOcean.com and Microsoft for Indian languages. We add several kinds of noise to the dataset and measure how each affects ASR performance. We also determine whether certain data augmentation methods, such as raw data augmentation and spectrogram augmentation (SpecAugment), are better suited to different types of noise. Our results show that all noises, regardless of type, affected ASR performance, and that upgrading the architecture alone was unable to mitigate the impact of noise. Raw data augmentation improves ASR performance on both clean and noise-mixed data; this was not the case with SpecAugment on the same test sets. As a result, raw data augmentation performs substantially better than SpecAugment relative to the baseline models.
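The two augmentation families named in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names `mix_at_snr` and `spec_mask`, the choice of a target signal-to-noise ratio as the mixing control, and the single-band masking policy are all assumptions made here for illustration. The first function adds a noise recording to clean speech at a chosen SNR (raw data augmentation); the second zeroes one random frequency band and one random time band of a spectrogram (SpecAugment-style masking).

```python
import numpy as np


def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech`, scaled so the mixture has the target SNR in dB.

    Hypothetical helper: assumes 1-D float arrays and non-silent noise.
    """
    # Repeat or trim the noise so it covers the whole utterance.
    reps = int(np.ceil(len(speech) / len(noise)))
    noise = np.tile(noise, reps)[: len(speech)]

    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)

    # Choose scale so that speech_power / (scale^2 * noise_power) = 10^(snr_db / 10).
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise


def spec_mask(spec, max_freq_width, max_time_width, rng):
    """Zero one random frequency band and one random time band of `spec`.

    Hypothetical helper: `spec` has shape (n_freq, n_time); the input is not modified.
    """
    out = spec.copy()
    n_freq, n_time = out.shape

    # Frequency mask: random width, then random start that keeps the band in range.
    f = rng.integers(0, min(max_freq_width, n_freq) + 1)
    f0 = rng.integers(0, n_freq - f + 1)
    out[f0 : f0 + f, :] = 0.0

    # Time mask: same procedure along the time axis.
    t = rng.integers(0, min(max_time_width, n_time) + 1)
    t0 = rng.integers(0, n_time - t + 1)
    out[:, t0 : t0 + t] = 0.0
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins for real recordings (assumption: 16 kHz, 1 second).
    speech = np.sin(2 * np.pi * 220 * np.arange(16000) / 16000)
    noise = rng.normal(size=4000)
    noisy = mix_at_snr(speech, noise, snr_db=10)

    spec = rng.random((80, 200))
    masked = spec_mask(spec, max_freq_width=15, max_time_width=30, rng=rng)
    print(noisy.shape, masked.shape)
```

In this sketch the raw-audio mixture changes the waveform itself before feature extraction, while the masking operates on features that have already been computed, which is the basic distinction between the two approaches compared in the abstract.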
Computer Engineering (MS)
Department, Program, or Center
Computer Engineering (KGCOE)
Lakshminarayanan, Vigneshwar, "Impact of Noise in Automatic Speech Recognition for Low-Resourced Languages" (2022). Thesis. Rochester Institute of Technology. Accessed from
RIT – Main Campus