Abstract

The usage of deep learning algorithms has resulted in significant progress in auto- matic speech recognition (ASR). The ASR models may require over a thousand hours of speech data to accurately recognize the speech. There have been case studies that have indicated that there are certain factors like noise, acoustic distorting conditions, and voice quality that has affected the performance of speech recognition. In this research, we investigate the impact of noise on Automatic Speech Recognition and explore novel methods for developing noise-robust ASR models using the Tamil lan- guage dataset with limited resources. We are using the speech dataset provided by SpeechOcean.com and Microsoft for the Indian languages. We add several kinds of noise to the dataset and find out how these noises impact the ASR performance. We also determine whether certain data augmentation methods like raw data augmen- tation and spectrogram augmentation (SpecAugment) are better suited to different types of noises. Our results show that all noises, regardless of the type, had an impact on ASR performance, and upgrading the architecture alone were unable to mitigate the impact of noise. Raw data augmentation enhances ASR performance on both clean data and noise-mixed data, however, this was not the case with SpecAugment on the same test sets. As a result, raw data augmentation performs way better than SpecAugment over the baseline models.

Publication Date

4-2022

Document Type

Thesis

Student Type

Graduate

Degree Name

Computer Engineering (MS)

Department, Program, or Center

Computer Engineering (KGCOE)

Advisor

Emily Prud'hommeaux

Advisor/Committee Member

Alexander Loui

Advisor/Committee Member

Andres Kwasinski

Campus

RIT – Main Campus

Share

COinS