Abstract

The application of deep neural networks to acoustic modeling for automatic speech recognition (ASR) has dramatically reduced word error rates, enabling the use of this technology in smartphones and personal home assistants for high-resource languages. Developing ASR models of this caliber, however, requires hundreds or thousands of hours of transcribed speech recordings, which presents a challenge for most of the world’s languages. In this work, we investigate the applicability of three distinct architectures that have previously been used for ASR in languages with limited training resources. We tested these architectures on publicly available ASR datasets for several typologically and orthographically diverse languages, whose data were produced under a variety of conditions using different speech collection strategies, practices, and equipment. Additionally, we performed data augmentation on this audio so that the amount of data could increase nearly tenfold, synthetically creating a higher-resource training condition. We modified the architectures and their individual components and explored their parameters to find the combination of features and modeling schemas best suited to a specific language's morphology. Our results point to the importance of considering language-specific and corpus-specific factors and experimenting with multiple approaches when developing ASR systems for resource-constrained languages.

Library of Congress Subject Headings

Automatic speech recognition--Technological innovations; Machine learning; Neural networks (Computer science); Pattern recognition systems; Grammar, Comparative and general--Morphology

Publication Date

4-2021

Document Type

Thesis

Student Type

Graduate

Degree Name

Computer Engineering (MS)

Department, Program, or Center

Computer Engineering (KGCOE)

Advisor

Emily Prud'hommeaux

Advisor/Committee Member

Alexander Loui

Advisor/Committee Member

Andreas Savakis

Recommended Citation

Morris, Ethan, "Automatic Speech Recognition for Low-Resource and Morphologically Complex Languages" (2021). Thesis. Rochester Institute of Technology. Accessed from
https://repository.rit.edu/theses/10758

Campus

RIT – Main Campus

Plan Codes

CMPE-MS


Automatic Speech Recognition for Low-Resource and Morphologically Complex Languages
