Abstract

Speaker recognition is a frequently overlooked form of biometric security. Text-independent speaker identification is used by financial services, forensic experts, and human-computer interaction developers to extract information that is transmitted along with a spoken message, such as the identity, gender, age, and emotional state of a speaker. Speech features are classified as either low-level or high-level characteristics. High-level speech features are associated with syntax, dialect, and the overall meaning of a spoken message. In contrast, low-level features such as pitch and phonemic spectra are associated much more with the physiology of the human vocal tract. These low-level features are also the easiest and least computationally intensive characteristics of speech to extract. Once the features are extracted, modern speaker recognition systems attempt to fit them to statistical classification models. One widely used model is the Gaussian Mixture Model (GMM). Testing of speaker recognition systems is currently standardized by NIST in the frequently updated NIST Speaker Recognition Evaluation (NIST-SRE). The results of the tests outlined in the standard are ultimately presented as Detection Error Tradeoff (DET) curves and detection cost function scores. A new method of measuring the effects of channel impediments on the quality of identifications made by Gaussian Mixture Model-based speaker recognition systems will be presented in this thesis. With the exception of the NIST-SRE, no standardized or extensive testing of speaker recognition systems in noisy channels has been conducted. Thorough testing of speaker recognition systems will be conducted in channel model simulators. Additionally, the NIST-SRE error metric will be evaluated against a newly proposed metric for gauging the performance and improvements of speaker recognition systems.
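
The DET curves and detection cost function scores mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the thesis: the function names, the example scores, and the default cost weights (C_miss = 10, C_fa = 1, P_target = 0.01, values commonly associated with earlier NIST evaluations) are assumptions made for illustration only.

```python
def error_rates(target_scores, nontarget_scores, threshold):
    """Return (P_miss, P_fa) at one decision threshold.

    A trial is accepted when its score is >= threshold.
    """
    misses = sum(1 for s in target_scores if s < threshold)
    false_alarms = sum(1 for s in nontarget_scores if s >= threshold)
    return misses / len(target_scores), false_alarms / len(nontarget_scores)


def detection_cost(p_miss, p_fa, c_miss=10.0, c_fa=1.0, p_target=0.01):
    """Weighted detection cost; the default weights are assumed, not from the thesis."""
    return c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)


def det_points(target_scores, nontarget_scores):
    """Sweep the threshold over every observed score.

    Each (P_fa, P_miss) pair is one point on a DET curve.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    points = []
    for t in thresholds:
        p_miss, p_fa = error_rates(target_scores, nontarget_scores, t)
        points.append((p_fa, p_miss))
    return points


# Hypothetical scores, made up for illustration only.
targets = [2.0, 1.5, 0.5, -0.2]
nontargets = [-1.0, -0.5, 0.6, -2.0]
p_miss, p_fa = error_rates(targets, nontargets, threshold=0.0)
cost = detection_cost(p_miss, p_fa)
```

Plotting the pairs returned by det_points on normal-deviate axes would give the usual DET presentation; the sketch stops at the raw error rates so that it depends only on the standard library.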

Library of Congress Subject Headings

Automatic speech recognition; Electronic noise

Publication Date

3-1-2012

Document Type

Thesis

Department, Program, or Center

Computer Engineering (KGCOE)

Advisor

Melton, Roy

Advisor/Committee Member

Shaaban, Muhammad

Comments

Note: imported from RIT’s Digital Media Library running on DSpace to RIT Scholar Works. Physical copy available through RIT’s Wallace Library at: TK7895.S65 G44 2012

Recommended Citation

Ghilduta, Robert, "Characterization of speaker recognition in noisy channels" (2012). Thesis. Rochester Institute of Technology. Accessed from
https://repository.rit.edu/theses/3210

Campus

RIT – Main Campus
