This thesis investigates a dynamic programming approach to word hypothesis in the context of a speaker-independent, large-vocabulary, continuous speech recognition system. Using a method known as Dynamic Time Warping (DTW), an undifferentiated phonetic string (one without word boundaries) is parsed to produce all possible words contained in a domain-specific lexicon. DTW is a common method of sequence comparison, used to match the acoustic feature vectors representing an unknown input utterance against those of a reference utterance. The cumulative least-cost path, when compared with a threshold, can be used as a decision criterion for recognition. This thesis attempts to extend the DTW technique to strings of phonetic symbols instead. Three variables were found to affect the parsing process: (1) the minimum distance threshold, (2) the number of word candidates accepted at any given phonetic index, and (3) the lexical search space used for reference pattern comparisons. The performance of the parser as a function of these variables is discussed, as is its performance under a variety of input error conditions.
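The parsing idea described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the unit substitution, insertion, and deletion costs, the length normalization, the toy lexicon, and the names `dtw_prefix_cost`, `hypothesize`, `threshold`, `max_candidates`, and `max_stretch` are all assumptions made for illustration. At each phonetic index, every lexicon entry is aligned against the input that starts there, and the lowest-cost candidates under the threshold are kept.

```python
def dtw_prefix_cost(ref, obs):
    """Align all of `ref` against the best-matching prefix of `obs`.

    Returns (cost, prefix_length). Unit costs are an assumption of this
    sketch; the thesis may use a different local distance.
    """
    n, m = len(ref), len(obs)
    # d[i][j] = cheapest alignment of ref[:i] with obs[:j]
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = float(i)
    for j in range(1, m + 1):
        d[0][j] = float(j)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0.0 if ref[i - 1] == obs[j - 1] else 1.0
            d[i][j] = min(
                d[i - 1][j - 1] + sub,  # match or substitution
                d[i - 1][j] + 1.0,      # reference phone missing from input
                d[i][j - 1] + 1.0,      # extra phone in input
            )
    # Pick the cheapest end point; break ties toward the reference length.
    best_j = min(range(m + 1), key=lambda j: (d[n][j], abs(j - n)))
    return d[n][best_j], best_j


def hypothesize(phones, lexicon, threshold, max_candidates, max_stretch=2):
    """Return (word, start, end, normalized_cost) hypotheses.

    threshold      -- maximum accepted cost per reference phone (assumed form)
    max_candidates -- number of word candidates kept at each phonetic index
    max_stretch    -- hypothetical limit on extra input phones per word
    """
    results = []
    for start in range(len(phones)):
        candidates = []
        for word, ref in lexicon.items():
            window = phones[start:start + len(ref) + max_stretch]
            cost, length = dtw_prefix_cost(ref, window)
            norm = cost / len(ref)
            if length > 0 and norm <= threshold:
                candidates.append((norm, word, start, start + length))
        candidates.sort()  # lowest cost first, then alphabetical
        for norm, word, s, e in candidates[:max_candidates]:
            results.append((word, s, e, norm))
    return results


if __name__ == "__main__":
    # Toy lexicon and input string; both are invented for this example.
    lexicon = {
        "cat": ["k", "ae", "t"],
        "at": ["ae", "t"],
        "sat": ["s", "ae", "t"],
    }
    phones = "k ae t s ae t".split()
    for hyp in hypothesize(phones, lexicon, threshold=0.0, max_candidates=2):
        print(hyp)
```

In this sketch, raising `threshold` admits words whose pronunciations differ from the input, which is one way to tolerate errorful phonetic strings at the price of more spurious hypotheses. Restricting `lexicon` to a subset corresponds to narrowing the lexical search space.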
Library of Congress Subject Headings
Automatic speech recognition; Phonetics, Acoustic--Analysis--Data processing
Department, Program, or Center
Computer Science (GCCIS)
Sellman, R. Thomas, "Word hypothesis from undifferentiated, errorful phonetic strings" (1993). Thesis. Rochester Institute of Technology. Accessed from
RIT – Main Campus