Abstract

When extracted speech feature coefficients are used for speech synthesis, quantization is a lossy compression scheme: the compressed data cannot be reconstructed exactly. In a speech recognition system for command and control purposes, however, a certain amount of quantization can be tolerated with comparable recognition results. In some cases, quantization even serves to "close the gaps" between the coefficients of the incoming speech signal and those of the templates. Because the coefficients are not used to reconstruct the signal, a very coarse quantization can be applied, enabling very low bit-rate transmission with very good recognition results. To reduce the bandwidth further, a binary coding procedure, such as Huffman or arithmetic coding, can be applied to the quantized coefficients. Upon receipt of the transmission, the quantized coefficients are decoded and used to perform speech recognition: each set of coefficients is compared to the templates for each command in the vocabulary. Speech, however, is dynamic in nature, so a dynamic recognition procedure is needed to allow for different vocal inflections and durations. A procedure called dynamic time warping is used to "warp" the time axis of the templates so that they fit the incoming information more closely. By combining these techniques, a very accurate, very low bit-rate recognizer has been developed and is discussed in this paper.
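The pipeline described in the abstract (coarse quantization of feature coefficients, then template matching with dynamic time warping) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the thesis: the uniform 2-bit quantizer, the coefficient range of [-1, 1], the Euclidean local cost, the unconstrained DTW step pattern, and the function names `quantize`, `dequantize`, `dtw_distance`, and `recognize`. The binary coding stage (Huffman or arithmetic coding) is omitted.

```python
import numpy as np

def quantize(features, n_bits=2, lo=-1.0, hi=1.0):
    """Uniformly quantize coefficients to 2**n_bits levels (assumed scheme).

    Returns integer codes; these are what would be transmitted.
    """
    levels = 2 ** n_bits
    clipped = np.clip(features, lo, hi)
    return np.round((clipped - lo) / (hi - lo) * (levels - 1)).astype(int)

def dequantize(codes, n_bits=2, lo=-1.0, hi=1.0):
    """Map integer codes back to the quantizer's reconstruction levels."""
    levels = 2 ** n_bits
    return lo + codes / (levels - 1) * (hi - lo)

def dtw_distance(a, b):
    """Accumulated DTW cost between two frame sequences.

    a has shape (n, d) and b has shape (m, d). Uses the basic step
    pattern (diagonal, vertical, horizontal) with Euclidean local cost.
    """
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = local + min(cost[i - 1, j],      # stretch b
                                     cost[i, j - 1],      # stretch a
                                     cost[i - 1, j - 1])  # advance both
    return cost[n, m]

def recognize(utterance, templates):
    """Return the name of the template with the lowest DTW cost."""
    return min(templates, key=lambda name: dtw_distance(utterance, templates[name]))

# Toy one-dimensional "features": a rising and a falling contour.
templates = {
    "on": dequantize(quantize(np.linspace(-1, 1, 8).reshape(-1, 1))),
    "off": dequantize(quantize(np.linspace(1, -1, 8).reshape(-1, 1))),
}
# The same rising contour spoken more slowly (12 frames instead of 8).
utterance = dequantize(quantize(np.linspace(-1, 1, 12).reshape(-1, 1)))
best_match = recognize(utterance, templates)
```

In this toy case the slower utterance and the "on" template pass through the same quantization levels in the same order, so the warped distance between them collapses to zero. That is one way to picture how coarse quantization can "close the gaps" between incoming coefficients and templates.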

Library of Congress Subject Headings

Automatic speech recognition; Speech processing systems; Speech synthesis; Computational linguistics

Publication Date

10-1-2001

Document Type

Thesis

Department, Program, or Center

Electrical Engineering (KGCOE)

Advisor

Amuso, Vincent

Comments

Note: imported from RIT’s Digital Media Library running on DSpace to RIT Scholar Works in December 2013.

Campus

RIT – Main Campus

Plan Codes

EEEE-MS
