When extracted speech feature coefficients are used for speech synthesis, quantization acts as a lossy compression scheme: the compressed data cannot be recovered or reconstructed exactly. In a speech recognition system for command and control, however, a certain amount of quantization can be tolerated, with results comparable to those obtained without quantization. In some cases, quantization even serves to "close the gaps" between the coefficients of the incoming speech signal and those of the stored templates. Because the coefficients are not used to reconstruct the signal, a very coarse quantization can be applied, enabling very low bit-rate transmission with very good recognition results. To reduce the bandwidth further, a lossless binary coding procedure, such as Huffman or arithmetic coding, can be applied to the quantized coefficients. Upon receipt of the transmission, the quantized coefficients are decoded and compared to the templates for each command in the vocabulary to perform recognition. Speech, however, is dynamic in nature, so a dynamic recognition procedure is needed to accommodate different vocal inflections and durations. A procedure called Dynamic Time Warping (DTW) is used to "warp" the time axis of each template so that it more closely fits the incoming data. Combining all these techniques yields a very accurate, very low bit-rate recognizer, which is described in this thesis.
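The pipeline summarized above (coarse quantization, lossless binary coding, then template matching with Dynamic Time Warping) can be illustrated with a minimal sketch. It assumes each utterance is a list of per-frame coefficient vectors (the abstract does not say which feature type is used), a uniform scalar quantizer with a hypothetical step size, a textbook Huffman coder, and the classic DTW recurrence with Euclidean frame distance. All function names are hypothetical and are not taken from the thesis.

```python
import heapq
import math
from collections import Counter
from itertools import count


def quantize(frames, step):
    """Uniform scalar quantizer (assumed): map each coefficient to an integer
    level index. A larger step means coarser quantization and fewer bits."""
    return [tuple(round(c / step) for c in frame) for frame in frames]


def huffman_code(symbols):
    """Build a prefix-free Huffman code (symbol -> bit string) from symbol counts."""
    freq = Counter(symbols)
    if len(freq) == 1:
        # Degenerate case: a single distinct symbol still needs one bit.
        return {next(iter(freq)): "0"}
    tie = count()  # unique tie-breaker so heapq never compares the dicts
    heap = [(n, next(tie), {s: ""}) for s, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing 0 and 1.
        n1, _, c1 = heapq.heappop(heap)
        n2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + b for s, b in c1.items()}
        merged.update({s: "1" + b for s, b in c2.items()})
        heapq.heappush(heap, (n1 + n2, next(tie), merged))
    return heap[0][2]


def dtw_distance(a, b):
    """Classic DTW: cost of the best monotonic alignment of frame sequences a, b."""
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])  # Euclidean frame distance
            # Extend the cheapest of: stretch a, stretch b, or step both.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def recognize(utterance, templates):
    """Return the vocabulary word whose template has the smallest DTW distance."""
    return min(templates, key=lambda word: dtw_distance(utterance, templates[word]))
```

As a usage example, an utterance that is a time-stretched copy of the "go" template, such as `[(0, 0), (0, 0), (1, 1), (2, 2)]` against `{"go": [(0, 0), (1, 1), (2, 2)], "stop": [(2, 2), (1, 1), (0, 0)]}`, aligns to "go" at zero cost, which is exactly the tolerance to differing durations that DTW is meant to provide. Huffman coding would be applied to the flattened quantized level indices before transmission and reversed at the receiver; since it is lossless, it does not affect the recognition step.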
Department, Program, or Center
Electrical Engineering (KGCOE)
Baecher, Matthew, "A Very low bit-rate speech recognition system" (2001). Thesis. Rochester Institute of Technology. Accessed from
RIT – Main Campus