Paper
Speech recognition by humans and machines under conditions with severe channel variability and noise
4 April 1997
Richard P. Lippmann, Beth A. Carlson
Abstract
Despite dramatic recent advances in speech recognition technology, speech recognizers still perform much worse than humans. The difference in performance between humans and machines is most dramatic when variable amounts and types of filtering and noise are present during testing. For example, humans readily understand speech that is low-pass filtered below 3 kHz or high-pass filtered above 1 kHz. Machines trained with wide-band speech, however, degrade dramatically under these conditions. An approach to compensate for variable, unknown sharp filtering and noise is presented. It uses mel-filter-bank magnitudes as input features, estimates the signal-to-noise ratio (SNR) for each filter, and uses missing feature theory to dynamically modify the probability computations performed by Gaussian Mixture or Radial Basis Function neural network classifiers embedded within Hidden Markov Model recognizers. The approach was successfully demonstrated on a talker-independent digit recognition task, where recognition accuracy across many conditions rose from below 50% to above 95%. These promising results suggest future work to dynamically estimate SNRs and to explore the dynamics of human adaptation to channel and noise variability.
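The abstract does not spell out the probability computation, so the following is only a minimal sketch of one common reading of missing feature theory: filter-bank channels whose estimated SNR falls below a threshold are treated as missing and marginalized out of a diagonal-covariance Gaussian mixture likelihood. The function names, the 0 dB threshold, and the diagonal-covariance assumption are illustrative choices, not details taken from the paper.

```python
import numpy as np


def reliability_mask(snr_db, threshold_db=0.0):
    """Mark a filter-bank channel as reliable when its estimated SNR
    exceeds the threshold (the threshold value is an assumption)."""
    return np.asarray(snr_db, dtype=float) > threshold_db


def gmm_log_likelihood(x, weights, means, variances, reliable=None):
    """Log-likelihood of one feature vector under a diagonal-covariance GMM.

    x         : (D,) feature vector, e.g. mel-filter-bank magnitudes
    weights   : (K,) mixture weights summing to 1
    means     : (K, D) component means
    variances : (K, D) component variances
    reliable  : (D,) boolean mask; False entries are marginalized out

    With diagonal covariances, marginalizing a feature just drops its
    one-dimensional Gaussian term from each component's product.
    """
    x = np.asarray(x, dtype=float)
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    if reliable is None:
        reliable = np.ones(x.shape, dtype=bool)
    reliable = np.asarray(reliable, dtype=bool)

    # Per-component, per-dimension log densities, shape (K, D).
    log_dens = -0.5 * (np.log(2.0 * np.pi * variances)
                       + (x - means) ** 2 / variances)

    # Sum only over the reliable dimensions, then add the log weights.
    comp_ll = np.log(weights) + log_dens[:, reliable].sum(axis=1)

    # Numerically stable log-sum-exp over the K components.
    peak = comp_ll.max()
    return peak + np.log(np.exp(comp_ll - peak).sum())


if __name__ == "__main__":
    # Hypothetical two-channel example: channel 1 is swamped by noise.
    x = np.array([0.0, 100.0])
    mask = reliability_mask([15.0, -10.0])
    ll = gmm_log_likelihood(x, [1.0], [[0.0, 0.0]], [[1.0, 1.0]], mask)
    print(mask, ll)
```

In a Hidden Markov Model recognizer, a function of this kind would stand in for each state's observation likelihood, with the mask recomputed as the per-filter SNR estimates change.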
© (1997) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Richard P. Lippmann and Beth A. Carlson "Speech recognition by humans and machines under conditions with severe channel variability and noise", Proc. SPIE 3077, Applications and Science of Artificial Neural Networks III, (4 April 1997); https://doi.org/10.1117/12.271525
CITATIONS
Cited by 2 scholarly publications.
KEYWORDS
Speech recognition
Signal to noise ratio
Electronic filtering
Gaussian filters
Linear filtering
Interference (communication)
Neural networks