
Visual speech recognition by recurrent neural networks

Gihad Rabi, Si Wei Lu

Memorial University of Newfoundland, Department of Computer Science, St. John’s, Newfoundland A1B 3X5, Canada

J. Electron. Imaging. 7(1), 61-69 (Jan 01, 1998). doi:10.1117/1.482627
History: Received Feb. 15, 1997; Revised Oct. 12, 1997; Accepted July 20, 1997

Abstract

A major drawback of current acoustically based speech recognizers is that their performance deteriorates drastically in the presence of noise. Our focus is to develop a computer system that performs speech recognition based on visual information about the speaker. The system automatically extracts visual speech features through image-processing techniques that operate on facial images taken in a normally illuminated environment. To cope with the way speech patterns change over time, as well as the spatial variations within individual patterns, the proposed recognition scheme uses a recurrent neural network architecture. By specifying a certain behavior when the network is presented with exemplar sequences, the recurrent network is trained with no more than feedforward complexity. The network's desired behavior is based on characterizing a given word by well-defined segments. Adaptive segmentation is employed to segment the training sequences of a given class. This technique iterates two steps. First, the sequences are segmented individually. Then, a generalized version of dynamic time warping is used to align the segments of all sequences. At each iteration, the weights of the distance functions used in the two steps are updated in a way that minimizes a segmentation error. The system is implemented and tested on a few words, and the results are satisfactory. In particular, the system is able to distinguish between words with common segments. Moreover, it largely tolerates variations in duration among words of the same class.

© 1998 SPIE and IS&T
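
The alignment step mentioned in the abstract builds on dynamic time warping (DTW). As a rough illustration only, the sketch below implements the classic two-sequence form of DTW in Python, not the generalized multi-sequence variant or the adaptively weighted distance functions the authors describe. The function name `dtw_align`, the scalar features, and the absolute-difference distance are all assumptions made for this example.

```python
import math


def dtw_align(a, b, dist=lambda x, y: abs(x - y)):
    """Classic DTW between sequences a and b (illustrative sketch).

    Returns (total_cost, path), where path is a list of index pairs
    (i, j) meaning a[i] is matched with b[j].
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j - 1],  # advance in both sequences
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
            )

    # Backtrack from the end to recover one optimal warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(candidates, key=lambda c: c[0])
    path.reverse()
    return cost[n][m], path


# Two made-up feature tracks: the same shape uttered at different speeds.
fast = [0, 1, 2, 3]
slow = [0, 1, 1, 2, 2, 3]
total, path = dtw_align(fast, slow)
print(total)  # 0.0: the slower track warps onto the faster one exactly
print(path)
```

Because the warping path may repeat an index in either sequence, two utterances of the same word with different durations can still align at low cost, which is the property the abstract relies on when it mentions tolerance to variable-duration words.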

Citation

Gihad Rabi and Si Wei Lu
"Visual speech recognition by recurrent neural networks", J. Electron. Imaging. 7(1), 61-69 (Jan 01, 1998); http://dx.doi.org/10.1117/1.482627

