When combined with acoustic speech information, visual speech information (lip movement) significantly improves Automatic Speech Recognition (ASR) in acoustically noisy environments. Previous research has demonstrated that the visual modality is a viable tool for identifying speech. However, visual information has yet to be utilized in mainstream ASR systems because of the difficulty of accurately tracking lips in real-world conditions. This paper presents our current progress in tracking the face and lips in visually challenging environments. Our findings suggest that the mean shift algorithm performs poorly on small regions such as the lips, but achieves nearly 80% accuracy for face tracking.
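The abstract does not include the tracker's implementation details. As a minimal sketch of how histogram-based mean shift face tracking is commonly realized (here with OpenCV's calcBackProject and meanShift; the video path, initial window coordinates, and histogram thresholds are illustrative assumptions, not the authors' values):

    # Illustrative sketch only, not the paper's implementation:
    # mean shift tracking of a face window via hue-histogram
    # back-projection, using standard OpenCV calls.
    import cv2
    import numpy as np

    cap = cv2.VideoCapture("speaker.mp4")  # hypothetical input video
    ok, frame = cap.read()
    if not ok:
        raise SystemExit("could not read video")

    # Hypothetical initial face window (x, y, w, h); in practice this
    # would come from a face detector.
    x, y, w, h = 200, 120, 160, 160
    track_window = (x, y, w, h)

    # Build a hue histogram of the initial region, masking out
    # low-saturation/low-value pixels where hue is unreliable.
    roi = frame[y:y + h, x:x + w]
    hsv_roi = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv_roi,
                       np.array((0, 60, 32)),
                       np.array((180, 255, 255)))
    roi_hist = cv2.calcHist([hsv_roi], [0], mask, [180], [0, 180])
    cv2.normalize(roi_hist, roi_hist, 0, 255, cv2.NORM_MINMAX)

    # Stop after 10 iterations or when the window moves < 1 pixel.
    term_crit = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        back_proj = cv2.calcBackProject([hsv], [0], roi_hist, [0, 180], 1)
        # Mean shift climbs the back-projection density to relocate
        # the window in each new frame.
        _, track_window = cv2.meanShift(back_proj, track_window, term_crit)
        x, y, w, h = track_window
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.imshow("tracking", frame)
        if cv2.waitKey(30) & 0xFF == 27:  # Esc to quit
            break

    cap.release()
    cv2.destroyAllWindows()

A window this size carries enough color statistics for the back-projection to peak reliably, which is consistent with the reported result: the same procedure applied to a much smaller lip window has far fewer pixels to estimate the histogram from, so the density surface is noisier and the tracker drifts.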