Paper
1 January 2001 Fusion of visual and audio features for person identification in real video
Dongge Li, Gang Wei, Ishwar K. Sethi, Nevenka Dimitrova
Proceedings Volume 4315, Storage and Retrieval for Media Databases 2001; (2001) https://doi.org/10.1117/12.410926
Event: Photonics West 2001 - Electronic Imaging, 2001, San Jose, CA, United States
Abstract
In this research, we studied the joint use of visual and audio information for the problem of identifying persons in real video. A person identification system, which identifies characters in TV shows by fusing audio and visual information, is constructed based on two different fusion strategies. In the first strategy, speaker identification is used to verify the face recognition result. In the second strategy, face recognition and tracking are used to supplement the speaker identification results. To evaluate our system's performance, an information database was generated by manually labeling the speaker and the main person's face in every I-frame of a video segment of the TV show 'Seinfeld'. By comparing the output from our system with our information database, we evaluated the performance of each of the analysis channels and of their fusion. The results show that the first fusion strategy is suitable for applications where precision is much more critical than recall. The second fusion strategy, on the other hand, generates the best overall identification performance. It greatly outperforms either of the analysis channels in both precision and recall and is applicable to more general applications, such as, in our case, identifying persons in TV programs.
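The abstract does not give the fusion rules themselves, so the following is only a minimal sketch of how the two strategies and the per-frame precision/recall comparison might look. The function names, the agreement rule in `verify_fusion`, the fallback rule in `supplement_fusion`, and the toy labels are all assumptions made for illustration; they are not the authors' implementation or their data.

```python
from typing import Optional, Sequence, Tuple

Label = Optional[str]  # None means "no identification for this frame"


def verify_fusion(face: Label, speaker: Label) -> Label:
    """Strategy 1 (assumed rule): keep the face label only if the
    speaker identification agrees with it."""
    if face is not None and face == speaker:
        return face
    return None


def supplement_fusion(face: Label, speaker: Label) -> Label:
    """Strategy 2 (assumed rule): use the speaker label when available,
    otherwise fall back on the face recognition/tracking label."""
    return speaker if speaker is not None else face


def precision_recall(predicted: Sequence[Label],
                     truth: Sequence[Label]) -> Tuple[float, float]:
    """Compare per-frame predictions with manually labeled ground truth."""
    made = sum(p is not None for p in predicted)       # frames with a prediction
    relevant = sum(t is not None for t in truth)       # frames with a true label
    correct = sum(p is not None and p == t for p, t in zip(predicted, truth))
    precision = correct / made if made else 0.0
    recall = correct / relevant if relevant else 0.0
    return precision, recall


# Invented toy labels for four I-frames (not taken from the paper).
truth = ["A", "A", "B", "B"]
face = ["A", "B", None, "B"]
speaker = ["A", "A", "B", None]

strategy1 = [verify_fusion(f, s) for f, s in zip(face, speaker)]
strategy2 = [supplement_fusion(f, s) for f, s in zip(face, speaker)]

print(precision_recall(strategy1, truth))
print(precision_recall(strategy2, truth))
```

On this invented data the verification rule rejects every frame where the two channels disagree or one is missing, which keeps precision high at the cost of recall, while the supplement rule fills gaps left by one channel with the other. The numbers say nothing about the results reported in the paper.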
© (2001) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Dongge Li, Gang Wei, Ishwar K. Sethi, and Nevenka Dimitrova "Fusion of visual and audio features for person identification in real video", Proc. SPIE 4315, Storage and Retrieval for Media Databases 2001, (1 January 2001); https://doi.org/10.1117/12.410926
CITATIONS
Cited by 4 scholarly publications.
KEYWORDS
Facial recognition systems

Visualization

Video

Information visualization

Databases

System identification

Image segmentation
