Automatic segmentation of speakers in broadcast audio material

Hyoung-Gook Kim; Thomas Sikora

doi:10.1117/12.526080

18 December 2003

Proceedings Volume 5307, Storage and Retrieval Methods and Applications for Multimedia 2004; (2003) https://doi.org/10.1117/12.526080
Event: Electronic Imaging 2004, San Jose, California, United States

Abstract

In this paper, dimension-reduced, decorrelated spectral features developed for general sound recognition are applied to segmenting conversational speech in both broadcast news audio and panel discussion television programs. Without a priori information about the number of speakers, the audio stream is segmented by a hybrid algorithm that combines metric-based and model-based segmentation. To measure performance, we compare the segmentation results of the hybrid method with those of metric-based segmentation, using both the MPEG-7 standardized features and Mel-scale Frequency Cepstrum Coefficients (MFCC). Results show that the MFCC features yield better performance than the MPEG-7 features, and that the hybrid approach significantly outperforms direct metric-based segmentation.
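The metric-based stage mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which distance metric, window length, or threshold the authors use, so everything below is an assumption made for illustration: the symmetric Kullback-Leibler divergence between diagonal Gaussians, the hypothetical function names `diag_gaussian`, `symmetric_kl`, and `candidate_changes`, and the synthetic two-dimensional "features" standing in for MFCC or MPEG-7 vectors. The model-based stage of the hybrid method is not shown.

```python
import random


def diag_gaussian(frames):
    """Per-dimension mean and variance of a list of feature vectors."""
    n = len(frames)
    dims = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    variances = [
        max(sum((f[d] - means[d]) ** 2 for f in frames) / n, 1e-6)  # floor avoids division by zero
        for d in range(dims)
    ]
    return means, variances


def symmetric_kl(left, right):
    """Symmetric KL divergence between diagonal Gaussians fit to two windows."""
    (m1, v1), (m2, v2) = diag_gaussian(left), diag_gaussian(right)
    total = 0.0
    for a, b, p, q in zip(m1, m2, v1, v2):
        diff2 = (a - b) ** 2
        total += 0.5 * (p / q + q / p - 2.0) + 0.5 * diff2 * (1.0 / p + 1.0 / q)
    return total


def candidate_changes(frames, window=50, threshold=5.0):
    """Frame indices where the distance curve has a local peak above threshold.

    window and threshold are illustrative values, not taken from the paper.
    """
    scores = {}
    for t in range(window, len(frames) - window + 1):
        scores[t] = symmetric_kl(frames[t - window:t], frames[t:t + window])
    peaks = []
    for t, s in scores.items():
        neighbourhood = [scores.get(u, 0.0) for u in range(t - window, t + window + 1)]
        if s >= threshold and s == max(neighbourhood):
            peaks.append(t)
    return peaks


# Synthetic stand-in for a feature stream: two "speakers" with different means,
# 200 frames each, so the true change point is at frame 200.
rng = random.Random(0)
frames = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(200)]
frames += [[rng.gauss(3.0, 1.0), rng.gauss(3.0, 1.0)] for _ in range(200)]

peaks = candidate_changes(frames)
print(peaks)
```

In a hybrid scheme of the kind the abstract describes, candidate boundaries such as these would then be passed to a model-based step for refinement; how the authors do this is not stated in the abstract.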

Citation

Hyoung-Gook Kim and Thomas Sikora "Automatic segmentation of speakers in broadcast audio material", Proc. SPIE 5307, Storage and Retrieval Methods and Applications for Multimedia 2004, (18 December 2003); https://doi.org/10.1117/12.526080
