26 November 2003 Investigation on effectiveness of mid-level feature representation for semantic boundary detection in news video
Regunathan Radhakrishan, Ziyou Xiong, Ajay Divakaran, Bhiksha Raj
Abstract
In our past work, we attempted to use a mid-level feature, namely the state population histogram obtained from the Hidden Markov Model (HMM) of a general sound class, for speaker change detection in order to extract semantic boundaries in broadcast news. In this paper, we compare the performance of that approach with another approach based on video shot detection and speaker change detection using the Bayesian Information Criterion (BIC). Our experiments show that the latter approach performs significantly better than the former. This motivated us to examine the mid-level feature closely. We found that the component population histogram enabled discovery of broad phonetic categories such as vowels, nasals, and fricatives, regardless of the number of distinct speakers in the test utterance. For the histogram to be useful for speaker change detection, the individual components would have to model the phonetic sounds of each speaker separately. From our experiments, we conclude that state/component population histograms can only be useful for further clustering or semantic class discovery if the features are chosen carefully so that the individual states represent the semantic categories of interest.
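The abstract does not spell out the BIC formulation used for speaker change detection. As a rough illustration only, the following sketch applies the commonly used single-change-point BIC test, which models each segment of a feature sequence with one full-covariance Gaussian. The function names (`delta_bic`, `find_change_point`), the `margin` and `lam` parameters, and the small covariance ridge are assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def _logdet_cov(x):
    """Log-determinant of the maximum-likelihood covariance of the rows of x."""
    # A small ridge keeps the estimate well conditioned (an assumption of this sketch).
    cov = np.cov(x, rowvar=False, bias=True) + 1e-6 * np.eye(x.shape[1])
    _, logdet = np.linalg.slogdet(cov)
    return logdet


def delta_bic(features, i, lam=1.0):
    """Delta-BIC for splitting `features` (frames x dims) at frame index i.

    Positive values favour two separate Gaussians (a change at i)
    over a single Gaussian for the whole window.
    """
    n, d = features.shape
    # Penalty for the extra mean vector and covariance matrix of the two-model hypothesis.
    penalty = 0.5 * (d + 0.5 * d * (d + 1)) * np.log(n)
    return (0.5 * n * _logdet_cov(features)
            - 0.5 * i * _logdet_cov(features[:i])
            - 0.5 * (n - i) * _logdet_cov(features[i:])
            - lam * penalty)


def find_change_point(features, margin=20, lam=1.0):
    """Return the frame index with the largest positive delta-BIC, or None."""
    n = features.shape[0]
    candidates = list(range(margin, n - margin))
    scores = [delta_bic(features, i, lam) for i in candidates]
    best = int(np.argmax(scores))
    return candidates[best] if scores[best] > 0 else None
```

In practice the input would be a window of per-frame audio features (for example cepstral coefficients), and the test would be repeated over sliding or growing windows to locate several change points. The penalty weight `lam` is usually tuned on held-out data.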
© (2003) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Regunathan Radhakrishan, Ziyou Xiong, Ajay Divakaran, and Bhiksha Raj "Investigation on effectiveness of mid-level feature representation for semantic boundary detection in news video", Proc. SPIE 5242, Internet Multimedia Management Systems IV, (26 November 2003); https://doi.org/10.1117/12.514397
KEYWORDS
Video
Semantic video
Feature extraction
Optical tracking
Signal detection
Digital filtering
Machine learning
