Parkinson's disease (PD) is a neurological disorder that affects the central nervous system and leads to cognitive, emotional, and speech impairments. Many signal-processing methods have been proposed over time for discriminating between people with PD and healthy people. In this paper, a new approach based on i-vector subspace modelling is defined to discriminate healthy people from people with PD. i-vector features are among the key representations that have shown promising results in the domain of speech recognition. In this study, i-vectors of two dimensionalities (100 and 200 dimensions) are extracted from voice recordings using Gaussian Mixture Model-based Universal Background Models (GMM-UBM) of three sizes (64, 128, and 256 Gaussians). Finally, we assess the effect of the i-vector features using a Support Vector Machine (SVM). The results show that the proposed approach can be strongly recommended for distinguishing Parkinson's patients from healthy individuals.
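The pipeline described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the data are synthetic stand-ins for per-frame acoustic features, the UBM is deliberately small (8 Gaussians instead of 64/128/256), and a posterior-weighted mean supervector is used as a simplified proxy for a true i-vector extractor (which would also estimate a total-variability subspace). It assumes scikit-learn is available.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-ins for per-frame acoustic features:
# 40 recordings x 200 frames x 13 dims; labels 0 = healthy, 1 = PD.
recordings = [rng.normal(loc=y * 0.5, size=(200, 13))
              for y in (0, 1) for _ in range(20)]
labels = np.array([0] * 20 + [1] * 20)

# 1) Train a small UBM on pooled frames (8 Gaussians here for brevity).
ubm = GaussianMixture(n_components=8, covariance_type="diag",
                      random_state=0)
ubm.fit(np.vstack(recordings))

# 2) Fixed-length embedding per recording. A full i-vector extractor
#    projects onto a learned total-variability subspace; here we pool
#    posterior-weighted per-Gaussian means as a simplified proxy.
def supervector(frames):
    post = ubm.predict_proba(frames)                     # (frames, 8)
    counts = post.sum(axis=0, keepdims=True).T           # (8, 1)
    means = post.T @ frames / np.maximum(counts, 1e-8)   # (8, 13)
    return means.ravel()                                 # (104,)

X = np.stack([supervector(r) for r in recordings])

# 3) SVM classifier on the fixed-length embeddings.
clf = SVC(kernel="linear").fit(X, labels)
acc = clf.score(X, labels)
```

The key design point is that the UBM turns variable-length recordings into fixed-length vectors, which is what makes a standard SVM applicable.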
Shot Boundary Detection (SBD), also known as temporal video segmentation, is a preprocessing task for many video applications, such as indexing and retrieval. The SBD output provides coherent temporal units that are easy to manipulate. Most previous works implement their frameworks based on visual features alone to measure similarity for the transition detection task. However, video is rich in data that could be beneficial. In this paper, following recent multimodal works, we propose to introduce audio components to improve the SBD task. First, we work on candidate segments obtained by measuring similarity between low-level features (SURF, HSF) from the original video. Then we use deep features obtained from a trained model (ResNet-50) for visual similarity, and we introduce audio segmentation based on Power Spectral Density (PSD) to contribute to transition detection. The proposed method is evaluated on the ClipShots dataset. Experiments on these data show that the proposed multimodal approach achieves better performance than state-of-the-art methods that use a visual-only approach.
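The visual-similarity step for cut detection can be sketched as below. This is an illustrative toy, not the paper's system: the "frames" are random vectors standing in for per-frame deep features (e.g. ResNet-50 embeddings), and a simple adaptive threshold on consecutive-frame cosine similarity flags the boundary.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two synthetic "shots": frames in each shot cluster around a distinct
# centroid, standing in for per-frame deep feature embeddings.
shot_a = rng.normal(0.0, 0.05, size=(30, 64)) + rng.normal(size=64)
shot_b = rng.normal(0.0, 0.05, size=(30, 64)) + rng.normal(size=64)
frames = np.vstack([shot_a, shot_b])   # hard cut after frame index 29

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# Similarity between consecutive frames: within a shot it stays high,
# and a sharp drop marks a candidate transition.
sims = np.array([cosine(frames[i], frames[i + 1])
                 for i in range(len(frames) - 1)])
boundaries = np.where(sims < sims.mean() - 2 * sims.std())[0]
```

In a full multimodal system, candidates detected this way would then be confirmed or rejected using the audio channel (e.g. a change in the PSD across the candidate point).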
Throughout the ages and across nations, people have spent much of their time discussing new and important issues in meetings and conferences. With the evolution and abundance of Automatic Speech Recognition (ASR) frameworks, automatic transcription and even automatic meeting summarization are becoming more and more interesting. Recently, automatic summarization has made deeper progress on speech summarization, and neural models have been introduced to tackle many difficulties of abstractive summarization. Our contribution in this paper focuses on the weaknesses of neural abstractive meeting summarization: we propose a deep encoder-decoder model based on an attention mechanism (DEDA) over the decoding sequence, applied to ASR transcripts. Experiments on the AMI dataset demonstrate that our method achieves results competitive with the state of the art, whether extractive or abstractive models. The experimental analyses also highlight the quality of the summarized utterances as well as the reduced repetition in the generated summaries.
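The attention step at the heart of such an encoder-decoder model can be written out compactly. This is a generic scaled dot-product attention sketch in NumPy, with invented shapes for illustration; it is not the authors' DEDA architecture, only the mechanism it builds on.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(query, keys, values):
    """Scaled dot-product attention: one decoder query attends over the
    encoder states (keys/values) to build a context vector."""
    scores = query @ keys.T / np.sqrt(keys.shape[-1])   # (1, T)
    weights = softmax(scores)                           # sums to 1 over T
    return weights @ values, weights

rng = np.random.default_rng(2)
enc_states = rng.normal(size=(6, 16))   # T=6 encoder hidden states
dec_state = rng.normal(size=(1, 16))    # one decoder time step
context, weights = attention(dec_state, enc_states, enc_states)
```

At each decoding step the context vector re-weights the encoder states, which is what lets the decoder focus on different transcript utterances while generating the summary.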
In this paper, Support Vector Machines (SVMs) are used for segmenting and indexing video genres based solely on audio features extracted at the block level, which has the prominent asset of capturing local temporal information. The main contribution of our study is to show the marked effect on classification accuracy of using a hierarchical categorization structure based on the Mel-Frequency Cepstral Coefficients (MFCC) audio descriptor. The classification covers three common video genres: sports videos, music clips, and news scenes. The sub-classification may divide each genre into several multi-speaker and multi-dialect sub-genres. The validation of this approach was carried out on over 360 minutes of video, yielding a classification accuracy of over 99%.
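The block-level feature extraction that precedes SVM classification can be sketched as follows. The MFCC matrix here is a random stand-in, and the block length, hop, and pooling statistics (per-block mean and standard deviation) are illustrative assumptions rather than the paper's exact settings.

```python
import numpy as np

def block_features(mfcc, block_len=50, hop=25):
    """Pool per-frame MFCCs into overlapping blocks (mean + std per
    block), so local temporal information is retained instead of
    averaging over the whole clip."""
    blocks = []
    for start in range(0, len(mfcc) - block_len + 1, hop):
        b = mfcc[start:start + block_len]
        blocks.append(np.concatenate([b.mean(axis=0), b.std(axis=0)]))
    return np.stack(blocks)

rng = np.random.default_rng(3)
mfcc = rng.normal(size=(500, 13))   # synthetic stand-in for 13-dim MFCCs
feats = block_features(mfcc)        # one feature vector per block
```

Each block vector would then be fed to the genre-level SVM first, and the winning genre's sub-genre SVM afterwards, giving the hierarchical categorization described above.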