Book Reviews

Multimedia Retrieval

J. Electron. Imaging. 17(3), 039901 (July 29, 2008). doi:10.1117/1.2967341
History: Published July 29, 2008

Open Access

Multimedia retrieval is an active research area that lies at the confluence of the machine learning, computer vision, and information retrieval communities. Research interest in the field can be gauged from the growing number of publications in the many multimedia-related conferences and journals over the years. However, as a relatively new field compared with peers such as computer vision, multimedia retrieval has not yet carved out a stable niche in university curricula. Multimedia research today encompasses the image, video, and audio domains. The field has significant research participation, is continuously evolving, and entails multidisciplinary research; hence, compiling it all into one book is a daunting task. This book contains 13 chapters that discuss different aspects of multimedia retrieval and is intended for master's-level students. I believe that it is a good effort and a useful resource for advanced undergraduates and early graduate students beginning their research journey.

Chapter 1 begins with a motivation for research in multimedia retrieval and characterizes multimedia content and metadata. It also gives a high-level overview of a typical multimedia information retrieval system (MIRS). Chapter 2 delves deeper into the details of metadata classification, formats, and standards. It includes discussions of the Resource Description Framework (RDF) and the Moving Picture Experts Group (MPEG).

Chapter 3 is devoted to pattern recognition, including discussions of methods such as support vector machines (SVMs), boosting, hidden Markov models (HMMs), clustering, and dimension reduction. Machine learning techniques are invaluable to multimedia analysis research, and no discussion of multimedia analysis is complete without them. This chapter does a good job of describing certain learning methodologies in common use. Chapter 4 discusses text search and retrieval; of particular interest are the discussions of the vector space model, relevance feedback, and the Google PageRank algorithm. Chapter 5 introduces readers to research questions, solutions, and challenges in the image processing field. Feature extraction and object recognition are also discussed in this chapter. Chapter 6 takes the reader back into the learning realm with discussions of generative models such as Gaussian mixture models (GMMs), which have been popular for image modeling.

Chapter 7 presents a shift in focus with an introduction to speech modeling, indexing, recognition, and retrieval. Video is the subject of Chapters 8 and 9: Chapter 8 presents a generic overview of semantic video indexing, and Chapter 9 presents a methodology for stroke recognition in video. The case in question is tennis video, and a spatiotemporal approach for recognizing events such as "net playing" and "rally" is presented. With a background in video and audio processing, the reader is now ready for the combined audiovisual approach to video retrieval presented in Chapter 10.

User interaction, a very important aspect of multimedia processing and retrieval, is the subject of Chapter 11. The chapter includes an interesting synopsis of interaction types, user input modalities, relevance feedback, and personalization. Protection and privacy issues in the multimedia domain are discussed in Chapter 12. The final chapter, Chapter 13, is dedicated to multimedia evaluation strategies, datasets, forums, and workshops. The popular TREC Video Retrieval Evaluation (TRECVID) benchmark is presented as a special case study.

Overall, the book is well written. Although the chapters have been contributed by different people, the editors have done a good job of maintaining a more or less uniform flow across them. Each chapter has a further reading section with pointers to relevant journals and conferences for readers who wish to pursue the field further. Readers can gain a broad view of the problems, approaches, and challenges in the multimedia analysis research domain. The book should not be treated as a detailed treatise or a state-of-the-art presentation on the topic. It is meant mainly for beginners with some background in mathematics and statistics, and it offers pointers for further reading. I believe that the book will be beneficial for multimedia-analysis-related university courses.

Dhiraj Joshi graduated with an MSc in mathematics and scientific computing from the Indian Institute of Technology, Kanpur. He completed his PhD in computer science at Penn State University in 2007 and now works as a research scientist in the intelligent systems group at Kodak Research. He has been a research intern at IBM T.J. Watson Research Labs, USA, and the Idiap Research Institute, Switzerland. His research interests include contextual inference-based image understanding, large-scale image retrieval, content analysis in multimedia, statistical learning, and social network modeling.





