JEI Letters

Shot boundary detection without threshold parameters

Author Affiliations
H. Koumaras, G. Xilouris, A. Kourtis

NCSR Demokritos, Institute of Informatics and Telecommunications, Patriarchou Gregoriou Str., 15310 Athens Greece

G. Gardikis

University of the Aegean, Department of Information and Communication Systems Engineering, Karlovassi 83200 Samos, Greece

E. Pallis

Technological Educational Institute of Crete, Department of Applied Informatics and Multimedia, Estauromenos 71500 Heraklion, Greece

J. Electron. Imaging. 15(2), 020503 (May 09, 2006). doi:10.1117/1.2199878
History: Received December 05, 2005; Revised March 09, 2006; Accepted March 13, 2006; Published May 09, 2006

Open Access

Automatic shot boundary detection is a field in which many techniques and methods have been proposed and claimed to perform reliably, especially for abrupt scene-cut detection. However, all the proposed methods share a common drawback: they require a threshold value, which is used as a reference for detecting scene changes. Determining the appropriate value, or dynamically reestimating this threshold parameter, remains the most challenging issue for existing shot boundary detection algorithms. We introduce a novel method for shot boundary detection in discrete cosine transform (DCT)-based, low-bit-rate encoded clips, which exploits perceptual blockiness-effect detection on each frame without using any threshold parameter, thereby minimizing the processing demands of the algorithm's implementation.


Today, a typical end user of a multimedia system is usually overwhelmed with video collections and faces the problem of organizing them so that they are easily accessible. To enable efficient browsing of these video anthologies, it is necessary to design techniques and methods for indexing and retrieving video data. The research community has therefore taken up the issue of analyzing and automatically indexing video content by retrieving highly representative information (e.g., shot boundaries).

Several approaches to automatic shot boundary detection (SBD) have been proposed in the literature; they can be classified according to the detection algorithm each method implements.

The first group of SBD methods exploits the variation of color intensity histograms between consecutive frames: under the hypothesis that all frames belonging to the same scene share the same color histogram, a change in the color histogram serves as a metric for a possible scene cut [1]. Another group classifies frames using mathematical models, such as the analysis of statistics derived from a specific pixel area along the video sequence [2]. Similarly, other methods are based on edge detection and edge comparison between successive frames [3], while some specialized methods for MPEG-coded signals have also been proposed [4,5].

However, all the aforementioned methods use a threshold parameter to distinguish shot boundaries from ordinary interframe changes. A common challenge prior to the SBD process is thus the selection of an appropriate threshold for identifying the level of variation that defines a shot boundary [6]. If a global threshold is used for detecting shot boundaries over the whole video, the successful detection rate may vary by up to 20%, even for the same video content [7]. To improve efficiency and eliminate this performance variation, some later works propose an adaptive threshold that is dynamically determined from the video content [8,9]. But even these methods require considerable computational power to estimate the appropriate threshold parameter, making their implementation challenging, especially for real-time applications. Another approach uses supervised classifiers instead of thresholds [10].

This paper introduces a novel SBD method that enables the quick and easy extraction of the most significant frames from a discrete cosine transform (DCT)-based encoded video without requiring any threshold calculation. The proposed method uses a multimetric pixel-based algorithm that calculates, for each frame, the mean pixel-value differences across and at both sides of DCT block margins; the normalized results indicate the magnitude of the tiling effect. The method exploits the fact that during an abrupt scene change over an interframe, the motion estimation and compensation algorithms of the encoding process do not perform well; the immediate outcome is an intensification of the blockiness effect, which may not be perceptually observable (due to the short display duration of each frame) but is measurable.

Multimedia applications that distribute audiovisual content are mainly based on DCT-based digital encoding techniques (e.g., MPEG-1/2/4), which achieve high compression ratios by exploiting the spatial and temporal redundancy in video sequences. Most of these standards combine motion estimation and compensation with the block-based DCT. The transform lets the compression technique exploit various psychovisual redundancies by mapping the sequence to a domain where frequency ranges, to which the human visual system (HVS) has dissimilar sensitivities, can be accessed independently.

The DCT operates on a block X of N×N image samples or residual values after prediction and creates Y, an N×N block of coefficients. The action of the DCT can be described in terms of a transform matrix A. The forward DCT is given by Y = AXA^T, where X is a matrix of samples, Y is a matrix of coefficients, and A is an N×N transform matrix. The elements of A are

$$A_{ij} = C_i \cos\frac{(2j+1)i\pi}{2N}, \qquad C_i = \begin{cases} \left(\tfrac{1}{N}\right)^{1/2}, & i = 0 \\[4pt] \left(\tfrac{2}{N}\right)^{1/2}, & i > 0. \end{cases} \tag{1}$$

Therefore, the DCT can be written as

$$Y_{xy} = C_x C_y \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} X_{ij} \cos\frac{(2j+1)y\pi}{2N} \cos\frac{(2i+1)x\pi}{2N}. \tag{2}$$
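As a concrete check of Eqs. (1) and (2), the transform matrix and the forward DCT can be sketched in a few lines of Python with NumPy (the function names are illustrative, not from the paper's implementation):

```python
import numpy as np

def dct_matrix(n=8):
    """Transform matrix A of Eq. (1): A[i, j] = C_i * cos((2j+1) i pi / 2N)."""
    a = np.empty((n, n))
    for i in range(n):
        c = np.sqrt(1.0 / n) if i == 0 else np.sqrt(2.0 / n)
        for j in range(n):
            a[i, j] = c * np.cos((2 * j + 1) * i * np.pi / (2 * n))
    return a

def forward_dct(x):
    """Forward DCT of Eq. (2), computed in matrix form as Y = A X A^T."""
    a = dct_matrix(x.shape[0])
    return a @ x @ a.T

# A is orthonormal (A A^T = I), so the transform is exactly invertible:
x = np.arange(64, dtype=float).reshape(8, 8)
y = forward_dct(x)
x_back = dct_matrix(8).T @ y @ dct_matrix(8)  # inverse DCT: X = A^T Y A
```

Because A is orthonormal, the DC coefficient Y₀₀ equals the block sum scaled by 1/N, and the samples are recovered exactly as A^T Y A.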
Afterward in the encoding chain, the aforementioned DCT coefficients are quantized; this quantization is the main cause of quality degradation and of artifacts such as the blockiness effect.

The blockiness effect refers to a block pattern of size 8×8 pixels in the compressed sequence, which results from the independent quantization of the individual blocks in block-based DCT coding. Within a block (8×8 pixels), the encoding and compression process reduces the luminance discontinuities between any pair of adjacent pixels. On the contrary, for all pairs of adjacent pixels located across and on both sides of the border between adjacent DCT blocks, the luminance discontinuities are increased by the encoding process.

For video services in the framework of 3G/4G mobile communication systems, where the encoding bit rate is very low, the blockiness effect is the dominant artifact. In particular, during a scene change, where the efficiency of motion estimation and compensation falls, the blockiness effect is intensified; this is usually not noticeable to the viewer [11], but it is easily measurable by an image-processing tool. Thus, by measuring the variance of the blockiness effect during a video sequence, it is possible to identify where and when a scene change takes place.

To measure the intensity of the blockiness effect, the average luminance discontinuities at the boundaries of adjacent blocks are calculated by simply comparing the corresponding luminance pixel values; the larger the difference, the more severe the blockiness effect. For this purpose, for each frame of the video sequence, the individual offsets of the block pixel pairs that Fig. 1 demonstrates are calculated as

$$\text{offset} = \left| pixel_i - pixel_{i+1} \right|. \tag{3}$$
For clarity, Fig. 2 depicts a graphical representation of the offset that Eq. 3 calculates.

Fig. 1: Pixel pairs that the proposed algorithm uses for blockiness estimation.

Fig. 2: Graphical representation of the offset between a macroblock/block pixel pair.

The vertical ⟨offset⟩ value of a frame can be defined as

$$\langle \text{offset} \rangle_V = \frac{\displaystyle\sum_{i=1}^{(w-8)/8} \sum_{j=1}^{h} \left| pixel_{8i,j} - pixel_{8i+1,j} \right|}{h\,(w-8)/8}. \tag{4}$$

Similarly, the horizontal ⟨offset⟩ is

$$\langle \text{offset} \rangle_H = \frac{\displaystyle\sum_{i=1}^{w} \sum_{j=1}^{(h-8)/8} \left| pixel_{i,8j} - pixel_{i,8j+1} \right|}{w\,(h-8)/8}. \tag{5}$$

Thus, the ⟨offset⟩ for all the pixel pairs of a video frame with width w and height h is calculated as

$$\langle \text{offset} \rangle_{\text{frame}} = \frac{\langle \text{offset} \rangle_V + \langle \text{offset} \rangle_H}{2}. \tag{6}$$
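Under this reading of Eqs. (4)–(6), absolute luminance jumps across every eighth column and row, averaged over all boundary pixel pairs, the per-frame offset can be sketched as follows; the function name and the NumPy formulation are illustrative, not the authors' implementation:

```python
import numpy as np

def frame_offset(frame, block=8):
    """Mean absolute luminance jump across DCT block borders, Eqs. (4)-(6).

    `frame` is a 2-D (h x w) array of luminance samples.
    """
    f = frame.astype(float)
    h, w = f.shape
    # Eq. (4): pairs (pixel_{8i,j}, pixel_{8i+1,j}) straddling vertical borders
    offset_v = np.abs(f[:, block - 1:w - 1:block] - f[:, block:w:block]).mean()
    # Eq. (5): pairs straddling horizontal borders
    offset_h = np.abs(f[block - 1:h - 1:block, :] - f[block:h:block, :]).mean()
    # Eq. (6): average of the two directions
    return (offset_v + offset_h) / 2.0

# A frame built from constant 8x8 blocks has jumps only at block borders:
frame = np.kron(np.array([[0, 10], [20, 30]]), np.ones((8, 8)))
print(frame_offset(frame))  # vertical jumps = 10, horizontal = 20 -> 15.0
```

A perfectly smooth frame yields an offset of zero, while independent quantization of neighboring blocks drives the boundary jumps, and hence the metric, upward.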
Afterward, the averaged offset per frame is normalized within 0.01 and 1, where 1 denotes the highest blockiness value and 0.01 the lowest one:

$$\text{clip}\!\left(0.01,\, 1,\, \langle \text{offset} \rangle_{\text{frame}}\right), \tag{7}$$

where clip(x, y, z) is a function that normalizes z within the range [x, y]. Therefore, by applying Eq. (7) to encoded video sequences, the clipped fluctuation of the averaged offset (i.e., the blockiness effect) per frame can be deduced. Given that the blockiness effect is instantaneously strengthened during a scene change, Eq. (7) provides a quick and simple metric for scene changes.
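Read as a simple saturation to [0.01, 1] (one plausible interpretation of the clip function; the letter does not spell out its internals), Eq. (7) amounts to:

```python
def clip(lo, hi, z):
    """Eq. (7): confine z to the range [lo, hi]."""
    return max(lo, min(hi, z))

# Per-frame offsets below 0.01 saturate low, those above 1 saturate high:
clipped = [clip(0.01, 1.0, z) for z in (0.004, 0.05, 3.7)]
print(clipped)  # [0.01, 0.05, 1.0]
```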

Because during an abrupt scene change the ⟨offset⟩ values become significantly larger than those of intrascene frames, the normalization of Eq. (7) yields a clear association between the clipped ⟨offset⟩ values and abrupt scene changes. More specifically, all the measured clipped ⟨offset⟩ values coming from intrascene frames are relatively low (i.e., <0.1), while the clipped ⟨offset⟩ values resulting from a frame over an abrupt scene change are equal to 1. Most importantly, no more than a few intermediate values (i.e., around 0.5) are observed; these denote severe camera motion, such as zooming or panning. Thus, the difference between the intrascene and interscene ⟨offset⟩ values is so pronounced that the need for any sophisticated threshold estimation for shot boundary detection is eliminated.

To evaluate the proposed method, a video sequence of 1500 frames from the motion picture Spider-Man II was used as the test signal. The initial PAL (phase alternation line) MPEG-2 video content was transcoded to CIF (common intermediate format) MPEG-4 advanced simple profile at 256 kbits/s. On the final coded signal, an implementation [12] of the aforementioned blockiness estimation algorithm was applied to perform the shot boundary detection. Fig. 3 depicts the deduced ⟨offset⟩ per frame calculated by this procedure.

Fig. 3: Parameter ⟨offset⟩ per frame for the Spider-Man II test signal.

Fig. 3 also confirms experimentally that all the measured clipped ⟨offset⟩ values coming from intrascene frames are relatively low (i.e., <0.1), while the clipped ⟨offset⟩ values resulting from a frame over an abrupt scene change are equal to 1.

To eliminate false reports caused by blockiness propagating from a successfully detected scene-cut frame to its neighboring frames, an interval of frames after the last detected scene change (e.g., 25 frames) is considered, during which no new scene change is reported even if one is detected.
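This refractory-interval rule can be sketched as a short post-processing pass over the per-frame clipped offsets (the names and the 25-frame default are illustrative):

```python
def detect_cuts(clipped_offsets, cut_value=1.0, refractory=25):
    """Report frames whose clipped offset reaches `cut_value`, suppressing
    any further report within `refractory` frames of the last accepted cut."""
    cuts = []
    last_cut = -(refractory + 1)  # allow a cut at frame 0
    for idx, off in enumerate(clipped_offsets):
        if off >= cut_value and idx - last_cut > refractory:
            cuts.append(idx)
            last_cut = idx
    return cuts

offsets = [0.05] * 100
offsets[10] = offsets[12] = offsets[60] = 1.0  # frame 12: propagated blockiness
print(detect_cuts(offsets))  # [10, 60] -- frame 12 is suppressed
```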

The efficiency of the proposed method, with the configuration described above, was also tested on a set of heterogeneous CIF MPEG-4 video clips encoded at 256 kbits/s, containing media clips with both abrupt and gradual scene cuts. The corresponding results are shown in Table 1, along with the performance, for the same encoding bit-rate area, of two other existing threshold-based shot boundary detection methods for MPEG video [13] (for method 1 see Ref. 14, and for method 2 see Ref. 15).

Table 1: Comparison of the proposed method for abrupt and gradual scene changes.

From Table 1 we can deduce that although the proposed method performs similarly to the existing threshold-based methods on the recall metric, it outperforms them in the precision of scene detection for both abrupt and gradual scene changes, while retaining a significantly lower computational cost due to the absence of a threshold parameter.

We presented a method for SBD without any threshold parameter. Using only the increase of the blockiness effect during a scene cut, the proposed method successfully detects where a scene cut occurs. The proposed technique was successfully tested on both abrupt and gradual scene changes and compared to other existing shot boundary detection methods.

This work was carried out within the “PYTHAGORAS II” research framework, jointly funded by the European Union and the Hellenic Ministry of Education.

References

1. H. Ueda, T. Miyatake, and S. Yoshizawa, "IMPACT: an interactive natural-motion-picture dedicated multimedia authoring system," in Proc. CHI, pp. 343–350, ACM, New York (1991).
2. H. J. Zhang, A. Kankanhalli, and S. W. Smoliar, "Automatic partitioning of full-motion video," Multimedia Syst. 1(1), 10–28 (1993).
3. R. Zabih, J. Miller, and K. Mai, "A feature-based algorithm for detecting and classifying scene breaks," in Proc. ACM Multimedia, pp. 189–200, San Francisco, CA (1993).
4. M. Sugano, M. Furuya, Y. Nakajima, and H. Yanagihara, "Shot classification and scene segmentation based on MPEG compressed movie analysis," in IEEE Pacific Rim Conf. on Multimedia (PCM), pp. 271–279 (2004).
5. K. Hoashi, M. Sugano, M. Naito, K. Matsumoto, F. Sugaya, and Y. Nakajima, "Shot boundary determination on MPEG compressed domain and story segmentation experiments for TRECVID 2004," in Text Retrieval Conf. Video Retrieval Evaluation (TRECVID) (2004).
6. H. Lu and Y. Tan, "An effective post-refinement method for shot boundary detection," IEEE Trans. Circuits Syst. Video Technol. 15(11), 1407–1421 (2005).
7. C. O'Toole, A. Smeaton, N. Murphy, and S. Marlow, "Evaluation of automatic shot boundary detection on a large video suite," presented at the 2nd U.K. Conf. Image Retrieval: The Challenge of Image Retrieval, Newcastle, U.K. (1999).
8. R. Lienhart, "Comparison of automatic shot boundary detection algorithms," Proc. SPIE 2670, 170–179 (1996).
9. A. Dailianas, R. B. Allen, and P. England, "Comparison of automatic video segmentation algorithms," Proc. SPIE 2615, 2–16 (1995).
10. Y. Qi, A. Hauptmann, and T. Liu, "Supervised classification for video shot segmentation," in Proc. 2003 Int. Conf. on Multimedia and Expo, Vol. 2, pp. 689–692 (2003).
11. W. J. Tam, L. Stelmach, L. Wang, D. Lauzon, and P. Gray, "Visual masking at video scene cuts," Proc. SPIE 2411, 111–119 (1995).
12. J. Lauterjung, Picture Quality Measurement, IBC, Amsterdam (Sep. 1998).
13. U. Gargi, R. Kasturi, and S. H. Strayer, "Performance characterization of video-shot-change detection methods," IEEE Trans. Circuits Syst. Video Technol. 10(1), 1–11 (2000).
14. B.-L. Yeo and B. Liu, "A unified approach to temporal segmentation of motion JPEG and MPEG compressed video," in Proc. IEEE 2nd Int. Conf. Multimedia Computing and Systems, pp. 81–83 (1995).
15. K. Shen and E. J. Delp, "A fast algorithm for video parsing using MPEG compressed sequences," in Proc. IEEE Int. Conf. Image Processing, pp. 252–255 (1995).
© 2006 SPIE and IS&T





