Regular Articles

Improved real-time video resizing technique using temporal forward energy

Daehyun Park

Kangwon National University, Department of Computer and Communications Engineering, Hyoja 2-dong, Chuncheon Si, Gangwon-Do, Republic of Korea

Kanghee Lee

Kangwon National University, Department of Computer and Communications Engineering, Hyoja 2-dong, Chuncheon Si, Gangwon-Do, Republic of Korea

Yoon Kim

Kangwon National University, Department of Computer and Communications Engineering, Hyoja 2-dong, Chuncheon Si, Gangwon-Do, Republic of Korea

J. Electron. Imaging. 22(1), 013007 (Jan 07, 2013). doi:10.1117/1.JEI.22.1.013007
History: Received March 20, 2012; Revised November 28, 2012; Accepted December 7, 2012
Open Access

Abstract. A novel video resizing algorithm that preserves the dominant contents of video frames is proposed. Because consecutive frames within the same shot are visually correlated, the energy distributions of neighboring frames are also similar, and the seams in a frame are analogous to those of its neighboring frames. Thus, the seams of the current frame are derived from a specified range around the coordinates of the seams of the previous frame. The proposed method determines the two-dimensional connected paths for each frame by considering both the spatial and temporal correlations between frames to prevent jitter and jerkiness. For this purpose, a new temporal forward energy is proposed as the energy cost of a pixel. The proposed algorithm has a processing speed similar to that of the bilinear method, while preserving the main content of an image to the greatest extent possible. In addition, because its memory usage is remarkably small compared with the existing seam carving method, the proposed algorithm is usable in mobile devices, which have tight memory restrictions. Computer simulation results indicate that the proposed technique provides better objective performance, subjective image quality, and content conservation than conventional algorithms.


With the rapid development of wireless communication and mobile display devices, mobile multimedia has become available in many commercial areas, and video content in particular is considered an important form of information. Because a variety of display devices, such as tablet computers, cellular phones, smartphones, and handheld personal computers, have been released in the market with different display resolutions, video needs to be adapted to the resolution or aspect ratio of each display panel. That is, the spatial resolution of a video is downsized or upsized by a resizing algorithm so that the video content can be used more effectively. However, because simple resizing techniques such as scaling and cropping do not take into account the dominant contents of a video (e.g., a person in the picture), deformation or distortion of such salient objects is inevitable. Therefore, it is necessary to develop a content-based video resizing method that preserves the dominant contents of an image while changing its size.

Figures 1(b)–1(e) are 30% horizontally reduced versions of Fig. 1(a). Because the scaling method adjusts the sampling rate uniformly over the whole image, the contents of the source image are distorted if the scaling ratio differs from the aspect ratio of the source image [Fig. 1(b)]. Content-based image resizing methods have therefore been studied in order to prevent this visual distortion. Cropping is effective for displaying a region of interest (ROI) where dominant objects are located. Santella et al. proposed a semi-automatic cropping technique,1 which finds the important content and crops the image. However, cropping-based methods discard the region outside the ROI when the resolution of the target display is much smaller than that of the original video, and they cannot correctly preserve multiple sparsely located objects [Fig. 1(c)]. To solve this problem, Liu et al.2 proposed the fisheye-view warping technique, which preserves the dominant region while warping the other regions [Fig. 1(d)]. Fisheye-view warping preserves the main content of an image as much as possible but has the disadvantage of severely distorting the rest of the video information. More recently, Avidan et al.3 introduced the seam carving technique, which is known to achieve high resizing performance with little quality loss in the retargeted image [Fig. 1(e)]. To resize an image, this method removes or inserts the pixels of a seam, which is defined as a vertical (or horizontal) connected path of pixels with the minimum gradient energy. In addition, studies on further methods that preserve the contents while changing the size of an image are in progress.4–15

Fig. 1 Comparison of various methods to resize images.

Content-based geometric transformations16,17 of a video differ from those of static images. A static image is processed only in the spatial dimensions, whereas a video adds the time dimension, so the relationship between adjacent frames must be considered. Without this consideration, the contents of each individual frame can still be preserved; however, irregular movement of the contents’ location from frame to frame produces a shaking phenomenon (jitter), because continuity along the time axis is lost. Therefore, it is essential to protect the temporal continuity of the contents to prevent this shaking, which implies that a new content-based geometric conversion algorithm should be applied to videos.

There have been several classes of video retargeting approaches. Setlur et al.16 generate a motion illustration by using the principal motion direction in a video to detect and accentuate a moving object’s motion in a single static frame. Liu et al.17 perform video retargeting with an automatic pan-and-scan method that moves the cropping window in each frame. Furthermore, additional studies on methodologies that maintain the dominant contents while changing the size of an image or a video are in progress.4–15

Video carving,18,19 which is the application of seam carving to a video, uses a three-dimensional (3-D) cube that connects the frames along the time axis. Rubinstein et al.18 introduced an improved seam carving algorithm for image and video retargeting, which uses forward energy instead of the gradient value to evaluate the energy of a pixel. Chen et al.19 proposed a video carving method that handles a two-dimensional (2-D) connected surface of pixels in the 3-D space-time volume constructed from the consecutive frames of a video. However, because the location and geometric shape of the contents change across video frames, a 2-D connected surface that respects spatial and temporal connectivity over the whole video is not simple to obtain. Therefore, to attain effective video retargeting, the entire 3-D space-time volume has to be analyzed while considering the energy in the spatial and temporal connectivity of the 2-D surface (Fig. 2). Because both Rubinstein’s and Chen’s methods obtain the 2-D connected surface by applying the graph cut technique,20,21 which requires a large amount of memory and high-complexity operations, a novel real-time retargeting technique is required for systems with limited resources, such as mobile devices.

In this paper, a novel video resizing algorithm that preserves the dominant contents of video frames is proposed. The proposed method determines the 2-D connected paths for each frame by considering both the spatial and the temporal correlation between frames to prevent jitter and jerkiness with a reduced computational cost. Therefore, this method is performed in real-time and with low memory consumption.

The proposed technique operates on a shot-by-shot basis, where a shot is a sequence of consecutive images taken by a single camera, so all of the frames within a shot have similar features. First, in order to separate the shots in a video effectively, a shot change is detected by monitoring the brightness differences and the histogram differences, which are sensitive to movement and color change,22,23 respectively. If a shot change occurs and a new shot begins, the first frame of the shot is resized using the conventional seam carving technique for static images. At this time, the seams extracted by the seam carving technique and their coordinates are stored. The proposed technique then calculates the new seams of the next frame in real time by using the newly proposed forward energy, instead of creating a 3-D cube, which requires information on all of the video frames. The frame is then resized by these seams.

This paper is organized as follows. In the next section, the conventional seam carving algorithm is briefly introduced. The proposed algorithm is presented in Sec. 3. Section 4 presents and discusses the experimental results. Finally, our conclusions are given in Sec. 5.

The seam carving method extracts the seam with the lowest energy in the image and controls the image size by adding or removing a pixel at each coordinate of the seam. A seam is a connected vertical or horizontal path of pixels that contains exactly one pixel per row (vertical seam) or per column (horizontal seam). In a $W \times H$ image, the seams are defined as in Eq. (1).

$$
\begin{aligned}
s^{v} &= \{[X(j), j]\}_{j=1}^{H}, \quad \text{s.t. } \forall j,\ |X(j) - X(j-1)| \le 1,\\
s^{h} &= \{[i, Y(i)]\}_{i=1}^{W}, \quad \text{s.t. } \forall i,\ |Y(i) - Y(i-1)| \le 1,\\
X &: \{1,\ldots,H\} \to \{1,\ldots,W\}, \qquad Y : \{1,\ldots,W\} \to \{1,\ldots,H\},
\end{aligned}
\tag{1}
$$
where $s^{v}$ is the vertical seam, $s^{h}$ is the horizontal seam, and $X$ and $Y$ are the mapping functions for the row and column coordinates of the image, respectively. A vertical seam is thus a set of vertically connected coordinates, and a horizontal seam is a set of horizontally connected coordinates. Many seams exist in one image, and among them the optimum seam $S^{*}$ required in the seam carving process is defined as in Eq. (2).
$$
e_{\min} = \min_{a \in S}[E(a)], \qquad S^{*} = E^{-1}(e_{\min}),
\tag{2}
$$
where $S$ is the set of all seams obtained from one image, and $E(\cdot)$ is the cumulative energy function of a seam. That is, the optimum seam has the minimum energy among all seams in the image. Evaluating every seam in the image to find the optimum one requires a very large number of operations, so the optimum seam is obtained by dynamic programming24,25 to reduce the computation. The first stage of the dynamic programming computes the cumulative minimum energy map $M$ from the vertical-seam condition of Eq. (1) and the matrix structure of the $W \times H$ image, as shown in Eq. (3).
$$
M(i,j) = e(i,j) + \min[M(i-1,j-1),\ M(i,j-1),\ M(i+1,j-1)], \quad 0 \le i < W,\ 0 \le j < H,
\tag{3}
$$
where $e(\cdot)$ is the function that gives the energy at the corresponding coordinates. The last row of $M$ obtained by Eq. (3) holds the vertical cumulative minimum energy values, and a vertical seam is found from each of these values through a reverse search. The number of vertical seams is therefore equal to the width of the image, because there is one cumulative minimum energy value per column. The optimum vertical seam is found by the reverse search that starts from the pixel with the smallest cumulative minimum energy value. The optimum horizontal seam can be found in the same way.
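The two stages described above (building $M$ by Eq. (3), then the reverse search) can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `find_vertical_seam`, the list-of-rows layout `energy[j][i]`, and the toy energy values are assumptions made for the example.

```python
def find_vertical_seam(energy):
    """Return, for each row j, the column of the minimum-energy vertical seam.

    energy is a list of H rows, each a list of W numbers (energy[j][i]).
    """
    height, width = len(energy), len(energy[0])
    # Stage 1: cumulative minimum energy map M, as in Eq. (3).
    M = [list(energy[0])]
    for j in range(1, height):
        prev = M[-1]
        row = []
        for i in range(width):
            lo, hi = max(i - 1, 0), min(i + 1, width - 1)
            row.append(energy[j][i] + min(prev[lo:hi + 1]))
        M.append(row)
    # Stage 2: reverse search from the smallest value in the last row.
    x = min(range(width), key=lambda i: M[-1][i])
    seam = [x]
    for j in range(height - 1, 0, -1):
        lo, hi = max(x - 1, 0), min(x + 1, width - 1)
        x = min(range(lo, hi + 1), key=lambda i: M[j - 1][i])
        seam.append(x)
    seam.reverse()
    return seam


# Toy 4x3 energy map (hypothetical values) with a low-energy diagonal path.
energy = [
    [5, 1, 9, 9],
    [9, 1, 9, 9],
    [9, 9, 1, 9],
]
seam = find_vertical_seam(energy)
```

In this toy map the search follows the low-energy pixels in columns 1, 1, and 2, and every step moves by at most one column, as Eq. (1) requires.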

The image size can be controlled by adding or deleting image data at the coordinates of the optimum seam. Several seams are required in order to change the image size by more than one pixel. To extract several seams, the pixels of the first extracted seam are excluded, $M$ is recomputed, and the next seam is then extracted. The pixels of the previous seam are excluded so that the definition of the seams remains satisfied. The energy of the pixels comprising the optimum seam is low. Therefore, if the pixels of an already selected seam are not removed, they are likely to be selected again, so the seams share overlapping pixels and the definition of the seam is violated. In that case, the same pixel is referred to repeatedly when the image size is converted, and the result image is distorted. Because $M$ must be recomputed to prevent this distortion, a delay in the total processing time is inevitable. If the resolution of the image to be adjusted is large, the delay grows rapidly.

As shown in Fig. 3, the proposed real-time content-aware video resizing system is composed of three parts: shot change detection (SCD), generating seam, and image resizing (1). If a shot change is detected and a new shot begins, the stored seam information of the previous frame is ignored and new seams are searched for using the seam carving technique for static images. After the information about the found seams is stored, the frame is resized to the target size. On the other hand, if the shot continues, the seams of the current frame are calculated by using the stored seam information of the previous frame, and the frame is then resized by the generated seams.

3.1. Detecting Shot Change

Fig. 3 Overall system block diagram.

Because the frame rate of a video is more than 10 fps, shot change detection is performed every 10 frames. First, the feature values between two consecutive frames are extracted as

$$
f_i(n) = \sum_{j=0}^{\mathrm{height}-1} \sum_{i=0}^{\mathrm{width}-1} |i_n(i,j) - i_{n-1}(i,j)|, \qquad
f_h(n) = \sum_{k=0}^{255} |h_n(k) - h_{n-1}(k)|,
\tag{4}
$$
where $i_n(i,j)$ is the $(i,j)$th pixel value in the $n$th frame, and $f_i$ represents the brightness change, which is sensitive to movement. In addition, $h_n(k)$ is the histogram value for gray level $k$ in the $n$th frame, and the difference between the histograms of consecutive frames defines $f_h$, the histogram change, which is sensitive to color change.

For the stability of the algorithm, shot change detection is not performed until 10 feature values are gathered. After 10 feature values are gathered, the largest and the second largest feature values are extracted and compared. A shot change between two consecutive frames is detected through the following equations.

$$
\begin{aligned}
\mathrm{SCD} &= \begin{cases} 1, & \text{if } m_i > 3 s_i \text{ and } m_h > 3 s_h,\\ 0, & \text{otherwise}, \end{cases}\\
m_i &= \max_{\alpha \in F_i}(\alpha), \qquad s_i = \max_{\alpha \in F_i,\ \alpha \ne m_i}(\alpha),\\
m_h &= \max_{\alpha \in F_h}(\alpha), \qquad s_h = \max_{\alpha \in F_h,\ \alpha \ne m_h}(\alpha),\\
F_i &= \{f_i(n-9), f_i(n-8), \ldots, f_i(n-1), f_i(n)\},\\
F_h &= \{f_h(n-9), f_h(n-8), \ldots, f_h(n-1), f_h(n)\},
\end{aligned}
\tag{5}
$$
where $F_i$ and $F_h$ are the sets of feature values calculated on the previous 10 frames, and $m_i$ and $s_i$ are the largest and the second largest values within the set $F_i$, respectively. Likewise, $m_h$ and $s_h$ are the largest and the second largest values within the set $F_h$, respectively. If $m_i$ and $m_h$ are more than three times $s_i$ and $s_h$, respectively, it is determined that a shot change has happened. Once a shot change is detected, as mentioned above, the shot change detection process is not performed again until 10 new feature values are obtained.
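The feature extraction of Eq. (4) and the decision rule of Eq. (5) can be sketched as below. This is a hedged illustration: the function names, the list-of-rows frame layout, and the assumption of 8-bit grayscale input are choices made for the example and are not specified in the text.

```python
def frame_features(prev_frame, cur_frame):
    """Return (f_i, f_h) of Eq. (4) for two 8-bit grayscale frames.

    Each frame is a list of rows of integers in 0..255 (assumed layout).
    """
    # f_i: sum of absolute pixel differences (sensitive to movement).
    f_i = sum(abs(a - b)
              for row_p, row_c in zip(prev_frame, cur_frame)
              for a, b in zip(row_p, row_c))
    # f_h: sum of absolute histogram differences (sensitive to color change).
    hist_p, hist_c = [0] * 256, [0] * 256
    for row in prev_frame:
        for v in row:
            hist_p[v] += 1
    for row in cur_frame:
        for v in row:
            hist_c[v] += 1
    f_h = sum(abs(a - b) for a, b in zip(hist_p, hist_c))
    return f_i, f_h


def is_shot_change(F_i, F_h, ratio=3):
    """Eq. (5): both largest features exceed `ratio` times the second largest.

    Returns False until 10 feature values are available. If the maximum is
    tied, the second largest value here equals the maximum (an assumption
    about tie handling), so no shot change is reported.
    """
    if len(F_i) < 10 or len(F_h) < 10:
        return False
    m_i, s_i = sorted(F_i, reverse=True)[:2]
    m_h, s_h = sorted(F_h, reverse=True)[:2]
    return m_i > ratio * s_i and m_h > ratio * s_h
```

A caller would keep the last 10 feature values per shot and clear both lists whenever `is_shot_change` returns `True`, which matches the rule that detection resumes only after 10 new values are gathered.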

Since the conventional seam carving for a static image is applied to the first frame after a shot change, the frequency of shot changes affects the speed of the algorithm. However, in general video, shot changes do not occur frequently enough to obstruct real-time processing.

Deriving Seam in the First Frame

After a shot change occurs, the conventional seam carving for a static image is applied to the first frame. All the coordinates and energy values of the seams of the first frame are stored so that this information can be used when finding seams in the next frame. The following equations show the stored seam information for a frame of size $W \times H$.

$$
\begin{aligned}
S &= [S_1, S_2, S_3, \ldots], \qquad S_n = \{C, E\},\\
C &= [x_0, x_1, \ldots, x_{H-1}] \ \text{or} \ [y_0, y_1, \ldots, y_{W-1}],\\
E &= [e^{v}_0, e^{v}_1, \ldots, e^{v}_{H-1}] \ \text{or} \ [e^{h}_0, e^{h}_1, \ldots, e^{h}_{W-1}],
\end{aligned}
\tag{6}
$$
where the set $S_n$ holds the information for one seam, and $S$ is the array of the $S_n$ found in the frame. The number of seams is determined by the target image size. $S_n$ comprises the array $C$ of the seam’s coordinates and the array $E$ of the energy at each coordinate of the seam. $C$ stores only the x coordinates in the case of a vertical seam or only the y coordinates in the case of a horizontal seam. $W$ and $H$ denote the width and height of the image, respectively. Seams are numbered in ascending order of their energy values. The corresponding coordinate sets and energy values for each seam are stored in the buffer in this order.
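The buffer layout of Eq. (6) maps naturally onto a small record type. The sketch below is only one possible representation: the class name `SeamRecord`, its field names, and the use of the summed per-pixel energy as the ordering key are assumptions, since the text does not state which energy value determines the numbering.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SeamRecord:
    """One S_n = {C, E} entry of Eq. (6)."""
    coords: List[int]       # C: x per row (vertical) or y per column (horizontal)
    energies: List[float]   # E: energy at each seam coordinate
    vertical: bool = True


def order_seams(records):
    """Number seams in ascending order of energy (total energy assumed)."""
    return sorted(records, key=lambda r: sum(r.energies))


# Hypothetical two-seam buffer for a frame with two rows.
buffer = order_seams([
    SeamRecord(coords=[1, 1], energies=[3.0, 4.0]),
    SeamRecord(coords=[0, 0], energies=[1.0, 1.0]),
])
```

Storing only one coordinate per row (or column) keeps the buffer at roughly two arrays of length $H$ (or $W$) per seam, which is consistent with the low memory use the method aims for.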

Generating Seam of Current Frame by New Scheme

When no shot change occurs, that is, when the current frame belongs to the same shot as the previous frame, the seams of the current frame are extracted with reference to the seam information stored in the buffer. Because consecutive frames within the same shot are visually correlated, the energy distributions of neighboring frames are also similar, and the seams in a frame are analogous to those of its neighboring frames. Thus, the seams of the current frame are derived from a specified range around the coordinates of the seams of the previous frame. At this time, the temporal connection of the seams has to be considered. If the seams for each frame of a video are generated independently, without this correlation, jitter and jerkiness occur. The visual artifact of jitter occurs mainly because the number of seams on each side of the dominant contents differs from frame to frame. For example, assume that in the first frame, three seams and five seams were extracted from the left and right of some contents, respectively, and that in the consecutive second frame, five seams and three seams were extracted from the left and right of the same contents, respectively. If the image size is changed identically for the two frames, the relative locations of the contents in the two frames differ by two pixels. This is jitter, which arises when seams are extracted independently for each frame. Figure 4 shows the results of independently expanding consecutive frames by seam carving.

Fig. 4 Independent seam carving result for each frame.

If we look at the picture in the red circle in each frame of Fig. 4, we can observe that seven seams and one seam exist to the left and right of the red circle in the first frame, respectively, whereas six seams and two seams exist to the left and right of the red circle in the second frame, respectively. In the original video, the picture in the red circle stays at a fixed location. However, in the images expanded independently by seam carving, the picture in the second frame moves one pixel to the left compared with the first frame. If this process is repeated over many frames, the contents in the red circle shake severely.
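The pixel offsets in the two examples above follow from simple counting: when a frame is enlarged, each vertical seam inserted to the left of an object moves it one pixel to the right. The short sketch below just reproduces that arithmetic; the function name and the starting position of 100 are hypothetical.

```python
def content_x_after_enlarging(original_x, seams_inserted_left):
    """Each vertical seam inserted left of an object shifts it right by one pixel."""
    return original_x + seams_inserted_left


# Text example: 3 seams left of the contents in frame 1, 5 in frame 2.
shift_text_example = (content_x_after_enlarging(100, 5)
                      - content_x_after_enlarging(100, 3))

# Fig. 4: 7 seams left of the red circle in frame 1, 6 in frame 2.
shift_fig4 = (content_x_after_enlarging(100, 6)
              - content_x_after_enlarging(100, 7))
```

The first difference is two pixels and the second is minus one pixel (one pixel to the left), matching the displacements described in the text.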

Therefore, in a video, preventing the shaking phenomenon is more important than finding the optimum seam. This section presents a new process to extract seams that prevents the shaking phenomenon and preserves the form of the dominant content.

Seam-ordering of current frame

Because the seams of a frame can overlap, conventional seam carving extracts the next seam only after removing the previous one. Figure 5 shows the overlapped coordinates between the first seam and the second seam.

Fig. 5 Order of seams and example of coordinate overlap.

In Fig. 5, overlapped coordinates are generated at the location where the first seam and the second seam meet. If the coordinates of the overlapped part are used when the image size is modified by the seams, the result is off by one pixel at the location of the overlap, and the image is distorted. Therefore, a specific order is used for the seams: the seam order of the current frame is identical to that of the previous frame. For example, the stored information of the 4th seam of the previous frame is used to obtain the 4th seam of the current frame. Equation (7) indicates that the seam information of the previous frame is referred to in order to produce the seam of the current frame.

$$
S_{\mathrm{ref}} = S_i(n-1),
\tag{7}
$$
where $S_{\mathrm{ref}}$ has the same structure as $S_n$ in Eq. (6) and is the reference used to produce the new seam. Also, $n$ is the number of the current frame, and $i$ is the number of the current seam.

Energy cost of pixel

The conventional seam carving method considers the energy of each pixel to determine a seam, and various energy functions exist. The amount of change of the pixel value, the spatial forward energy, the standard deviation, the edge information,26 the gradient vector flow,27 the output of high-level tasks (e.g., a face detector), etc., can be used as the energy, and a different result image is generated for each energy function. Among them, the spatial forward energy, which performs well, uses the differences between the pixels adjacent to a pixel, so that the adjacent pixels are smoothly connected if the pixel is selected as part of a seam and removed. The spatial forward energy is defined as in Eq. (8).

$$
\begin{aligned}
\mathrm{SFE}_{\text{left-up}}(i,j) &= |p(i+1,j) - p(i-1,j)| + |p(i,j-1) - p(i-1,j)|,\\
\mathrm{SFE}_{\text{up}}(i,j) &= |p(i+1,j) - p(i-1,j)|,\\
\mathrm{SFE}_{\text{right-up}}(i,j) &= |p(i+1,j) - p(i-1,j)| + |p(i,j-1) - p(i+1,j)|,
\end{aligned}
\tag{8}
$$
where $\mathrm{SFE}(\cdot)$ is the spatial forward energy according to the position of the pixel to be removed, and $p(i,j)$ is the $(i,j)$th pixel value. Equation (8) is used to find a vertical seam, and a horizontal seam is obtained by the same method. In calculating $\mathrm{SFE}(\cdot)$, one of the left-up, up, and right-up costs is selected, and only for pixels whose spatial connectivity is maintained.
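The three costs of Eq. (8) can be computed directly from a pixel's neighbors. The following is a minimal sketch under stated assumptions: the function name, the `p[j][i]` (row, column) indexing, and the toy pixel values are illustrative, and border pixels are left to the caller.

```python
def spatial_forward_energy(p, i, j):
    """Return (left_up, up, right_up) of Eq. (8) at column i, row j.

    p is a list of rows, indexed p[j][i]. The caller must ensure
    1 <= i <= W - 2 and j >= 1 (borders are not handled in this sketch).
    """
    base = abs(p[j][i + 1] - p[j][i - 1])               # |p(i+1,j) - p(i-1,j)|
    left_up = base + abs(p[j - 1][i] - p[j][i - 1])     # + |p(i,j-1) - p(i-1,j)|
    up = base
    right_up = base + abs(p[j - 1][i] - p[j][i + 1])    # + |p(i,j-1) - p(i+1,j)|
    return left_up, up, right_up


# Toy 3x2 grayscale patch (hypothetical values).
patch = [
    [10, 20, 30],
    [40, 50, 70],
]
sfe = spatial_forward_energy(patch, i=1, j=1)
```

For this patch the shared term is $|70 - 40| = 30$, so the three costs are 50, 30, and 80, and the "up" direction would be the cheapest connection.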

The spatial forward energy performs well on static images, but not on videos, because the correlation between frames is not considered. In this paper, the temporal forward energy is proposed as an energy that takes the correlation between frames into account. The temporal forward energy can guarantee the continuity of the seam in the time domain.

Figure 6 shows the three possible vertical seams under the temporal forward energy, where $p(i,j,n)$ is the $(i,j)$th pixel value in the $n$th frame. As shown in Fig. 6, we search for the seam whose removal inserts the minimal amount of energy between two consecutive frames. Such seams are not necessarily minimal in their own energy, but they leave fewer artifacts in the resulting image after removal. This coincides with the assumption that two neighboring images have piecewise smooth intensity at the same pixel position, which is a popular assumption in the literature. The temporal forward energy according to the position of the pixel to be removed is defined as in Eq. (9).

$$
\begin{aligned}
\mathrm{TFE}_{\text{left-down}}(i,j) &= |p(i-1,j,n-1) - p(i,j,n)| + |p(i+1,j,n-1) - p(i+1,j,n)|,\\
\mathrm{TFE}_{\text{down}}(i,j) &= |p(i-1,j,n-1) - p(i-1,j,n)| + |p(i+1,j,n-1) - p(i+1,j,n)|,\\
\mathrm{TFE}_{\text{right-down}}(i,j) &= |p(i-1,j,n-1) - p(i-1,j,n)| + |p(i+1,j,n-1) - p(i,j,n)|.
\end{aligned}
\tag{9}
$$

Fig. 6 The three possible vertical seams by temporal forward energy.

In calculating $\mathrm{TFE}(\cdot)$, one of the left-down, down, and right-down costs is selected, and only for pixels whose temporal connectivity is maintained.
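The three temporal costs of Eq. (9) compare the neighbors of the seam pixel in the previous frame with pixels of the current frame. The sketch below assumes the differences in Eq. (9) are absolute differences between the paired terms, and it uses an illustrative function name, `frame[j][i]` indexing, and toy values; border columns are left to the caller.

```python
def temporal_forward_energy(prev, cur, i, j):
    """Return (left_down, down, right_down) of Eq. (9) at column i, row j.

    prev and cur are frames n-1 and n as lists of rows (frame[j][i]).
    The caller must ensure 1 <= i <= W - 2.
    """
    a = prev[j][i - 1]   # p(i-1, j, n-1)
    b = prev[j][i + 1]   # p(i+1, j, n-1)
    left_down = abs(a - cur[j][i]) + abs(b - cur[j][i + 1])
    down = abs(a - cur[j][i - 1]) + abs(b - cur[j][i + 1])
    right_down = abs(a - cur[j][i - 1]) + abs(b - cur[j][i])
    return left_down, down, right_down


# One-row toy frames (hypothetical values); the centre pixel of prev is unused.
prev_frame = [[10, 99, 30]]
cur_frame = [[12, 20, 33]]
tfe = temporal_forward_energy(prev_frame, cur_frame, i=1, j=0)
```

Here the "down" case costs $|10 - 12| + |30 - 33| = 5$, lower than the other two (13 and 12), so keeping the seam at the same column as in the previous frame would be preferred.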

Generating seam of continuous frames

The coordinates of the pixels that are temporally connected with the reference seam of the previous frame are selected as the starting coordinates of a seam. Let $p$ denote the set of coordinates of the seam candidate. The next coordinate $p_{n+1}$ after $p_n$ is obtained with reference to $p_n$ and the reference seam $S_{\mathrm{ref}}$. The conditions for finding $p_{n+1}$ are as follows:

  1. $p_n$ and $p_{n+1}$ are spatially connected (spatial connection).
  2. $p_{n+1}$ and $C(S_{\mathrm{ref}})$ are connected along the time axis (temporal connection).

Equation (10) describes the process of finding the candidate pixels (CanPix) that satisfy the above conditions.

$$
\begin{aligned}
\mathrm{CanPix} &= \mathrm{SPA} \cap \mathrm{TEM},\\
\mathrm{SPA} &= \{x \mid p_n - 1 \le x \le p_n + 1\},\\
\mathrm{TEM} &= \{x \mid C \in S_{\mathrm{ref}},\ C(n+1) - 1 \le x \le C(n+1) + 1\},
\end{aligned}
\tag{10}
$$
where $n$ is the x coordinate (horizontal seam) or the y coordinate (vertical seam) of $p_n$. SPA and TEM are the pixel sets satisfying the spatial connection and the temporal connection, respectively. That is, SPA includes the pixels adjacent to $p_n$, and TEM includes the pixels adjacent to $C(S_{\mathrm{ref}})$. Figure 7 shows an example of the spatial connection condition, the temporal connection condition, and the set CanPix satisfying both conditions.

Fig. 7 Example of coordinate candidates.

The set CanPix is composed of the pixels that satisfy both the spatial connection and the temporal connection, and these pixels become the candidates for a seam that guarantees continuity in the time domain. The spatial forward energy and the temporal forward energy of the candidate pixels are obtained, and the pixel with the smallest sum of the two energy values is included in the seam, as in Eq. (11).

$$
p_{n+1} = \underset{\alpha \in \mathrm{CanPix}}{\arg\min}\,[\mathrm{SFE}(\alpha) + \mathrm{TFE}(\alpha)],
\tag{11}
$$
where $\mathrm{SFE}(\cdot)$ is the function that gives the spatial forward energy, and $\mathrm{TFE}(\cdot)$ is the function that gives the temporal forward energy. A seam that guarantees both spatial and temporal connectivity can be obtained by Eqs. (8), (9), and (11), and therefore the proposed technique resizes the video without distortion of the primary contents and without visual artifacts.
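The candidate selection of Eqs. (10) and (11) amounts to a greedy, row-by-row walk along the previous frame's seam. The sketch below illustrates this for a vertical seam under stated assumptions: `generate_seam` and the `cost(x, row)` callback are hypothetical names, and the callback stands in for the sum SFE + TFE, which is replaced here by a small lookup table so the example is self-contained.

```python
def generate_seam(ref_coords, width, cost):
    """Follow the reference seam, picking the cheapest candidate in each row.

    ref_coords[row] is C(row) of S_ref; cost(x, row) returns the energy
    (SFE + TFE in the text) of choosing column x in that row.
    """
    seam = []
    for row, c in enumerate(ref_coords):
        lo, hi = c - 1, c + 1                       # TEM: next to the reference seam
        if seam:                                    # SPA: next to the previous pick
            lo, hi = max(lo, seam[-1] - 1), min(hi, seam[-1] + 1)
        lo, hi = max(lo, 0), min(hi, width - 1)     # stay inside the frame
        seam.append(min(range(lo, hi + 1), key=lambda x: cost(x, row)))
    return seam


# Hypothetical per-pixel costs; columns 3 and 4 are cheapest but out of reach.
costs = [
    [5, 4, 1, 0, 0],
    [5, 2, 3, 0, 0],
    [1, 6, 2, 0, 0],
]
new_seam = generate_seam([1, 1, 1], width=5, cost=lambda x, row: costs[row][x])
```

Because each pick stays within one pixel of the reference seam and the reference seam itself is connected, the two ranges always overlap, so the candidate set is never empty. The zero-cost columns on the right are ignored, which is the intended trade of optimality for temporal stability.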

Image Resizing

The image size is modified by using the coordinates of all the seams that are finally determined in the current frame. When the image size is reduced, as many seams as the difference in size between the original video and the target video are removed in seam order, one at a time. On the other hand, when the image size is expanded, pixel values are inserted at the coordinates of the seams in seam order. Figure 8 shows examples of the process to control the image size. First, a seam map is generated from the seam coordinates in the stored seam information. The size of the seam map is identical to that of the original image, and the corresponding seam numbers are stored at the coordinates of the seams, as shown in Fig. 8(a). The image size is controlled by the produced seam map. When the image size is reduced, as shown in Fig. 8(b), the seam map is searched and the pixels at the coordinates of the first seam are removed. After the image is reduced by one seam, the referred seam is also removed from the seam map so that the remaining coordinates are updated. The image size is reduced by repeating this process for the number of seams.

Fig. 8 Examples of image resizing process.

On the other hand, when the image size is enlarged, as shown in Fig. 8(c), empty spaces are inserted at the coordinates of a seam. The empty spaces are then filled with pixel values generated by an interpolation method, and the image size is expanded. After the image is expanded by one seam, the referred seam is also inserted in the seam map so that the remaining coordinates are updated. The target image is obtained by repeating this process.
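Removing or inserting a single vertical seam can be sketched as follows. This is only an illustration: the function names are hypothetical, the image is a list of rows of grayscale integers, and the inserted value is the integer average of the seam pixel and its right neighbor, which is one simple interpolation choice and not necessarily the one used by the authors. The seam-map bookkeeping that updates the coordinates of later seams is not shown.

```python
def remove_vertical_seam(image, seam):
    """Drop the pixel at column seam[row] from every row (width - 1)."""
    return [row[:x] + row[x + 1:] for row, x in zip(image, seam)]


def insert_vertical_seam(image, seam):
    """Insert one interpolated pixel after column seam[row] (width + 1)."""
    out = []
    for row, x in zip(image, seam):
        right = row[min(x + 1, len(row) - 1)]       # clamp at the right border
        new_value = (row[x] + right) // 2           # assumed interpolation
        out.append(row[:x + 1] + [new_value] + row[x + 1:])
    return out


# Hypothetical 3x2 image and a seam through columns 0 and 2.
image = [
    [10, 20, 30],
    [40, 50, 60],
]
reduced = remove_vertical_seam(image, [0, 2])
enlarged = insert_vertical_seam(image, [0, 2])
```

When several seams are applied in sequence, the column indices of the remaining seams must be shifted after each removal or insertion, which is the role of the seam map update described above.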

In this section, the performance of three image resizing techniques is evaluated, namely, the bilinear method, the technique of applying Avidan’s algorithm3 to a video, and the proposed technique. Extensive experimental testing and comparison were performed on several sequences with different characteristics: “SOCCER,” “COASTGUARD,” and “MOTHER & DAUGHTER” are in CIF format (352×288 pixels), and “IN TO TREE” is in 720p format (1280×720 pixels). All sequences have 300 frames and were horizontally enlarged by 30%. First, each method was evaluated on the basis of its runtime and average memory usage, which are the most important factors in real-time processing. The experiments were performed on a 1.86 GHz dual-core processor with 2 GB of memory. To improve the reliability of the measured values, the same process was repeated 10 times, and the averages of the results were compared.

Tables 1 and 2 show the runtime and the average memory usage of each algorithm, respectively.

Table 1 Run-times for different algorithms (s).
Table 2 Memory usages for different algorithms (KB).

Because Avidan’s algorithm needs many operations and a large storage space in order to analyze all the frames in a video, it cannot be run on a system with limited resources such as a mobile terminal. In contrast, the proposed algorithm runs about 25 times faster than Avidan’s algorithm and achieves a runtime comparable to that of the bilinear method, as shown in Table 1. Since the proposed algorithm can process 12 frames per second for CIF video, real-time processing is possible for systems with a frame rate of 12 frames per second.

Since the proposed algorithm is designed for mobile terminals, memory usage is also important. As shown in Table 2, the proposed method requires about three times less memory than Avidan’s algorithm. Because the new seam of the current frame is computed with reference to the seam information of the previous frame only, the memory usage of the proposed method is similar to that of the bilinear method, which is commonly used to resize images on mobile devices.

Next, the result frames of each algorithm were compared to check whether the main content was maintained and whether the shaking phenomenon exists. Figure 9 shows “SOCCER” (174th frame), “COASTGUARD” (62nd frame), and “MOTHER & DAUGHTER” (60th frame) from the results of each algorithm.

Fig. 9 Subjective quality comparison for the different algorithms: (a) original; (b) bilinear method; (c) Avidan’s method; (d) proposed method.

Compared with the source image in Fig. 9(a), the result of the bilinear technique in Fig. 9(b) shows that the shapes of the primary contents have been broadened. However, in the images resulting from Avidan’s algorithm and the proposed algorithm, the shapes of the contents are similar to those in the original image. Thus, it is seen that the proposed algorithm maintains the main content of the image.

Finally, the differences between the experimental results and the source video are expressed as the Error Rate given by

$$
\begin{aligned}
D_n &= \sum_{j=0}^{\mathrm{height}-1} \sum_{i=0}^{\mathrm{width}-1} |f_n(i,j) - f_{n+1}(i,j)|,\\
\zeta &= \frac{1}{\mathrm{height} \times \mathrm{width}} \sum_{k=1}^{K-1} D_k,\\
\text{Error Rate} &= \left| 1 - \frac{\zeta_0}{\zeta} \right| \times 100,
\end{aligned}
\tag{12}
$$
where $f_n$ indicates the R, G, and B values of the $n$th frame, and $D_n$ is the accumulated difference between the $n$th frame and the $(n+1)$th frame, which $\zeta$ normalizes per pixel. $K$ is the total number of frames, and $\zeta_0$ is the error between frames in the original video. The Error Rate represents the difference between the original video and the result video. Table 3 shows numerically, by the Error Rate, how much the result images of the proposed method and of Avidan’s method differ from the original video.
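The Error Rate of Eq. (12) can be sketched as below. This is a simplified illustration: frames are treated as single-channel lists of rows rather than R, G, and B planes, the function names are hypothetical, and the ratio $\zeta_0 / \zeta$ follows Eq. (12) as reconstructed here. The sketch does not guard against $\zeta = 0$ (a completely static result video).

```python
def mean_frame_difference(frames):
    """zeta of Eq. (12): summed adjacent-frame differences per pixel."""
    height, width = len(frames[0]), len(frames[0][0])
    total = 0
    for a, b in zip(frames, frames[1:]):            # D_k for k = 1 .. K-1
        total += sum(abs(x - y)
                     for row_a, row_b in zip(a, b)
                     for x, y in zip(row_a, row_b))
    return total / (height * width)


def error_rate(original_frames, resized_frames):
    """Error Rate (%) between the temporal differences of the two videos."""
    zeta0 = mean_frame_difference(original_frames)
    zeta = mean_frame_difference(resized_frames)
    return abs(1 - zeta0 / zeta) * 100


# Hypothetical two-frame videos: a 2x1 original and a 3x1 enlarged result.
original = [[[0, 0]], [[2, 2]]]
resized = [[[0, 0, 0]], [[4, 4, 4]]]
rate = error_rate(original, resized)
```

In this toy case $\zeta_0 = 2$ and $\zeta = 4$, so the result video changes twice as much per pixel between frames as the original and the Error Rate is 50%. A value near zero indicates that resizing added little frame-to-frame change, that is, little jitter.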

Table 3 Error rates for different algorithms.

As shown in Table 3, the result images of the proposed method have a smaller error rate and are more similar to the original video than those of Avidan’s method.

Figure 10 shows the differences between adjacent frames in “IN TO TREE” (frames 33–36). Because these frames belong to a single shot, any differences between adjacent frames are small.

Fig. 10 Differences between adjacent frames in original video.

As shown in Fig. 11(a), because the technique applying Avidan’s algorithm to video does not consider the relation between adjacent frames, the shaking phenomenon occurs and many differences between neighboring frames are generated. On the other hand, because the proposed algorithm considers the correlation between adjacent frames, there is no shaking phenomenon and the differences between neighboring frames are similar to those in original video as shown in Fig. 11(b).

Fig. 11 Differences between adjacent frames after applying Avidan’s and proposed algorithm.

The results have been presented only for horizontal direction. In order to control the image size in both directions, the proposed algorithm is just applied twice: once in the horizontal direction and once in the vertical direction.

A novel video resizing algorithm that preserves the dominant contents of video frames was proposed. Because a visual correlation (a similarity) exists between consecutive frames within an identical shot, the energy distributions of neighboring frames are also correlated and similar, and the seams in a frame are analogous to those of neighboring frames. Thus, the seams of the current frame are derived from a specified range around the coordinates of the seams of the previous frame. The proposed method determines the 2-D connected paths for each frame by considering both the spatial and temporal correlations between frames to prevent jitter and jerkiness. Conventional seam carving has high computational complexity and requires a large amount of memory because all the frames in a video have to be analyzed; therefore, it cannot be performed on a mobile terminal. The proposed algorithm has a fast processing speed similar to that of the bilinear method, while preserving the main content of an image to the greatest extent possible. In addition, because its memory usage is remarkably small compared with the existing seam carving method, the proposed algorithm is usable in mobile terminals, which have limited memory resources. Computer simulation results indicate that the proposed technique provides better objective performance, subjective image quality, shaking phenomenon removal, and content conservation than conventional algorithms.

Pseudocode of the framework of the proposed algorithm:

    F = number of frames
    N = number of seams
    for (f = 1; f <= F; f++) {
        perform shot change detection
        for (n = 1; n <= N; n++) {
            if f is the first frame or a shot change occurred
                calculate SFE for the pixels of the frame, excluding the seams already extracted
                extract one seam using dynamic programming on SFE
                update and accumulate seam information
            else
                calculate SFE and TFE for the pixels satisfying spatial and temporal connectivity for the nth seam of the previous frame
                generate one seam considering the SFE and TFE values and the location of the seam of the previous frame
                update and accumulate seam information
        }
        create the new resized frame using the seam information
    }
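The control flow of the framework above can be sketched in Python as follows. This is a simplified illustration under several assumptions, not the authors' implementation: a plain horizontal-gradient magnitude stands in for the SFE and TFE terms, temporal coherence is approximated by restricting each seam to within `radius` columns of the corresponding seam of the previous frame, frames are grayscale nested lists, seams are removed immediately rather than accumulated, and `is_shot_change` is a hypothetical user-supplied detector.

```python
def energy_map(frame):
    """Stand-in energy: horizontal gradient magnitude (NOT the paper's SFE/TFE)."""
    h, w = len(frame), len(frame[0])
    return [[abs(frame[y][min(x + 1, w - 1)] - frame[y][max(x - 1, 0)])
             for x in range(w)] for y in range(h)]


def find_seam(energy, prev_seam=None, radius=1):
    """Minimum-energy connected vertical seam by dynamic programming.

    If prev_seam is given, row y may only use columns within `radius`
    of prev_seam[y], keeping the seam near the previous frame's seam.
    """
    h, w = len(energy), len(energy[0])
    inf = float("inf")

    def allowed(y, x):
        return prev_seam is None or abs(x - prev_seam[y]) <= radius

    cost = [[inf] * w for _ in range(h)]
    back = [[0] * w for _ in range(h)]
    for x in range(w):
        if allowed(0, x):
            cost[0][x] = energy[0][x]
    for y in range(1, h):
        for x in range(w):
            if not allowed(y, x):
                continue
            # Cheapest of the up-to-three neighbours in the row above.
            best = min(range(max(x - 1, 0), min(x + 2, w)),
                       key=lambda p: cost[y - 1][p])
            cost[y][x] = energy[y][x] + cost[y - 1][best]
            back[y][x] = best
    x = min(range(w), key=lambda p: cost[h - 1][p])
    seam = [0] * h
    for y in range(h - 1, -1, -1):  # backtrack from the bottom row
        seam[y] = x
        x = back[y][x]
    return seam


def remove_seam(frame, seam):
    """Delete one pixel per row at the seam's column."""
    return [row[:x] + row[x + 1:] for row, x in zip(frame, seam)]


def resize_video(frames, num_seams, is_shot_change, radius=1):
    """Remove num_seams vertical seams from every frame."""
    out, prev_seams = [], None
    for f, frame in enumerate(frames):
        fresh = f == 0 or is_shot_change(frames[f - 1], frame)
        seams = []
        for n in range(num_seams):
            prev = None if fresh else prev_seams[n]
            seam = find_seam(energy_map(frame), prev, radius)
            frame = remove_seam(frame, seam)
            seams.append(seam)
        out.append(frame)
        prev_seams = seams
    return out


frames = [[[(x * y + t) % 7 for x in range(6)] for y in range(5)] for t in range(3)]
resized = resize_video(frames, num_seams=2, is_shot_change=lambda a, b: False)
print(len(resized[0]), len(resized[0][0]))  # prints 5 4
```

Because the previous seam is itself a connected path, the restricted search always has at least one feasible seam, so the temporal constraint never leaves the dynamic program without a solution.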

This study was supported by the Research Grant from Kangwon National University.

Santella A. et al., “Gaze-based interaction for semi-automatic photo cropping,” in Proc. SIGCHI Conf. Human Factors Comput. Syst., pp. 771–780, ACM, New York (2006).
Liu F., Gleicher M., “Automatic image retargeting with fisheye-view warping,” in Proc. ACM Symposium on User Interface Software and Technology, pp. 153–162, ACM, New York (2005).
Avidan S., Shamir A., “Seam carving for content-aware image resizing,” ACM Trans. Graph. 26(3), 1–9 (2007).
Battiato S. et al., “Content-based image resizing on mobile devices,” in Int. Conf. Comput. Vis. Theory App. (VISAPP), Rome, Italy, pp. 87–90 (2012).
Tao C., Jia J., Sun H., “Active window oriented dynamic video retargeting,” in ICCV Proc. Workshop Dynamic. Vis., pp. 1–12 (2007).
Cho S. et al., “Image retargeting using importance diffusion,” in Proc. IEEE Int. Conf. on Image Process., pp. 977–980, IEEE, Cairo (2009).
Chen L. et al., “A visual attention model for adapting images on small displays,” Multimedia Syst. 9(4), 353–364 (2003).
Liu H. et al., “Automatic browsing of large pictures on mobile devices,” in Proc. Eleventh ACM Int. Conf. Multimedia, pp. 148–155, ACM, New York (2003).
Suh B. et al., “Automatic thumbnail cropping and its effectiveness,” in Proc. 16th Ann. ACM Symp. User Interface Software Technology, pp. 95–104, ACM Press, New York (2003).
Fan X. et al., “Looking into video frames on small displays,” in Proc. Eleventh ACM Int. Conf. Multimedia, pp. 247–250, ACM, New York (2003).
Xiao J. et al., “A novel adaptive interpolation algorithm for image resizing,” Int. J. Innov. Comput. Inform. Cont. 3(6(A)), 1335–1345 (2007).
Zhang Y. et al., “Application of a bivariate rational interpolation in image zooming,” Int. J. Innov. Comput. Inform. Cont. 5(11(B)), 4299–4307 (2009).
Lin P., Chu H., Lee T., “Smooth shape interpolation for 2D polygons,” Int. J. Innov. Comput. Inform. Cont. 4(9), 2405–2417 (2008).
Tian Y. et al., “An iterative hybrid method for image interpolation,” in Proc. Int. Conf. on Intelligent Computing (ICIC), Vol. 1, pp. 10–19, Springer, Berlin, Heidelberg (2005).
Xiao J. et al., “Adaptive interpolation algorithm for real-time image resizing,” in Proc. Innov. Comput. Inform. and Control (ICICIC), Vol. 2, pp. 221–224 (2006).
Setlur V. et al., “Retargeting images and video for preserving information saliency,” IEEE Comput. Graphics Appl. 27(5), 80–88 (2007).
Liu F., Gleicher M., “Video retargeting: automating pan and scan,” in Proc. ACM Int. Conf. Multimedia, pp. 241–250, ACM, New York (2006).
Rubinstein M., Shamir A., Avidan S., “Improved seam carving for video retargeting,” ACM Trans. Graph. 27(3), 1–9 (2008).
Chen B., Sen P., “Video carving,” in Short Papers Proc. Eurographics (2008).
Kohli P., Torr P. H. S., “Dynamic graph cuts for efficient inference in Markov random fields,” IEEE Trans. Pattern Anal. Mach. Intel. 29(12), 2079–2088 (2007).
Kwatra V. et al., “Graphcut textures: image and video synthesis using graph cuts,” ACM Trans. Graph. 22(3), 277–286 (2003).
Gargi U., Kasturi R., Strayer S. H., “Performance characterization of video-shot-change detection methods,” IEEE Trans. Circuits Sys. Video Technol. 10(1), 1–13 (2000).
Gong Y., “An accurate and robust method for detecting video shot boundaries,” in Proc. IEEE Int. Conf. on Multimedia Comput. and Sys., Vol. 1, pp. 850–854, IEEE, Florence (1999).
Bellman R., “Some problems in the theory of dynamic programming,” Econometrica 22(1), 37–48 (1954).
Bertsekas D. P., Dynamic Programming and Optimal Control, Vol. II, 3rd ed., Athena Scientific (2007).
Choi K. S., Ko S. J., “Fast content-aware image resizing scheme in the compressed domain,” IEEE Trans. Consum. Electron. 55(3), 1514–1521 (2009).
Battiato S. et al., “Content-aware image resizing with seam selection based on gradient vector flow,” in Proc. Int. Conf. on Image Processing (ICIP) (2012).

Daehyun Park received BS and MS degrees in computer engineering with the Department of Computer and Communications Engineering from Kangwon National University in 2007 and 2009, respectively. He is now a PhD candidate in computer engineering with the Department of Computer and Communications Engineering at Kangwon National University. His research interests are in the areas of video signal processing and multimedia communications.

Kanghee Lee received BS and MS degrees in computer engineering with the Department of Computer and Communications Engineering from Kangwon National University in 2009 and 2011, respectively. His research interests are in the areas of video signal processing and multimedia communications.

Yoon Kim received a BS degree in 1993, an MS degree in 1995, and a PhD degree in 2003, in electronic engineering with the Department of Electronic Engineering from Korea University. In 2004, he joined the Department of Computer and Communications Engineering, Kangwon National University, where he is currently an associate professor. From 1995 to 1999, he was with the LG-Philips LCD Co., where he was involved in research and development on digital image equipment. His research interests are in the areas of video signal processing, multimedia communications, and wireless sensor networks.

© The Authors. Published by SPIE under a Creative Commons Attribution 3.0 Unported License. Distribution or reproduction of this work in whole or in part requires full attribution of the original publication, including its DOI.

Citation

Daehyun Park, Kanghee Lee, and Yoon Kim, “Improved real-time video resizing technique using temporal forward energy,” J. Electron. Imaging 22(1), 013007 (Jan 07, 2013); http://dx.doi.org/10.1117/1.JEI.22.1.013007

