JEI Letters

Efficient temporal alignment of video sequences using unbiased bidirectional dynamic time warping

Author Affiliations
Cheng Lu, Mrinal Mandal

University of Alberta, Department of Electrical and Computer Engineering, Edmonton, Alberta, T6G 2V4, Canada

J. Electron. Imaging. 19(4), 040501 (December 21, 2010). doi:10.1117/1.3488415
History: Received March 12, 2010; Revised August 02, 2010; Accepted August 13, 2010; Published December 21, 2010; Online December 21, 2010

Open Access

We propose an efficient technique for temporally aligning video sequences of similar activities. The proposed technique is able to synchronize view-variant videos of different scenes containing similar 3-D activities. Unlike existing techniques that consider only unidirectional alignment, the proposed technique considers symmetric temporal alignment and computes the optimal alignment by eliminating any view-based bias. The advantages of our technique are validated by experiments conducted on synthetic and real video data. The experimental results show that the proposed technique outperforms existing techniques on several test video sequences.


Temporal alignment of video sequences is important in applications such as superresolution imaging,1 robust multiview surveillance,2 and mosaicking. In some applications, it is required to align video sequences from two similar scenes, where analogous motions have different trajectories through the video sequence. Figure 1 illustrates two similar motions occurring in related 3-D planar scenes with respect to time. Camera 1 views 3-D scene $X(X_1, Y_1, Z_1, t_1)$ in view 1 $(\nu_1)$ and acquires video $I_1(x_1, y_1, t_1)$. Camera 2 views another 3-D scene $X(X_2, Y_2, Z_2, t_2)$ in view 2 $(\nu_2)$ and acquires video $I_2(x_2, y_2, t_2)$. Note that the motions in these two scenes are similar but have a dynamic time shift. The homography matrix $H$ is typically used to represent the spatial relationship between the two views.

Fig. 1: Illustration of two distinct scenes acquired using two distinct cameras.

A typical schematic for temporal alignment is shown in Fig. 2. Note that in order to correlate the two videos and represent the motions, features are extracted and tracked separately in each video. Robust view-invariant tracking methods are used to generate feature trajectories $F_1(x_1, y_1, t_1)$ and $F_2(x_2, y_2, t_2)$ from videos $I_1$ and $I_2$, respectively.

Fig. 2: A typical schematic of existing techniques.

Existing techniques vary in how they compute the temporal alignment. Giese and Poggio3 computed the temporal alignment of activities of different people using dynamic time warping (DTW) between the feature trajectories, but limited their technique to a fixed viewpoint. Rao et al.4 used a rank-constraint-based (RCB) technique in DTW to calculate the synchronization. Such techniques consider only unidirectional alignment,3,4 i.e., they project the trajectory from one scene to the other, which designates one view as the reference for computing the temporal alignment. This introduces a bias toward the reference trajectory: noise and imperfections in the reference trajectory lead to erroneous alignment. Therefore, to minimize the bias, one should compute the alignment in a symmetric way. Singh et al.5 formulated a symmetric transfer error (STE) as a functional of a regularized temporal warp. Their technique determines the time warp that has the smallest STE and then chooses one of the symmetric warps as the final temporal alignment. The STE technique provides better results than unidirectional alignment schemes. However, the accuracy of the temporal alignment can be improved further, since the STE technique does not fully eliminate the reference-view bias between the two sequences.

In this work, we propose an unbiased bidirectional dynamic time warping (UBDTW) technique that removes this bias and provides more accurate alignment results.

The schematic of the proposed temporal alignment technique is shown in Fig. 3. The technique consists of three steps, which are explained in the following sections.

Bidirectional Projections

Since the feature trajectories represent the activities in the video sequences, we compute the projections of the feature trajectory $F_1$ from scene 1 to scene 2 and of $F_2$ from scene 2 to scene 1 as follows:

$$F_1^p(x_2, y_2, t_2) = H_{12} \cdot F_1(x_1, y_1, t_1), \quad (1)$$
$$F_2^p(x_1, y_1, t_1) = H_{21} \cdot F_2(x_2, y_2, t_2),$$

where $H_{12}$ and $H_{21}$ are the homographies from scene 1 to scene 2 and from scene 2 to scene 1, respectively. Homographies are independent of the scene structure and can be computed from corresponding points of $X(X_1, Y_1, Z_1, t_1)$ and $X(X_2, Y_2, Z_2, t_2)$ using the direct linear transform (DLT) algorithm.6
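To make this step concrete, the following Python sketch (the helper name and the inputs `F1`, `F2`, `H12` are ours, not from the paper) projects a tracked 2-D trajectory through a homography using homogeneous coordinates, as in Eq. 1. In practice, the homography could be estimated from point correspondences, e.g., with OpenCV's `cv2.findHomography`.

```python
import numpy as np

def project_trajectory(F, H):
    """Project a 2-D feature trajectory through a 3x3 homography H.

    F is an (L, 2) array of (x, y) positions, one row per frame.
    Returns the projected (L, 2) trajectory.
    """
    # Lift to homogeneous coordinates: (x, y) -> (x, y, 1).
    Fh = np.hstack([F, np.ones((F.shape[0], 1))])  # shape (L, 3)
    Gp = (H @ Fh.T).T                              # apply the homography
    return Gp[:, :2] / Gp[:, 2:3]                  # dehomogenize

# Bidirectional projections of Eq. 1, taking H21 as the inverse of H12:
# F1p = project_trajectory(F1, H12)
# F2p = project_trajectory(F2, np.linalg.inv(H12))
```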

Computation of Symmetric Warps

Once we obtain the two pairs of feature trajectories, $(F_1, F_2^p)$ and $(F_1^p, F_2)$, we compute the symmetric warps $W_{1,2^p}$ and $W_{1^p,2}$ using regularized DTW. We construct a warp $W$ as follows:

$$W = w_1, w_2, \ldots, w_L, \quad \max(L_1, L_2) \le L < L_1 + L_2, \quad (2)$$

where $L_1$ and $L_2$ are the lengths of trajectories $F_1$ and $F_2$, respectively. The $k$'th element of the warp $W$ is $w_k = (i, j)$, where $i$ and $j$ are the time indices of $F_1$ and $F_2$, respectively. The optimal warp is the minimum-distance warp, where the distance of a warp is defined as follows:

$$\mathrm{dist}(W) = \sum_{k=1}^{L} \mathrm{dist}[F(i_k), F^p(j_k)], \quad (3)$$

where $\mathrm{dist}[F(i_k), F^p(j_k)]$ is the distance between the trajectory values at the time indices $(i, j)$ of the $k$'th element of the warp. We propose a regularized distance metric as follows:

$$\mathrm{dist}[F(i), F^p(j)] = \|F(i) - F^p(j)\|^2 + w \cdot \mathrm{reg}, \quad (4)$$
$$\mathrm{reg} = \|\nabla F(i) - \nabla F^p(j)\|^2 + \|\nabla^2 F(i) - \nabla^2 F^p(j)\|^2, \quad (5)$$

where $\nabla F$ and $\nabla^2 F$ are the first and second derivatives of $F$. The regularization term can be considered a smoothness penalty, where $w$ is the weight (normally, $w = 25$).
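As an illustration, the metric of Eqs. 4,5 can be evaluated for all index pairs at once. The sketch below (function names are ours) approximates the derivatives with central finite differences:

```python
import numpy as np

def cost_matrix(F, Fp, w=25.0):
    """Pairwise regularized distances dist[F(i), Fp(j)] of Eqs. 4-5.

    F and Fp are (L1, 2) and (L2, 2) trajectories; the first and
    second derivatives are approximated by finite differences along
    the time axis.
    """
    dF, dFp = np.gradient(F, axis=0), np.gradient(Fp, axis=0)
    d2F, d2Fp = np.gradient(dF, axis=0), np.gradient(dFp, axis=0)

    def sq(A, B):
        # Squared Euclidean distance between every row of A and of B.
        return np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=2)

    # Data term plus the weighted smoothness penalty.
    return sq(F, Fp) + w * (sq(dF, dFp) + sq(d2F, d2Fp))
```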

To find the optimal warp, an accumulated distance matrix is created. The value of each element of the accumulated distance matrix is

$$D(i, j) = \mathrm{dist}[F(i), F^p(j)] + \min(\varphi), \quad (6)$$
$$\varphi = [D(i-1, j),\, D(i-1, j-1),\, D(i, j-1)]. \quad (7)$$

A greedy search technique is employed to find the optimal warp $W$ such that $\mathrm{dist}(W)$ is minimum. We can now obtain the symmetric warps $W_{1,2^p}$ and $W_{1^p,2}$ for $(F_1, F_2^p)$ and $(F_1^p, F_2)$, respectively.
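A standard dynamic-programming realization of Eqs. 6,7, with backtracking through the accumulated matrix to recover the minimum-distance warp, might look as follows (a sketch; the paper's search procedure may differ in detail):

```python
import numpy as np

def dtw_warp(C):
    """Accumulate costs per Eqs. 6-7 and backtrack the optimal warp.

    C is an (L1, L2) cost matrix, e.g., from cost_matrix() above.
    Returns the accumulated matrix D and the warp as (i, j) pairs.
    """
    L1, L2 = C.shape
    D = np.full((L1, L2), np.inf)
    D[0, 0] = C[0, 0]
    for i in range(L1):
        for j in range(L2):
            if i == 0 and j == 0:
                continue
            # phi = [D(i-1, j), D(i-1, j-1), D(i, j-1)]  (Eq. 7)
            phi = min(D[i - 1, j] if i > 0 else np.inf,
                      D[i - 1, j - 1] if i > 0 and j > 0 else np.inf,
                      D[i, j - 1] if j > 0 else np.inf)
            D[i, j] = C[i, j] + phi
    # Backtrack from (L1-1, L2-1) to (0, 0) along minimal predecessors.
    i, j = L1 - 1, L2 - 1
    warp = [(i, j)]
    while (i, j) != (0, 0):
        candidates = [(a, b) for a, b in
                      [(i - 1, j), (i - 1, j - 1), (i, j - 1)]
                      if a >= 0 and b >= 0]
        i, j = min(candidates, key=lambda ij: D[ij])
        warp.append((i, j))
    return D, warp[::-1]
```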

Optimal Warp Calculation

Note that we calculated the symmetric warps $W_{1,2^p}$ and $W_{1^p,2}$, and the corresponding distance matrices $D_{1,2^p}$ and $D_{1^p,2}$, in the last step. However, these warps are still biased ($W_{1,2^p}$ toward $F_1$, and $W_{1^p,2}$ toward $F_2$). To minimize the effect of this bias on the alignment, we first combine $D_{1,2^p}$ and $D_{1^p,2}$ into a new distance matrix $D_c$ as follows:

$$D_c = D_{1,2^p} + D_{1^p,2}. \quad (8)$$

Once $D_c$ is obtained, a global constraint based on $W_{1,2^p}$ and $W_{1^p,2}$ is added to this matrix. Denote by $W_c$ the warps under the global constraint, which is bounded by the symmetric warps as follows:

$$\min(W_{1,2^p}, W_{1^p,2}) \le W_c \le \max(W_{1,2^p}, W_{1^p,2}). \quad (9)$$

Finally, the warp $W_c$ that satisfies the following equation is chosen as the final warp:

$$W_c^{\mathrm{opt}} = \arg\min_{W_c}[\mathrm{dist}(W_c)]. \quad (10)$$
Figure 4 gives an intuitive explanation of the proposed optimal warp calculation. The grid represents the distance matrix $D_c$. The horizontal and vertical axes represent the indices of trajectories $F_2$ and $F_1$, respectively. The two bold lines represent the symmetric warps $W_{1,2^p}$ and $W_{1^p,2}$. The gray area represents the search area constrained by $W_{1,2^p}$ and $W_{1^p,2}$. The warp $W_c^{\mathrm{opt}}$ inside the global constraint is taken as the final unbiased warp.
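The following sketch shows one way to realize Eqs. 8,9,10: the two accumulated matrices are summed, cells outside the band bounded by the two symmetric warps are masked to infinity, and the dynamic programming of the previous step is rerun on the constrained matrix. The helper names continue from the earlier sketches, and this banding is our reading of the global constraint, not the authors' code.

```python
import numpy as np

def unbiased_warp(D12p, D1p2, warp_a, warp_b):
    """Final warp under the global constraint of Eqs. 8-10 (a sketch).

    D12p, D1p2 : accumulated matrices for (F1, F2p) and (F1p, F2).
    warp_a, warp_b : the two symmetric warps as lists of (i, j) pairs.
    """
    Dc = D12p + D1p2                          # Eq. 8
    L1, L2 = Dc.shape
    # For each row i, admissible columns lie between the columns the
    # two symmetric warps visit at that row (Eq. 9).
    lo = np.full(L1, L2, dtype=int)
    hi = np.full(L1, -1, dtype=int)
    for i, j in list(warp_a) + list(warp_b):
        lo[i], hi[i] = min(lo[i], j), max(hi[i], j)
    banded = np.full_like(Dc, np.inf)
    for i in range(L1):
        banded[i, lo[i]:hi[i] + 1] = Dc[i, lo[i]:hi[i] + 1]
    # Both symmetric warps contain (0, 0), so the band admits a path.
    return dtw_warp(banded)[1]                # Eq. 10
```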

Fig. 4: An intuitive illustration of the UBDTW technique.

We evaluated our technique using both synthetic and real videos and compared it with the RCB4 and STE5 techniques.

Synthetic Data Evaluation

In the synthetic data evaluation, we generate planar trajectories 100 frames long using a pseudorandom number generator. These trajectories are then projected onto two image planes using user-defined camera projection matrices. A 60-frame-long time warp is then applied to a section of one of the trajectory projections. The temporal alignment techniques are then applied to the synthetic trajectories. The test was repeated on 100 different synthetic trajectories and 100 similar trajectories with noise added. The added noise was a normally distributed random variate with zero mean and variance $\sigma^2 = 0.1$. The mean absolute error between the warp obtained by each technique and the ground truth is computed as the evaluation metric. The results are shown in Table 1. The percentage in parentheses represents the improvement obtained by an alignment technique with respect to the original error. Figure 5 shows a synchronization result for a synthetic trajectory. The performance of the RCB, STE, and proposed techniques is compared. It is clear that the proposed technique outperformed the other techniques.
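A synthetic trial along these lines can be set up as follows (a sketch; the paper's exact trajectory generator and warp are not specified, so the random walk and sinusoidal warp below are our assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# A pseudorandom planar trajectory, 100 frames long (a stand-in for
# the paper's generator).
t = np.linspace(0.0, 1.0, 100)
F = np.cumsum(rng.standard_normal((100, 2)), axis=0)

# Apply a time warp to a 60-frame section, then resample the
# trajectory at the warped times to simulate the misaligned copy.
warp_t = t.copy()
warp_t[20:80] += 0.05 * np.sin(np.linspace(0.0, np.pi, 60))
F_warped = np.column_stack(
    [np.interp(warp_t, t, F[:, k]) for k in range(2)])

# Optional additive noise: zero mean, variance 0.1.
F_noisy = F_warped + rng.normal(0.0, np.sqrt(0.1), F_warped.shape)
```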

Fig. 5: (a) Synthetic trajectory; (b) result of temporal alignment for synthetic trajectories using the RCB, STE, and proposed techniques.

Table 1: Performance improvement of the proposed technique over existing techniques.
Real Data Evaluation

For the real video test, we use two videos (54 and 81 frames long, respectively) capturing the activity of lifting a coffee cup, performed by different people. We tracked the coffee cup, which represents the activity in each video, to generate the feature trajectories. Since ground-truth information is not available, we used visual judgment to assess whether the alignment was correct.

Figure 6 shows representative aligned frames at the 4th, 8th, and 12th elements of the alignment warps computed using the STE technique and the proposed technique. Note that if the coffee cup is at the same position in the two frames, we mark them as "matched"; otherwise, "mismatched." In the results obtained using the STE technique, only one pair of frames is matched, indicating that the technique can often produce erroneous alignments. The performance of the proposed technique is shown in the last two rows; all of its alignments are correct.

Fig. 6: Comparison of temporal alignment results on real data using the STE technique (first and second rows) and the proposed technique (third and fourth rows).

An efficient technique is proposed to synchronize video sequences captured from planar scenes and related by varying temporal offsets. The proposed UBDTW technique is able to remove the bias and leads to accurate temporal alignment. In the future, we would like to extend this work to more general scenes.

References

1. Caspi, Y. and Irani, M., "Spatio-temporal alignment of sequences," IEEE Trans. Pattern Anal. Mach. Intell. 24(11), 1409–1424 (2002).
2. Lee, L., Romano, R., and Stein, G., "Monitoring activities from multiple video streams: establishing a common coordinate frame," IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 758–767 (2000).
3. Giese, M. A. and Poggio, T., "Morphable models for the analysis and synthesis of complex motion patterns," Int. J. Comput. Vis. 38(1), 59–73 (2000).
4. Rao, C., Gritai, A., Shah, M., and Syeda-Mahmood, T. F., "View-invariant alignment and matching of video sequences," Proc. ICCV, pp. 939–945 (2003).
5. Singh, M., Cheng, I., Mandal, M., and Basu, A., "Optimization of symmetric transfer error for sub-frame video synchronization," Proc. ECCV, pp. 554–567 (2008).
6. Hartley, R. I. and Zisserman, A., Multiple View Geometry in Computer Vision, 2nd ed., Cambridge University Press, Cambridge, UK (2004).
© 2010 SPIE and IS&T
