1. Introduction

Evaluating the tracking reliability of a tracking algorithm is an important issue because it can guide the design of a good tracker. A variety of algorithms for measuring reliability have been presented to improve the robustness of the tracking process.1,2,3,4 Several feature-points-based metrics are proposed in Ref. 1 for the analysis of partial and total occlusion in video tracking. Erdem et al. introduced other metrics based on color and motion differences.2 However, these feature-points-based and color-based metrics are not suited to evaluating the performance of infrared target tracking, because the extracted feature points and the color information of the target region are not reliable in infrared images. Infrared sequences are extremely noisy owing to severe systemic noise introduced by the sensing instrument and noise from the environment.5 The aim of this letter is to design a metric that quantitatively evaluates the performance of infrared target tracking with a kernel-based method, one that exploits the intensity information discriminatively and avoids extracting feature points from the target region.

2. Tracker Evaluation Metric

A kernel-based target tracking approach, such as the mean shift algorithm,6 is a commonly used method in the tracking field. Let $\{x_i^*\}_{i=1,\dots,n}$ be the normalized pixel locations in the target region, with the target center as the origin, in the current frame. The function $b:\mathbb{R}^2 \to \{1,\dots,m\}$ (an $m$-bin histogram is used) associates with the pixel at location $x_i^*$ the index $b(x_i^*)$ of its bin in the quantized feature space. The kernel density estimate of feature $u$, $u=1,\dots,m$, in the target region is computed as6

$$\hat q_u = C \sum_{i=1}^{n} k\!\left(\left\|\frac{x_i^*}{h}\right\|^2\right)\delta\!\left[b(x_i^*) - u\right],$$

where $\delta$ is the Kronecker delta function, $C$ is the normalization constant, $k$ is the common profile used in the corresponding feature domain, and $h$ is the kernel bandwidth. Thus we have the target model

$$\hat q = \{\hat q_u\}_{u=1,\dots,m}, \qquad \sum_{u=1}^{m} \hat q_u = 1.$$

The target candidates can be obtained in the same way, and the target location in the current frame is obtained by optimizing the similarity function between the target model and the target candidates.

It is unavoidable that some background parts are included in the located target region when a contour-based method, in which tracking is achieved by evolving the contour from frame to frame,7 is not used. To evaluate the tracking performance, we therefore seek the discriminative components of the tracking model, i.e., the components that best describe the tracked target. A rectangular set of pixels covering the target is chosen to represent the target pixels, and an outer surrounding ring of pixels is chosen to form the sampled background. Given a certain feature $u$, let $\hat q_t(u)$ and $\hat q_b(u)$ be the kernel density estimates of feature $u$ for the pixels in the target region and in the background sample, respectively. The log-likelihood ratio of the feature is given by8

$$L(u) = \log\frac{\max\{\hat q_t(u),\,\epsilon\}}{\max\{\hat q_b(u),\,\epsilon\}},$$

where $\epsilon$ is a small value (we set it to 0.001) that prevents dividing by zero or taking the logarithm of zero. Based on the log-likelihood ratio, we select the components of the tracking model for which

$$L(u) > T_L,$$

where $T_L$ is a threshold determined by our prior knowledge of the target. The components selected by this rule are those that best describe the target: a high value of $L(u)$ indicates that feature $u$ has a higher kernel density in the target region than in the sampled background, so the pixels with feature $u$ inside the target region are likely to belong to the real target.
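To make the two building blocks above concrete, the following Python sketch computes a kernel-weighted intensity histogram for an image patch and selects the discriminative bins by thresholding the log-likelihood ratio. It is only an illustrative sketch: the 8-bit intensity assumption, the Epanechnikov profile, the default threshold value, and the function names are ours, not taken from the letter.

```python
import numpy as np

def kernel_histogram(patch, n_bins=64):
    """Kernel-weighted intensity histogram of an 8-bit image patch, normalized
    to sum to one: an approximation of the target-model components q_u, using
    an Epanechnikov profile with bandwidth equal to the patch half-size."""
    h, w = patch.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    # Normalized pixel locations with the patch center at the origin.
    y = (ys - (h - 1) / 2.0) / (h / 2.0)
    x = (xs - (w - 1) / 2.0) / (w / 2.0)
    k = np.maximum(1.0 - (x ** 2 + y ** 2), 0.0)      # Epanechnikov profile k(||x/h||^2)
    bins = (patch.astype(np.int64) * n_bins) // 256    # 8-bit intensities -> bin indices
    hist = np.bincount(bins.ravel(), weights=k.ravel(), minlength=n_bins)
    return hist / max(hist.sum(), 1e-12)               # plays the role of the constant C

def select_components(q_target, q_background, threshold=0.5, eps=1e-3):
    """Log-likelihood ratio L(u) between the target and sampled-background
    densities, and the indices of the components with L(u) > threshold."""
    L = np.log(np.maximum(q_target, eps) / np.maximum(q_background, eps))
    return np.where(L > threshold)[0], L
```

In a tracking loop, the target histogram would come from the rectangle covering the target and the background histogram from the surrounding ring; the selected bin indices are the discriminative components used by the metric, with the threshold tuned from prior knowledge of the target as noted above.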
To strengthen the selection process, a background-weighted formulation of the kernel density estimate of the target region is also used.6 A cost function $E_1^k$ is then defined to embody the information about the selected discriminative components of the initial target region that is lost during the tracking process. It is computed from $n_0$, the number of pixels in the target region that fall into the selected components in the initial frame, and $n_k$, the number of pixels in the target region that fall into these components in frame $k$. Large values of $E_1^k$ indicate a decrease of the information carried by the selected components of the initial target model.

For two discrete-valued random vectors $X$ and $Y$ with marginal probability mass functions $p(x)$ and $p(y)$ and joint probability function $p(x,y)$, the mutual information between them is defined as

$$I(X;Y) = \sum_{x}\sum_{y} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}.$$

Given the kernel density estimates $\hat p^0$ of feature $u$ in the initial target region and $\hat p^k$ of feature $v$ in the current target region, the marginal probability mass functions are given by

$$p(u) = \hat p^0_u, \qquad p(v) = \hat p^k_v,$$

where $u$ and $v$ are the feature values in the quantized feature space. The joint probability between the two kernel density estimates is calculated as

$$p(u,v) = p(u)\,p(v \mid u),$$

where $p(v \mid u)$ is the conditional probability of $v$ given an observed $u$. We place a one-dimensional kernel centered on $u$ and use the kernel values as $p(v \mid u)$. For example, the conditional probability with a Gaussian kernel is given by

$$p(v \mid u) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(v-u)^2}{2\sigma^2}\right),$$

where $\sigma$ is the standard deviation of the Gaussian kernel. Here, we define the kernel mutual information as

$$\mathrm{KMI}(\hat p^0;\hat p^k) = \sum_{u}\sum_{v} p(u,v)\,\log\frac{p(u,v)}{p(u)\,p(v)}.$$

A second cost function, $E_2^k$, is then defined from the kernel mutual information to evaluate how much information of the initial target region is retained in frame $k$. It is computed from $\mathrm{KMI}(\hat p^0;\hat p^k)$ and the entropies $H(\hat p^0)$ and $H(\hat p^k)$ of the target regions of the initial frame and the current frame, respectively, in the quantized feature space, which are given by

$$H(\hat p^0) = -\sum_{u} p(u)\log p(u), \qquad H(\hat p^k) = -\sum_{v} p(v)\log p(v).$$

Let $H_{\max}$ denote the larger of the two entropies. Because $\hat p^0$ and $\hat p^k$ are the marginal probability mass functions, $H_{\max}$ also bounds the kernel mutual information from above, so the cost function $E_2^k$ is bounded as well.

A single metric $\phi_k$ for evaluating the tracking performance is obtained by combining, with constants $\kappa_1$, $\kappa_2$, and $\kappa_3$, the information about the discriminative components of the kernel target model in frame $k$ and the kernel-mutual-information cost function defined above. In our work, the constants $\kappa_1$, $\kappa_2$, and $\kappa_3$ are chosen, subject to a normalization constraint, in the same way as in the feature-points-based mutual information metric presented in Ref. 1. As a result, when the tracked target is lost, $\phi_k$ reaches its minimum value of 0, whereas when the target is located entirely accurately, $\phi_k$ reaches its maximum value of 1. The kernel-based metric is therefore a measure of the tracking performance of a tracking process: a large value of $\phi_k$ represents good tracking performance and reliable tracker output in the current frame.
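As a rough illustration of how these quantities could be computed, the following Python sketch implements one plausible reading of the description above: $E_1^k$ as the fraction of initial-frame pixels of the selected components that are missing in frame $k$, the kernel mutual information built from a row-normalized Gaussian conditional kernel, $E_2^k$ as that quantity normalized by the larger marginal entropy, and a placeholder weighted combination. The specific formulas, the value of $\sigma$, and the weights are our assumptions, not the authors' definitions; the histograms are 1-D NumPy arrays that sum to one (e.g., from kernel_histogram above).

```python
import numpy as np

def lost_component_cost(n0, nk):
    """Assumed form of the first cost function E1: fraction of the initial-frame
    pixels of the selected components that are missing in frame k."""
    return max(0.0, (n0 - nk) / float(n0))

def kernel_mutual_information(p0, pk, sigma=2.0):
    """Kernel mutual information between the initial (p0) and current (pk)
    target histograms. The joint p(u, v) = p(u) p(v|u) uses a row-normalized
    Gaussian conditional kernel centered on u; p0 and pk serve as the marginals."""
    m = len(p0)
    u = np.arange(m)[:, None]
    v = np.arange(m)[None, :]
    cond = np.exp(-((v - u) ** 2) / (2.0 * sigma ** 2))  # Gaussian kernel centered on u
    cond /= cond.sum(axis=1, keepdims=True)              # each row of p(v|u) sums to one
    joint = p0[:, None] * cond                           # p(u, v) = p(u) p(v|u)
    eps = 1e-12                                          # guards against log(0)
    return float(np.sum(joint * np.log((joint + eps) / (p0[:, None] * pk[None, :] + eps))))

def kmi_cost(p0, pk, sigma=2.0):
    """Assumed form of the KMI-based cost function E2: kernel mutual information
    normalized by the larger of the two marginal entropies."""
    eps = 1e-12
    h_max = max(-np.sum(p0 * np.log(p0 + eps)), -np.sum(pk * np.log(pk + eps)), eps)
    return kernel_mutual_information(p0, pk, sigma) / h_max

def tracking_metric(e1, e2, kappa=(0.5, 0.5, 0.0)):
    """Placeholder combination of the two terms into a single score in [0, 1];
    the actual weighting in the letter follows Ref. 1."""
    k1, k2, k3 = kappa
    return k1 * (1.0 - e1) + k2 * e2 + k3
```

In such a loop, $\hat p^0$ and $n_0$ would be computed from the initial frame (and refreshed whenever the target model is updated, as noted in Sec. 3), while $\hat p^k$ and $n_k$ would be recomputed from the located region in each new frame.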
3. Experimental Results

Tracked regions produced by a standard mean shift tracker6 on a 400-frame infrared ship sequence and a 100-frame infrared plane sequence are evaluated with the kernel-based metric. The intensity space is taken as the feature space, and it is quantized into 64 bins. We implement the tracking algorithm with the metric output on a Pentium 4 platform; the current implementation tracks the ship sequence at about 15 frames per second. The kernel-based metric is applied to evaluate the tracking process after a top-hat transform is used to preprocess the target region. Some representative frames from these sequences are shown in Figs. 1 and 2, respectively. The rectangle shown in each infrared image indicates the located target region. The metric outputs for the different located target regions quantify the amount of information about the selected target that the tracker captures in the different frames. The variations of the tracking performance indicated by the proposed metric over the frames of the different sequences are shown in Figs. 3 and 4. The constants $\kappa_1$, $\kappa_2$, and $\kappa_3$ in the combined metric are chosen to satisfy the requirement stated above, and their values are kept constant throughout the experiments. From Fig. 4, we find that the variation of the cost function $E_2^k$ is almost the same as that of the proposed metric, whereas the cost function $E_1^k$ has a similar curve but with the reverse variation, because it evaluates the information about the selected components of the initial target model that is lost during the tracking process. In fact, in most cases the two cost functions can be treated equivalently by an appropriate assignment of the constants $\kappa_1$, $\kappa_2$, and $\kappa_3$. Note that for abrupt appearance changes (for example, when the size of the tracked target increases abruptly as one target crosses another), the metric becomes ineffective because the tracker output itself is not reliable in this situation. Since such abrupt changes are transient, the metric works effectively again afterward. A robust tracker with a proper model-update method is less sensitive to appearance changes and can track the target even when the tracked target model differs substantially from the initial target model. In that case, $n_0$ in the cost function $E_1^k$ and $\hat p^0$ in the kernel mutual information, which are computed from the target region of the initial frame, are also updated whenever the model-update method is applied.

4. Conclusions

This letter has presented a kernel-based metric to evaluate the reliability of the tracking process. The metric is constructed with a kernel method by combining the information flow of the selected discriminative components of the kernel target model with the kernel mutual information between the target regions of the initial frame and the current frame. Future research will attempt to design a more suitable kernel target model to complement the kernel-based metric.

Acknowledgments

We would like to thank the anonymous reviewers for their valuable comments. This work is partially supported by the Aeronautics Science Fund (China) under Grant No. 04F57004.

References
1. E. Loutas, I. Pitas, and C. Nikou, "Entropy-based metrics for the analysis of partial and total occlusion in video object tracking," IEE Proc. Vis. Image Signal Process. 151(6), 487–497 (2004). https://doi.org/10.1049/ip-vis:20040552
2. C. E. Erdem, A. M. Tekalp, and B. Sankur, "Metrics for performance evaluation of video object segmentation and tracking without ground-truth," pp. 69–72 (2001).
3. C. E. Erdem, A. M. Tekalp, and B. Sankur, "Video object tracking with feedback of performance measures," IEEE Trans. Circuits Syst. Video Technol. 13(4), 310–324 (2003). https://doi.org/10.1109/TCSVT.2003.811361
4. P. Villegas and X. Marichal, "Perceptually-weighted evaluation criteria for segmentation masks in video sequences," IEEE Trans. Image Process. 13(8), 1092–1103 (2004).
5. J. Wei and I. Gertner, "Discrimination, tracking, and recognition of small and fast moving objects," Proc. SPIE 4726, 253–266 (2002).
6. D. Comaniciu, V. Ramesh, and P. Meer, "Kernel-based object tracking," IEEE Trans. Pattern Anal. Mach. Intell. 25(5), 564–577 (2003). https://doi.org/10.1109/TPAMI.2003.1195991
7. A. Yilmaz, X. Li, and M. Shah, "Contour-based object tracking with occlusion handling in video acquired using mobile cameras," IEEE Trans. Pattern Anal. Mach. Intell. 26(11), 1531–1536 (2004).
8. R. T. Collins, Y. Liu, and M. Leordeanu, "Online selection of discriminative tracking features," IEEE Trans. Pattern Anal. Mach. Intell. 27(10), 1631–1643 (2005).