1. Introduction

Evaluating the tracking reliability of a tracking algorithm is an important issue because it can guide the design of a good tracker. A variety of algorithms for measuring reliability have been presented to improve the robustness of the tracking process.1,2,3,4 Several feature-points-based metrics are proposed in Ref. 1 for the analysis of partial and total occlusion in video tracking. Erdem et al. introduced other metrics based on color and motion differences.2 However, these feature-points-based and color-based metrics are not suited to evaluating the performance of infrared target tracking, because the extracted feature points and the color information of the target region are not reliable in infrared images. Infrared sequences are extremely noisy owing to severe systemic noise introduced by the sensing instrument and noise from the environment.5 The aim of this letter is to design a metric that quantitatively evaluates the performance of infrared target tracking with a kernel-based method, one that exploits the intensity information discriminatively and avoids extracting feature points from the target region.

2. Tracker Evaluation Metric

A kernel-based target tracking approach, such as the mean shift algorithm,6 is a commonly used method in the tracking field. Let $\{x_i^*\}_{i=1,\dots,n}$ be the normalized pixel locations in the target region, with the target center as the origin, in the current frame. The function $b:\mathbb{R}^2 \to \{1,\dots,m\}$ (an $m$-bin histogram is used) associates with the pixel at location $x_i^*$ the index $b(x_i^*)$ of its bin in the quantized feature space. The kernel density estimate of feature $u$, $u=1,\dots,m$, in the target region is computed as6

$$\hat q_u = C \sum_{i=1}^{n} k\!\left(\left\|\frac{x_i^*}{h}\right\|^2\right)\delta\!\left[b(x_i^*) - u\right],$$

where $\delta$ is the Kronecker delta function, $C$ is the normalization constant, $k$ is the common profile used in the corresponding feature domain, and $h$ is the kernel bandwidth. Thus we have the target model

$$\hat q = \{\hat q_u\}_{u=1,\dots,m}, \qquad \sum_{u=1}^{m} \hat q_u = 1.$$

The target candidates can be obtained in the same way, and the target location in the current frame is obtained by optimizing the similarity function between the target model and the target candidates.

It is unavoidable that some background parts are included in the located target region when a contour-based method, in which tracking is achieved by evolving the contour from frame to frame,7 is not used. To evaluate the tracking performance, we therefore seek the discriminative components of the tracking model, i.e., the components that best describe the tracked target. A rectangular set of pixels covering the target is chosen to represent the target pixels, and an outer surrounding ring of pixels is chosen to form the sampled background. Given a certain feature $u$, let $\hat q_t(u)$ and $\hat q_b(u)$ be the kernel density estimates of feature $u$ for the pixels in the target region and in the background sample, respectively. The log-likelihood ratio of the feature is given by8

$$L(u) = \log\frac{\max\{\hat q_t(u),\,\epsilon\}}{\max\{\hat q_b(u),\,\epsilon\}},$$

where $\epsilon$ is a small value (we set it to 0.001) that prevents dividing by zero or taking the logarithm of zero. Based on the log-likelihood ratio, we select the components of the tracking model for which

$$L(u) > T_L,$$

where $T_L$ is a threshold determined by our prior knowledge of the target. The components selected by this rule are those that best describe the target: a high value of $L(u)$ indicates that feature $u$ has a higher kernel density in the target region than in the sampled background, so the pixels with feature $u$ inside the target region are likely to belong to the real target.
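To make the two building blocks above concrete, the following Python sketch computes a kernel-weighted intensity histogram for an image patch and selects the discriminative bins by thresholding the log-likelihood ratio. It is only an illustrative sketch: the 8-bit intensity assumption, the Epanechnikov profile, the default threshold value, and the function names are ours, not taken from the letter.

```python
import numpy as np

def kernel_histogram(patch, n_bins=64):
    """Kernel-weighted intensity histogram of an 8-bit image patch, normalized
    to sum to one: an approximation of the target-model components q_u, using
    an Epanechnikov profile with bandwidth equal to the patch half-size."""
    h, w = patch.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    # Normalized pixel locations with the patch center at the origin.
    y = (ys - (h - 1) / 2.0) / (h / 2.0)
    x = (xs - (w - 1) / 2.0) / (w / 2.0)
    k = np.maximum(1.0 - (x ** 2 + y ** 2), 0.0)      # Epanechnikov profile k(||x/h||^2)
    bins = (patch.astype(np.int64) * n_bins) // 256    # 8-bit intensities -> bin indices
    hist = np.bincount(bins.ravel(), weights=k.ravel(), minlength=n_bins)
    return hist / max(hist.sum(), 1e-12)               # plays the role of the constant C

def select_components(q_target, q_background, threshold=0.5, eps=1e-3):
    """Log-likelihood ratio L(u) between the target and sampled-background
    densities, and the indices of the components with L(u) > threshold."""
    L = np.log(np.maximum(q_target, eps) / np.maximum(q_background, eps))
    return np.where(L > threshold)[0], L
```

In a tracking loop, the target histogram would come from the rectangle covering the target and the background histogram from the surrounding ring; the selected bin indices are the discriminative components used by the metric, with the threshold tuned from prior knowledge of the target as noted above.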
To strengthen the selection process, a background-weighted formulation of the kernel density estimate of the target region is also used.6 A cost function $E_1^k$ is then defined to embody the information about the selected discriminative components of the initial target region that is lost during the tracking process. It is computed from $n_0$, the number of pixels in the target region that fall into the selected components in the initial frame, and $n_k$, the number of pixels in the target region that fall into these components in frame $k$. Large values of $E_1^k$ indicate a decrease of the information carried by the selected components of the initial target model.

For two discrete-valued random vectors $X$ and $Y$ with marginal probability mass functions $p(x)$ and $p(y)$ and joint probability function $p(x,y)$, the mutual information between them is defined as

$$I(X;Y) = \sum_{x}\sum_{y} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}.$$

Given the kernel density estimates $\hat p^0$ of feature $u$ in the initial target region and $\hat p^k$ of feature $v$ in the current target region, the marginal probability mass functions are given by

$$p(u) = \hat p^0_u, \qquad p(v) = \hat p^k_v,$$

where $u$ and $v$ are the feature values in the quantized feature space. The joint probability between the two kernel density estimates is calculated as

$$p(u,v) = p(u)\,p(v \mid u),$$

where $p(v \mid u)$ is the conditional probability of $v$ given an observed $u$. We place a one-dimensional kernel centered on $u$ and use the kernel values as $p(v \mid u)$. For example, the conditional probability with a Gaussian kernel is given by

$$p(v \mid u) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(v-u)^2}{2\sigma^2}\right),$$

where $\sigma$ is the standard deviation of the Gaussian kernel. Here, we define the kernel mutual information as

$$\mathrm{KMI}(\hat p^0;\hat p^k) = \sum_{u}\sum_{v} p(u,v)\,\log\frac{p(u,v)}{p(u)\,p(v)}.$$

A second cost function, $E_2^k$, is then defined from the kernel mutual information to evaluate how much information of the initial target region is retained in frame $k$. It is computed from $\mathrm{KMI}(\hat p^0;\hat p^k)$ and the entropies $H(\hat p^0)$ and $H(\hat p^k)$ of the target regions of the initial frame and the current frame, respectively, in the quantized feature space, which are given by

$$H(\hat p^0) = -\sum_{u} p(u)\log p(u), \qquad H(\hat p^k) = -\sum_{v} p(v)\log p(v).$$

Let $H_{\max}$ denote the larger of the two entropies. Because $\hat p^0$ and $\hat p^k$ are the marginal probability mass functions, $H_{\max}$ also bounds the kernel mutual information from above, so the cost function $E_2^k$ is bounded as well.

A single metric $\phi_k$ for evaluating the tracking performance is obtained by combining, with constants $\kappa_1$, $\kappa_2$, and $\kappa_3$, the information about the discriminative components of the kernel target model in frame $k$ and the kernel-mutual-information cost function defined above. In our work, the constants $\kappa_1$, $\kappa_2$, and $\kappa_3$ are chosen, subject to a normalization constraint, in the same way as in the feature-points-based mutual information metric presented in Ref. 1. As a result, when the tracked target is lost, $\phi_k$ reaches its minimum value of 0, whereas when the target is located entirely accurately, $\phi_k$ reaches its maximum value of 1. The kernel-based metric is therefore a measure of the tracking performance of a tracking process: a large value of $\phi_k$ represents good tracking performance and reliable tracker output in the current frame.
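As a rough illustration of how these quantities could be computed, the following Python sketch implements one plausible reading of the description above: $E_1^k$ as the fraction of initial-frame pixels of the selected components that are missing in frame $k$, the kernel mutual information built from a row-normalized Gaussian conditional kernel, $E_2^k$ as that quantity normalized by the larger marginal entropy, and a placeholder weighted combination. The specific formulas, the value of $\sigma$, and the weights are our assumptions, not the authors' definitions; the histograms are 1-D NumPy arrays that sum to one (e.g., from kernel_histogram above).

```python
import numpy as np

def lost_component_cost(n0, nk):
    """Assumed form of the first cost function E1: fraction of the initial-frame
    pixels of the selected components that are missing in frame k."""
    return max(0.0, (n0 - nk) / float(n0))

def kernel_mutual_information(p0, pk, sigma=2.0):
    """Kernel mutual information between the initial (p0) and current (pk)
    target histograms. The joint p(u, v) = p(u) p(v|u) uses a row-normalized
    Gaussian conditional kernel centered on u; p0 and pk serve as the marginals."""
    m = len(p0)
    u = np.arange(m)[:, None]
    v = np.arange(m)[None, :]
    cond = np.exp(-((v - u) ** 2) / (2.0 * sigma ** 2))  # Gaussian kernel centered on u
    cond /= cond.sum(axis=1, keepdims=True)              # each row of p(v|u) sums to one
    joint = p0[:, None] * cond                           # p(u, v) = p(u) p(v|u)
    eps = 1e-12                                          # guards against log(0)
    return float(np.sum(joint * np.log((joint + eps) / (p0[:, None] * pk[None, :] + eps))))

def kmi_cost(p0, pk, sigma=2.0):
    """Assumed form of the KMI-based cost function E2: kernel mutual information
    normalized by the larger of the two marginal entropies."""
    eps = 1e-12
    h_max = max(-np.sum(p0 * np.log(p0 + eps)), -np.sum(pk * np.log(pk + eps)), eps)
    return kernel_mutual_information(p0, pk, sigma) / h_max

def tracking_metric(e1, e2, kappa=(0.5, 0.5, 0.0)):
    """Placeholder combination of the two terms into a single score in [0, 1];
    the actual weighting in the letter follows Ref. 1."""
    k1, k2, k3 = kappa
    return k1 * (1.0 - e1) + k2 * e2 + k3
```

In such a loop, $\hat p^0$ and $n_0$ would be computed from the initial frame (and refreshed whenever the target model is updated, as noted in Sec. 3), while $\hat p^k$ and $n_k$ would be recomputed from the located region in each new frame.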
3. Experimental Results

Tracked regions produced by a standard mean shift tracker6 on a 400-frame infrared ship sequence and a 100-frame infrared plane sequence are evaluated with the kernel-based metric. The intensity space is taken as the feature space, and it is quantized into 64 bins. We implement the tracking algorithm with the metric output on a Pentium 4 platform; the current implementation tracks the ship sequence at about 15 frames per second. The kernel-based metric is applied to evaluate the tracking process after a top-hat transform is used to preprocess the target region. Some representative frames from these sequences are shown in Figs. 1 and 2, respectively. The rectangle shown in each infrared image indicates the located target region. The metric outputs for the different located target regions quantify the amount of information about the selected target that the tracker captures in the different frames. The variations of the tracking performance indicated by the proposed metric over the frames of the different sequences are shown in Figs. 3 and 4. The constants $\kappa_1$, $\kappa_2$, and $\kappa_3$ in the combined metric are chosen to satisfy the requirement stated above, and their values are kept constant throughout the experiments. From Fig. 4, we find that the variation of the cost function $E_2^k$ is almost the same as that of the proposed metric, whereas the cost function $E_1^k$ has a similar curve but with the reverse variation, because it evaluates the information about the selected components of the initial target model that is lost during the tracking process. In fact, in most cases the two cost functions can be treated equivalently by an appropriate assignment of the constants $\kappa_1$, $\kappa_2$, and $\kappa_3$. Note that for abrupt appearance changes (for example, when the size of the tracked target increases abruptly as one target crosses another), the metric becomes ineffective because the tracker output itself is not reliable in this situation. Since such abrupt changes are transient, the metric works effectively again afterward. A robust tracker with a proper model-update method is less sensitive to appearance changes and can track the target even when the tracked target model differs substantially from the initial target model. In that case, $n_0$ in the cost function $E_1^k$ and $\hat p^0$ in the kernel mutual information, which are computed from the target region of the initial frame, are also updated whenever the model-update method is applied.

4. Conclusions

This letter has presented a kernel-based metric to evaluate the reliability of the tracking process. The metric is constructed with a kernel method by combining the information flow of the selected discriminative components of the kernel target model with the kernel mutual information between the target regions of the initial frame and the current frame. Future research will attempt to design a more suitable kernel target model to complement the kernel-based metric.

Acknowledgments

We would like to thank the anonymous reviewers for their valuable comments. This work is partially supported by the Aeronautics Science Fund (China) under Grant No. 04F57004.

References
1. E. Loutas, I. Pitas, and C. Nikou, "Entropy-based metrics for the analysis of partial and total occlusion in video object tracking," IEE Proc. Vis. Image Signal Process. 151(6), 487–497 (2004). https://doi.org/10.1049/ip-vis:20040552
2. C. E. Erdem, A. M. Tekalp, and B. Sankur, "Metrics for performance evaluation of video object segmentation and tracking without ground-truth," pp. 69–72 (2001).
3. C. E. Erdem, A. M. Tekalp, and B. Sankur, "Video object tracking with feedback of performance measures," IEEE Trans. Circuits Syst. Video Technol. 13(4), 310–324 (2003). https://doi.org/10.1109/TCSVT.2003.811361
4. P. Villegas and X. Marichal, "Perceptually-weighted evaluation criteria for segmentation masks in video sequences," IEEE Trans. Image Process. 13(8), 1092–1103 (2004).
5. J. Wei and I. Gertner, "Discrimination, tracking, and recognition of small and fast moving objects," Proc. SPIE 4726, 253–266 (2002).
6. D. Comaniciu, V. Ramesh, and P. Meer, "Kernel-based object tracking," IEEE Trans. Pattern Anal. Mach. Intell. 25(5), 564–577 (2003). https://doi.org/10.1109/TPAMI.2003.1195991
7. A. Yilmaz, X. Li, and M. Shah, "Contour-based object tracking with occlusion handling in video acquired using mobile cameras," IEEE Trans. Pattern Anal. Mach. Intell. 26(11), 1531–1536 (2004).
8. R. T. Collins, Y. Liu, and M. Leordeanu, "Online selection of discriminative tracking features," IEEE Trans. Pattern Anal. Mach. Intell. 27(10), 1631–1643 (2005).