Regular Articles

Improved visual background extractor using an adaptive distance threshold

Author Affiliations
Guang Han

Northeastern University, College of Information Science and Engineering, No. 3-11 Wenhua Road, Heping District, Shenyang 110819, China

Northeastern University at Qinhuangdao, School of Computer and Communication Engineering, No. 143 Taishan Road, Economic and Technological Development Zone, Qinhuangdao, Hebei 066004, China

Jinkuan Wang

Northeastern University at Qinhuangdao, School of Computer and Communication Engineering, No. 143 Taishan Road, Economic and Technological Development Zone, Qinhuangdao, Hebei 066004, China

Xi Cai

Northeastern University at Qinhuangdao, School of Computer and Communication Engineering, No. 143 Taishan Road, Economic and Technological Development Zone, Qinhuangdao, Hebei 066004, China

J. Electron. Imaging. 23(6), 063005 (Nov 06, 2014). doi:10.1117/1.JEI.23.6.063005
History: Received June 4, 2014; Revised September 1, 2014; Accepted October 8, 2014

Open Access

Abstract.  Camouflage is a challenging issue in moving object detection. Even the recent and advanced background subtraction technique, visual background extractor (ViBe), cannot effectively deal with it. To better handle camouflage according to the perception characteristics of the human visual system (HVS) in terms of minimum change of intensity under a certain background illumination, we propose an improved ViBe method using an adaptive distance threshold, named IViBe for short. Different from the original ViBe using a fixed distance threshold for background matching, our approach adaptively sets a distance threshold for each background sample based on its intensity. Through analyzing the performance of the HVS in discriminating intensity changes, we determine a reasonable ratio between the intensity of a background sample and its corresponding distance threshold. We also analyze the impacts of our adaptive threshold together with an update mechanism on detection results. Experimental results demonstrate that our method outperforms ViBe even when the foreground and background share similar intensities. Furthermore, in a scenario where foreground objects are motionless for several frames, our IViBe not only reduces the initial false negatives, but also suppresses the diffusion of misclassification caused by those false negatives serving as erroneous background seeds, and hence shows an improved performance compared to ViBe.


In computer vision applications, objects of interest are often the moving foreground objects in a video sequence. Therefore, moving object detection, which extracts foreground objects from the background, has become a hot topic1–7 and has been widely applied in areas such as smart video surveillance, intelligent transportation, and human-computer interaction.

Visual background extractor8 (ViBe) is one of the most recent and advanced techniques. In a comparative evaluation,9 ViBe produced satisfactory detection results and has been proved effective in many scenarios. For each pixel, the background model of ViBe stores a set of background samples taken in the past at the same location or in the neighborhood. ViBe then compares the current pixel intensity to this set of background samples using a distance threshold. Only if the new observation matches a predefined number of background samples is the pixel classified as background; otherwise, it belongs to the foreground. However, ViBe uses a fixed distance threshold in the matching process; hence, it has difficulties in handling camouflaged foreground objects (intentionally or not, some objects may poorly differ from the appearance of the background, making correct classification difficult9). Moreover, a “spatial diffusion” update mechanism for background models aggravates the influence of misclassified camouflaged foreground pixels, which decreases the power of ViBe in detecting still foreground objects. Camouflaged foreground objects and still foreground objects are two key causes of false negatives in detection results, and it is imperative to solve these two challenging issues in video surveillance.

In order to solve the aforementioned challenges, we propose an improved ViBe method using an adaptive distance threshold (hereafter IViBe for short). In light of the sensitivity of the human visual system (HVS) to intensity change under a given background illumination, we set an adaptive distance threshold in the background matching process for each background sample in accordance with its intensity. Experimental evaluations validate that, by exploiting features of the HVS and performing background matching with an adaptive distance threshold, IViBe has a better discriminating power for foreground objects whose intensities are similar to the background, and thus effectively improves the capability of ViBe in coping with camouflaged foreground objects. Furthermore, IViBe also reduces the number of misclassified pixels, which usually serve as erroneous background seeds propagating false negatives. Experimental results show that, compared with ViBe, our IViBe allows a slower inclusion of still foreground objects into the background and achieves a better performance in detecting static foreground objects.

The rest of this paper is organized as follows. In Sec. 2, we briefly explore the major background subtraction approaches. Section 3 describes our IViBe method, introduces the detailed derivation of our adaptive distance threshold, and analyzes the influence of this adaptive distance threshold together with the “spatial diffusion” update mechanism on the detection results. In Sec. 4, we qualitatively and quantitatively analyze the advantages of our IViBe compared with ViBe. Finally, a conclusion is drawn in Sec. 5.

Background subtraction10 (BS) is an effective way of segmenting the foreground for a stationary camera. In BS methods, input video frames are compared to their current background models, and regions exhibiting significant differences are marked as foreground. BS techniques also adapt their background models to scene changes through online updates and have a moderate computational complexity, which makes them popular methods for moving object detection.

Many BS techniques have been proposed with different kinds of background models, and several recent surveys have been devoted to this topic.11–13 Although the last decade has witnessed numerous publications on BS methods, according to Ref. 13, many challenges remain unresolved in real scenes, such as illumination changes, dynamic backgrounds, bootstrapping, camouflage, shadows, still foreground objects, and so on. In 2014, two special issues14,15 were published with new developments for dealing with these challenges.

Next, we briefly explore the major BS approaches according to the different kinds of background models they used.

Parametric Models

Gaussian mixture model (GMM) and its improved methods: GMM is a classical and probably the most widely used BS technique.16 GMM models the temporal distribution of each pixel using a mixture of Gaussians, and many studies have proven that GMM can handle gradual illumination changes and repetitive background motion well. In Ref. 17, Lee proposed an adaptive learning rate for each Gaussian model to improve the convergence rate without affecting the stability. In Ref. 18, Zivkovic and Van Der Heijden proposed a scheme to dynamically determine the appropriate number of Gaussian models for each pixel based on observed scene dynamics to reduce processing time. In Ref. 19, Zhang et al. used a spatio-temporal Gaussian mixture model incorporating spatial information to handle complex motions of the background.

Models using other statistical distributions: recently, a mixture of symmetric alpha-stable distributions20 and a mixture of asymmetric Gaussian distributions21 have been employed to enhance the robustness and flexibility, respectively, of mixture modeling in real scenarios. Both can handle dynamic backgrounds well. In Ref. 22, Haines and Xiang proposed a Dirichlet process Gaussian mixture model that constantly adapts its parameters to the scene in a block-based manner.

Nonparametric Models

Kernel density estimation (KDE) and its improved methods: a nonparametric technique23 was developed to estimate background probabilities at each pixel from many recent samples over time using KDE. In Ref. 24, Sheikh modeled the background using KDE over a joint domain-range representation of image pixels to sustain high levels of detection accuracy in the presence of dynamic backgrounds.

Codebook and its improved methods: the essential idea behind the codebook25 approach is to capture long-term background motion with limited memory by using a codebook for each pixel. In Ref. 4, a multilayer codebook-based background subtraction (MCBS) model was proposed. Combining a multilayer block-based strategy with adaptive feature extraction from blocks of various sizes, MCBS can remove most of the dynamic backgrounds and significantly increase the processing efficiency.

Advanced Models

Self-organizing background subtraction (SOBS) and its improved methods: in the 2012 IEEE change detection workshop26 (CDW-2012), SOBS27 and its improved method SC-SOBS28 obtained excellent results. In Ref. 27, SOBS adopted a self-organizing neural network to build a background model, initialized the model from the first frame, and employed regional diffusion of background information in the update step. In 2012, Maddalena improved SOBS by introducing spatial coherence into the background update procedure, which led to the SC-SOBS algorithm providing further robustness against false detections. In Ref. 29, three-dimensional self-organizing background subtraction (3D_SOBS) used spatio-temporal information to detect stopped objects. Recently, the 3DSOBS+ algorithm (Ref. 1) further enhanced the 3D_SOBS approach to accurately handle scenes containing dynamic backgrounds, gradual illumination changes, and shadows cast by moving objects.

ViBe and its improved methods: in the CDW-2012, ViBe8 and its improved method ViBe+30 also achieved remarkable results. Barnich and Van Droogenbroeck proposed a sample-based algorithm that builds the background model by aggregating previously observed values for each pixel location. The key innovation of ViBe is the introduction of a random policy into BS, which makes it the first nondeterministic BS method. In Ref. 30, Van Droogenbroeck and Barnich improved ViBe in many aspects, including an adaptive threshold: they computed the standard deviation of a pixel’s background samples to define a matching threshold. This matching threshold adapts itself to the statistical characteristics of the background samples; however, all background samples of a pixel share the same threshold, so one wrongly updated background sample affects the threshold of every other background sample, which leads to more misclassification. In Refs. 30 and 31, a new update mechanism separating the “segmentation map” and the “updating mask” was proposed; the “spatial diffusion” update mechanism can be inhibited in the “updating mask” to detect still foreground objects. In Ref. 32, Mould and Havlicek proposed an update mechanism in which foreground pixels can update their background models by replacing the most significant outlying samples. This update policy can improve the ability to deal with ghosts.

Human Visual System-Based Models

Visual saliency, another important concept related to the HVS, has already been used in BS methods. In Ref. 33, Liu et al. represented object saliency for moving object detection by an information saliency map calculated from spatio-temporal volumes. In Ref. 34, Mahadevan and Vasconcelos proposed a BS algorithm based on spatio-temporal saliency using a center-surround framework, which is inspired by biological mechanisms of motion-based perceptual grouping. These methods have shown the potential of the HVS in moving object detection.

In this paper, we propose an improved BS technique which uses the characteristic of the HVS.

We introduce an adaptive distance threshold into ViBe to simulate the capacity of the HVS in perceiving noticeable intensity changes, which can discriminate camouflaged foreground objects and reduce false negatives. Together with ViBe’s update policy, our method further improves the ability to detect foreground objects that are motionless for a while. Hence, IViBe improves the ability of ViBe in dealing with camouflaged and still foreground objects.

Our IViBe is a pixel-based BS method. When building the background model for each pixel, it does not rely on a temporal statistical distribution, but employs a universal sample-based method instead. Let xi be an arbitrary pixel in a video image, and B(xi) be its background model containing N background samples (values taken in the past at the same location or in the neighborhood):

B(xi) = {B1(xi), …, Bk(xi), …, BN(xi)}.  (1)

The background model B(xi) is first initialized from one single frame according to the intensities of pixel xi and its neighboring pixels, and then updated online when pixel xi is classified as background or by a “spatial diffusion” update mechanism.

The pixel xi is classified as a background pixel only if its current intensity I(xi) is closer than a certain distance threshold Rk(xi) (1 ≤ k ≤ N) to at least #min of its N background samples. Thus, the foreground segmentation mask is calculated as

F(xi) = 1, if #{|I(xi) − Bk(xi)| < Rk(xi)} < #min; F(xi) = 0, otherwise.  (2)

Here, F(xi)=1 signifies that the pixel xi is a foreground pixel, # denotes the cardinality of a set, #min is a fixed parameter indicating the minimal matching number, and Rk(xi) is an adaptive distance threshold according to the perception characteristics of the HVS.
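As an illustration, the classification rule of Eq. (2) can be sketched in Python. This is a minimal sketch with our own names, not the authors' implementation; the per-sample distance thresholds are passed in explicitly.

```python
MIN_MATCHES = 2  # the paper's #min

def is_foreground(intensity, samples, thresholds):
    """Eq. (2): a pixel is background only if its intensity lies within
    the per-sample distance threshold of at least #min background samples."""
    matches = sum(
        1 for s, r in zip(samples, thresholds)
        if abs(intensity - s) < r
    )
    return matches < MIN_MATCHES

# With a fixed ViBe-style threshold of 20, a camouflaged dark object at
# intensity 40 over background samples near 30 is (wrongly) matched:
samples = [30] * 20
print(is_foreground(40, samples, [20] * 20))   # False: absorbed as background
# With a tighter, intensity-proportional threshold (0.13 * 30 = 3.9),
# the same observation is correctly flagged as foreground:
print(is_foreground(40, samples, [3.9] * 20))  # True
```

This makes the mechanism behind camouflage failures concrete: the fixed threshold is simply too generous for dark backgrounds.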

In Sec. 3.1, we introduce our adaptive distance threshold and its derivation. Section 3.2 shows how our adaptive distance threshold together with the “spatial diffusion” update mechanism affects the detection results.

Adaptive Distance Threshold

In order to better segment foreground objects similar to the background, we introduce an adaptive distance threshold Rk(xi) for background matching. Unlike ViBe, which uses a fixed distance threshold Rk(xi) = 20 for every background sample, we propose an adaptive distance metric by simulating the characteristics of human visual perception (i.e., Weber’s law35).

Weber’s law describes the human response to a physical stimulus in a quantitative fashion. The just noticeable difference (JND) is the minimum amount by which stimulus intensity must be changed in order to produce a noticeable variation in the sensory experience. Ernst Weber, a 19th century experimental psychologist, observed that the size of the JND is linearly proportional to the initial stimulus intensity. This relationship, known as Weber’s law, can be expressed as

ΔIJND / I = c,  (3)

where ΔIJND represents the JND, I represents the initial stimulus intensity, and c is a constant called the Weber ratio.

In visual perception, Weber’s law actually describes the ability of the HVS to discriminate brightness, and the Weber ratio can be obtained from a classic experiment36 in which a subject looks at a flat, uniformly illuminated area (with intensity I) large enough to occupy the entire field of view, as Fig. 1 shows. An increment of illumination (i.e., ΔI) is added to the field and appears as a circle in the center. When ΔI reaches ΔIJND, the subject gives a positive response, indicating a perceivable change. By Weber’s law, ΔIJND is in direct proportion to I; hence, ΔIJND is small against dark backgrounds and large against bright backgrounds.

Fig. 1: Basic experimental setup used to characterize brightness discrimination.

In the BS methods, when comparing current intensity with the corresponding background model, the distance threshold can actually be considered as the critical intensity difference in distinguishing foreground objects from the background. Fortunately, Weber’s law describes the capacity of the HVS in perceiving noticeable intensity changes, and the JND that the HVS can perceive is in direct proportion to the background illumination. Inspired by Weber’s law, we propose our adaptive distance threshold in direct proportion to the background sample intensity; namely, the distance threshold should be low for a dark background sample and high for a bright background sample.

In our method, the mapping to Weber’s law is as follows: the background sample intensity Bk(xi) can be regarded as the initial intensity I, the difference between the current value and each background sample is the intensity change ΔI, and the distance threshold Rk(xi) can be regarded as the JND (i.e., ΔIJND). Consequently, on the basis of Weber’s law, we set

Rk(xi) / Bk(xi) = c.  (4)

In Eq. (4), Bk(xi) is the known background sample intensity, so to derive the distance threshold Rk(xi) we must first obtain the Weber ratio c. However, we cannot directly use the Weber ratio obtained in the classic experiment, because the classic experiment uses a uniformly illuminated area as the background, whereas our method requires a Weber ratio for a complex image as the background. As described in Ref. 37, “for any point or small area in a complex image, the Weber ratio is generally much larger than that obtained in an experimental environment because of the lack of sharply defined boundaries and intensity variations in the background.” Moreover, it is also difficult to obtain the Weber ratio by redoing the classic experiment with a complex image as the background, because such an experiment would require many subjects whose evaluation criteria are inconsistent, which would reduce the credibility of the experiment.

Based on the considerations above, we employ a substitute for the subjective evaluations in the classic experiment to derive the Weber ratio c with a complex image as the background. Specifically, the substitute is the difference of the peak signal-to-noise ratio (PSNR38) presented by the Moving Picture Experts Group (MPEG). The MPEG recommends that,38 for an original reference image (R) and two of its reconstructed images (D1 and D2), only when the difference of PSNR (i.e., ΔPSNR) satisfies

|PSNR(D1, R) − PSNR(D2, R)| ≥ 0.5 (dB),  (5)

can the HVS perceive that D1 and D2 are different. In Eq. (5), PSNR(D, R) is used to estimate the level of errors in a distorted image D relative to its original reference image R. For grayscale images with intensities in the range [0, 255], PSNR(D, R) is defined as

PSNR(D, R) = 20 lg[255 / ((1/n)‖D − R‖1)] = 20 lg[255 / ((1/n) Σ(m=1..n) |dm − rm|)]  (dB),  (6)

where n is the number of pixels in the original image R, and dm and rm denote the intensities of the m’th pixel in D and R, respectively.
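Note that Eq. (6) is based on the 1-norm (mean absolute error) rather than the more common mean squared error. A small sketch of this definition (our own code, for grayscale values in [0, 255]):

```python
import math

def psnr_l1(distorted, reference):
    """PSNR as defined in Eq. (6), based on the 1-norm of the error."""
    n = len(reference)
    mae = sum(abs(d - r) for d, r in zip(distorted, reference)) / n
    return 20 * math.log10(255 / mae)

ref = [100, 120, 140, 160]
dist = [101, 121, 141, 161]          # uniform error of one intensity level
print(round(psnr_l1(dist, ref), 2))  # 20 * lg(255 / 1) ≈ 48.13
```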

Since ΔPSNR can objectively reflect the ability of the HVS to discriminate intensity changes, we use ΔPSNR as a substitute for the subjects’ perception in the classic experiment with a complex image as the background. Here, we first construct a complex image. Suppose the rows and columns of an image are each divided into 16 equal parts, so that the image is composed of 256 regions of the same size. For each region, the setup is the same as in the classic experiment shown in Fig. 1: each region is uniformly illuminated with intensity I, and an increment of illumination (i.e., ΔI) is added to the centered circle. Such a region is called a basic region. The complex image consists of 256 basic regions (with I = 0, 1, …, 255), which are randomly permuted, as shown in Fig. 2. In this way, we construct a complex image as the background to simulate the classic experiment at all intensity levels simultaneously, which makes our derivation general and objective.

Fig. 2: Simulated complex image as background.

All the circles in the basic regions of Fig. 2 simultaneously change their intensities by ΔI. When |ΔI| reaches ΔIJND for all the basic regions, the HVS can barely perceive the intensity changes of the complex image (let this image be D1). When |ΔI| = ΔIJND + ε (ε is a very small constant, and for a digital image we set ε = 1) for all the basic regions, the HVS can obviously perceive the intensity changes of the complex image (let this image be D2). Suppose the complex image shown in Fig. 2 is the original reference image (i.e., R); then D1 and D2 can be regarded as two different distorted images which are reconstructed from the same R and are just perceivably distinguishable by the HVS. Accordingly, on the basis of Eq. (3), the 1-norm of the difference between R and D1 is given in Eq. (7), and the 1-norm of the difference between R and D2 is given in Eq. (8):

‖D1 − R‖1 = Σ(I=0..255) w·ΔIJND = Σ(I=0..255) w·cI = w Σ(I=0..255) cI,  (7)

‖D2 − R‖1 = Σ(I=0..255) w·(ΔIJND + 1) = Σ(I=0..255) w·(cI + 1) = w Σ(I=0..255) (cI + 1),  (8)

where w denotes the number of pixels in the circle of each basic region in Fig. 2. In accordance with the recommendation of the MPEG, the difference of PSNR between these two reconstructed images (D1 and D2) meets the equality in Eq. (5), i.e., ΔPSNR = 0.5 (dB); that is,

20 lg[255 / ((w/n) Σ(I=0..255) cI)] − 20 lg[255 / ((w/n) Σ(I=0..255) (cI + 1))] = 0.5,  (9)

where n denotes the number of pixels in the complex image.

Simplifying Eq. (9), we can derive c = 0.13. As a result, we conclude that the relationship between the intensity of a background sample and its corresponding distance threshold is Rk(xi) = 0.13 Bk(xi).
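The value c ≈ 0.13 can be checked numerically: dividing the two logarithm arguments in Eq. (9) shows that the ratio of the two sums must equal 10^(0.5/20), with w and n canceling out. A quick check (our own sketch, not from the paper):

```python
# Check c ≈ 0.13 from Eq. (9). Dividing the two PSNR arguments shows that
# sum(cI + 1) / sum(cI) over I = 0..255 must equal 10**(0.5/20).
ratio = 10 ** (0.5 / 20)     # ≈ 1.0593
s = sum(range(256))          # sum of I over the 256 basic regions = 32640
c = 256 / (s * (ratio - 1))  # from (c*s + 256) / (c*s) = ratio
print(round(c, 2))  # 0.13
```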

Nevertheless, according to the description of brightness adaptation of the HVS in Ref. 37, we can infer that, in the extremely dark and extremely bright regions of a complex image, the linear relationship in Weber’s law cannot precisely describe the relation between the perceptible intensity changes of the HVS and the background illumination. Therefore, our solution is to cut off the distance threshold for background samples whose intensities are too high or too low. After many experiments, we empirically set [10%, 90%] of the entire intensity range as satisfying the linear relationship; namely, the cutoff intensities are T1 = 255 × 0.1 ≈ 26 and T2 = 255 × 0.9 ≈ 230. Consequently, the adaptive distance threshold is calculated as

Rk(xi) = c · min{max[Bk(xi), T1], T2},  (10)

which is shown in Fig. 3.
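Equation (10) is straightforward to implement; a minimal sketch (function name and defaults are ours):

```python
def adaptive_threshold(sample, c=0.13, t1=26, t2=230):
    """Eq. (10): R_k(x_i) = c * min{max[B_k(x_i), T1], T2}."""
    return c * min(max(sample, t1), t2)

# Dark samples are clamped at T1, bright ones at T2; in between, the
# threshold grows linearly with the sample intensity.
print(round(adaptive_threshold(10), 2))   # clamped: 0.13 * 26  = 3.38
print(round(adaptive_threshold(150), 2))  # linear:  0.13 * 150 = 19.5
print(round(adaptive_threshold(250), 2))  # clamped: 0.13 * 230 = 29.9
```

Note that around intensity 154 the adaptive threshold crosses ViBe's fixed value of 20, which is why the two methods behave differently in dark versus bright regions.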

Fig. 3: The relationship between the intensity of a background sample and its corresponding distance threshold.

Background Model Update Mechanism and Impacts of Our Adaptive Distance Threshold Together with Update Mechanism on Detection Results

It is essential to update the background model B(xi) to adapt to changes in the background, such as illumination changes and background motion. The update of background models applies not only to pixels classified as background, but also to a randomly selected pixel in their eight-connected neighborhood. In detail, when a pixel xi is classified as background, its current intensity I(xi) is used to randomly replace one of its background samples Bk(xi) (k ∈ {1, 2, …, N}) with a probability p = 1/ϕ, where ϕ is a time subsampling factor similar to the learning rate in GMM (the smaller the ϕ, the faster the update). After updating the background model of pixel xi, we randomly select a pixel xj in the eight-connected spatial neighborhood of pixel xi, i.e., xj ∈ N8(xi). In light of the spatial consistency of neighboring background pixels, we also use the current intensity I(xi) of pixel xi to randomly replace one of pixel xj’s background samples Bk(xj) (k ∈ {1, 2, …, N}). In this way, we allow a spatial diffusion of background samples during the background model update.
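The update policy above can be sketched as follows. This is a minimal illustration under our own assumptions (a dict mapping pixel coordinates to sample lists; function and constant names are ours), not the authors' implementation:

```python
import random

PHI = 16  # time subsampling factor (update probability 1/PHI)
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
           (0, 1), (1, -1), (1, 0), (1, 1)]  # 8-connected neighborhood

def update_background(models, x, y, intensity, rng=random):
    """Conservative update with spatial diffusion, for a pixel already
    classified as background; 'models' maps (x, y) -> list of N samples."""
    if rng.random() >= 1.0 / PHI:
        return  # no update this frame
    # replace one random background sample of the pixel itself
    samples = models[(x, y)]
    samples[rng.randrange(len(samples))] = intensity
    # diffuse the same intensity into one random 8-connected neighbor
    dx, dy = rng.choice(OFFSETS)
    neighbor = models.get((x + dx, y + dy))
    if neighbor is not None:  # ignore neighbors outside the image
        neighbor[rng.randrange(len(neighbor))] = intensity

# Over many frames, the new background value both enters the pixel's own
# model and diffuses into its neighborhood:
models = {(i, j): [50] * 20 for i in range(3) for j in range(3)}
rng = random.Random(0)
for _ in range(500):
    update_background(models, 1, 1, 80, rng)
```

This sketch also makes the still-object problem visible: if a foreground pixel is ever misclassified as background, its intensity enters neighboring models through the same diffusion path.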

The advantage of this “spatial diffusion” update mechanism is the quick absorption of certain types of ghosts (a set of connected points, detected as in motion but not corresponding to any real moving object8). Some ghosts result from removing some parts of the background; therefore, those ghost areas often share similar intensities with their surrounding background. When background samples from surrounding areas try to diffuse inside the ghosts, these samples are likely to match with current intensities at the diffused locations. Thus, the diffused pixels in the ghosts are gradually classified as background. In this way, the ghosts can be progressively eroded until they entirely disappear.

However, the “spatial diffusion” update mechanism is disadvantageous for detecting still foreground objects. In environments where the foreground objects are static for several frames, either because the foreground objects share similar intensities with the background, or due to the noise inevitably emerging in the video sequence, some pixels of the foreground objects may be misclassified as background, and then serve as erroneous background seeds propagating foreground intensities in the background models of their neighboring pixels. Since foreground objects are still for several frames, the background models of the neighboring pixels of these misclassified pixels will suffer from more and more incorrect background samples coming from misclassified foreground intensities. In this way, there will be more misclassified foreground pixels, which will lead to the diffusion of misclassification.

Fortunately, our IViBe employs background matching based on an adaptive distance threshold, which reduces misclassification inside foreground objects, slows the diffusion of misclassification, and lowers the rate at which still foreground objects are absorbed into the background. First, IViBe makes full use of the adaptive distance threshold to enhance its power to discriminate similar foregrounds and backgrounds, thereby reducing the number of misclassified foreground pixels and decreasing the likelihood that erroneous background seeds occur. Second, even if misclassification emerges inside the foreground objects for some reason and foreground intensities diffuse into the background models of neighboring pixels, the adaptive distance threshold can still cut down the misclassification probability of those neighboring pixels inside the foreground objects. From this analysis, we conclude that IViBe has the ability to detect still foreground objects that remain present for several frames.

Since we use the adaptive distance threshold of Eq. (10), our threshold for dark areas is smaller than that of ViBe, so fewer pixels are classified as background and then updated; whereas for bright areas, our threshold is larger than ViBe’s fixed threshold, so more pixels are classified as background and then updated. Accordingly, the updating probability is lower in dark areas and higher in bright areas.

In this section, we first list the test sequences and determine the optimal values of parameters in our IViBe method, and then compare our results with those of ViBe in terms of qualitative and quantitative evaluations.

Experimental Setup
Test sequences

In our experiments, we employ the widely used changedetection.net26,39 (CDnet) benchmark. We select two sequences to test the capability of these techniques in coping with camouflaged foreground objects. One sequence is called “lakeSide,” from the thermal category, and the other is called “blizzard,” from the bad weather category. In the lakeSide sequence, two people become indistinguishable from the lake behind them in thermal imagery after they climb out of the lake, because their body temperatures match that of the lake. This sequence is a truly challenging camouflage scenario, as it is difficult even for the human eye to discriminate these people from the background. In the blizzard sequence, a road is covered by heavy snow during bad weather; meanwhile, some passing cars are white, and some cars of other colors are partially covered by white snow, which makes correct classification difficult.

Besides, to validate the power of IViBe in coping with still foreground objects, we further choose two other typical sequences from the CDnet. One sequence is called “library” from the thermal category, and the other sequence is called “sofa” from the intermittent object motion category. In the library sequence, a man walks in the scene and selects a book, and then sits in front of a desk reading the book for a long time. In the sofa sequence, several men successively sit on a sofa to rest for dozens of frames, and place their belongings (foreground) aside; for example, a box is abandoned on the ground and a bag is left on the sofa.

Moreover, to test the performance of our method in general environments, we also select the baseline category which contains four videos (i.e., highway, office, pedestrians, and PETS2006) with a mixture of mild challenges (including dynamic backgrounds, camera jitter, shadows, and intermittent object motion). For example, the highway sequence endures subtle background motion, the office sequence suffers from small camera jitter, the pedestrians sequence has isolated shadows, and the PETS2006 sequence has abandoned objects and pedestrians that stop for a short while and then move away. These videos are fairly easy but are not trivial to process.26

Determination of parameter setting

There are six parameters in IViBe: number of background samples stored in each pixel’s background model (i.e., N), ratio of Rk(xi) to Bk(xi) (i.e., c), cutoff thresholds (i.e., T1 and T2), required number of close background samples when classifying a pixel as background (i.e., #min), and time subsampling factor (i.e., ϕ).

In Sec. 3.1, we have determined the parameters of our adaptive distance threshold, namely c=0.13, T1=26, and T2=230.

In order to evaluate #min and N over a variety of values, we introduce the metric called percentage of correct classification8 (PCC) that is widely used in computer vision to assess the performance of a binary classifier. Let TP be the number of true positives, TN be the number of true negatives, FP be the number of false positives, and FN be the number of false negatives. These raw data (i.e., TP, TN, FP, and FN) are summed over all the frames with ground-truth references in a video. The definition of PCC is given as follows:

PCC = 100 (TP + TN) / (TP + FP + TN + FN).  (11)
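As a quick illustration of Eq. (11) with made-up counts (not figures from the paper):

```python
def pcc(tp, tn, fp, fn):
    """Percentage of correct classification, Eq. (11)."""
    return 100.0 * (tp + tn) / (tp + fp + tn + fn)

# Hypothetical counts summed over all ground-truth frames of a video:
print(pcc(tp=90, tn=880, fp=10, fn=20))  # 97.0
```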

Figure 4 illustrates the evolution of the PCC of IViBe on the pedestrians sequence (with 800 ground-truth references) in the baseline category for #min ranging from 1 to 20. The other parameters are fixed to N = 20, c = 0.13, T1 = 26, T2 = 230, and ϕ = 16. As shown in Fig. 4, the PCC decreases as #min increases. The best PCCs are obtained for #min = 1 (PCC = 99.8310), #min = 2 (PCC = 99.8324), and #min = 3 (PCC = 99.7923). In our experiments, we find that for stable backgrounds like those in the baseline category, #min = 1 can also lead to excellent results, but in more challenging scenarios, #min = 2 and #min = 3 are good choices. Since a rise in #min is likely to increase the computational cost of IViBe, we set #min = 2.

Fig. 4: PCCs for #min ranging from 1 to 20.

Once we set #min = 2, we study the influence of the parameter N on the performance of IViBe. Figure 5 shows the PCCs obtained on the pedestrians sequence for N ranging from 2 to 30. The other parameters are fixed to #min = 2, c = 0.13, T1 = 26, T2 = 230, and ϕ = 16. We observe that higher values of N provide better performance; however, the PCCs tend to saturate for N ≥ 20. Considering that a large N induces a large memory cost, we select N = 20.

Fig. 5: PCCs for N ranging from 2 to 30.

The time subsampling factor ϕ acts like the learning rate in the GMM. A large time subsampling factor implies a small update probability, so the background samples cannot adapt in time to changes in the real background, such as gradual illumination changes; that is, a large ϕ may produce more false positives due to an outdated background model. Conversely, a small ϕ means that the background samples are very likely to be updated from the current frame, so a still foreground object is absorbed into the background much more easily, producing more false negatives. Hence, ϕ must be adjusted to trade off false positives against false negatives. Besides, ϕ also affects the speed of our method, because a small ϕ leads to a much higher computational cost for updating. As in ViBe, we set ϕ = 16.

Therefore, the parameters of IViBe are set as follows: the number of background samples stored in each pixel’s background model is fixed to N=20; the ratio of Rk(xi) to Bk(xi) is set to c=0.13; the cutoff thresholds are set to T1=26 and T2=230; the required number of close background samples when classifying a pixel as background is fixed to #min=2; the time subsampling factor is fixed to ϕ=16.
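As an illustration, the background test with these parameters can be sketched as follows. This is a minimal sketch under our assumptions: we read T1 and T2 as cutoffs on the sample intensity, so that Rk(xi) = c·Bk(xi) between the cutoffs and stays constant outside them; the function names are ours, and grayscale intensities in [0, 255] are assumed:

```python
N = 20            # background samples stored per pixel
C = 0.13          # ratio of the distance threshold to the sample intensity
T1, T2 = 26, 230  # assumed intensity cutoffs for the adaptive threshold
MIN_MATCHES = 2   # '#min' close samples required to classify as background

def adaptive_threshold(sample_intensity):
    # R = c * B, with B clamped to [T1, T2] (our reading of the cutoffs):
    # dark samples get a small threshold, so dark camouflaged foreground
    # is no longer swallowed by a fixed R = 20 as in ViBe.
    return C * max(T1, min(sample_intensity, T2))

def is_background(pixel, samples):
    """True if at least MIN_MATCHES samples are closer than their own R."""
    matches = 0
    for s in samples:
        if abs(pixel - s) < adaptive_threshold(s):
            matches += 1
            if matches >= MIN_MATCHES:  # early exit, as in ViBe
                return True
    return False
```

For a sample intensity of 100, the threshold is 13, noticeably tighter than ViBe's fixed 20; for a sample intensity of 60 it shrinks to 7.8.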

For ViBe, the parameters recommended in Ref. 8 are used: N=20, Rk(xi)=20, #min=2, and ϕ=16.

Other settings

For a fair comparison, no postprocessing techniques (such as noise filtering, morphological operations, or connected-components analysis) are applied in our tests, so that the unaided strength of each approach is evaluated.

Visual Comparisons

For a qualitative evaluation, we visually compare the detection results of our IViBe with those of ViBe on the test sequences in Figs. 6 to 10. Although multiple test frames were used for each sequence, we show only one typical frame per sequence here due to space limitations.

Fig. 6: Detection results of the lakeSide sequence: (a) frame 2255 of the lakeSide sequence, (b) ground-truth reference, (c) result of ViBe, (d) result of IViBe.

Fig. 7: Detection results of the blizzard sequence: (a) frame 1266 of the blizzard sequence, (b) ground-truth reference, (c) result of ViBe, (d) result of IViBe, (e)–(h) partial enlarged views of (a)–(d).

Fig. 8: Detection results of the library sequence: (a) frame 2768 of the library sequence, (b) ground-truth reference, (c) result of ViBe, (d) result of IViBe.

Fig. 9: Detection results of the sofa sequence: (a) frame 900 of the sofa sequence, (b) ground-truth reference, (c) result of ViBe, (d) result of IViBe.

Fig. 10: Detection results of the baseline category: (a) input frames, (b) ground-truth references, (c) results of ViBe, (d) results of IViBe.

Figure 6 shows the detection results of the lakeSide sequence. In the input frame shown in Fig. 6(a), after swimming in the lake, the people's body temperatures are close to that of the lake water; therefore, the intensities inside the human bodies (except the heads) are almost the same as those of the lake. In the detection result of ViBe shown in Fig. 6(c), the child's body is incomplete, with many false negatives. This is mainly because ViBe uses a fixed distance threshold Rk(xi)=20, which is too large for dark environments and thus misclassifies dark foreground objects as background. As shown in Fig. 6(d), our IViBe correctly detects most of the foreground regions thanks to its adaptive distance threshold based on the perception characteristics of the HVS.

In Fig. 7, the detection results of the blizzard sequence are depicted. To illustrate the improvement of our method more clearly, we enlarge two areas that contain only foreground cars. The blizzard sequence is very challenging, as shown in Figs. 7(a) and 7(e) [a partial enlarged view of Fig. 7(a)]. Because of the snowfall, most of the cars appear white, which leads to confusion between the passing cars and the road covered by thick snow. As can be seen in Fig. 7(c), and particularly in Fig. 7(g), the detection result of ViBe contains holes inside the detected cars, and obvious false detections appear in the areas covered by snow. In contrast, our IViBe can discriminate subtle variations using the adaptive distance threshold and obtains more complete detection results. As shown in Fig. 7(h), our IViBe achieves an evident improvement over ViBe.

Figure 8 illustrates the detection results of the library sequence. This is an infrared sequence that contains a lot of noise. In Fig. 8(a), a man remains static for a long time while he sits on a chair reading a book. Because of the inevitable noise, in the detection result of ViBe, misclassification emerges in the head, shoulders, and legs of the foreground, later propagates to the neighboring pixels, and finally results in large holes inside the foreground, as shown in Fig. 8(c). Thanks to the adaptive distance threshold, our method yields fewer misclassifications in the head, shoulder, and leg regions of the foreground, and also suppresses the propagation of misclassification. These results show that, compared with ViBe, our IViBe is more robust for detecting foreground objects that stay still for many frames.

Figure 9 shows the detection results of the sofa sequence. In Fig. 9(a), an abandoned box (foreground) sits static for a long time in the left corner. Meanwhile, a man sits on the sofa and remains still for quite an extended period. Due to the presence of noise and the adoption of a "spatial diffusion" update mechanism, in the detection result of ViBe shown in Fig. 9(c), the box is almost completely absorbed into the background and a large number of false negatives appear inside the man. In Fig. 9(d), our result shows a notable improvement: the man is more complete and the top surface of the box is well detected. This improvement is mainly the result of the adaptive distance threshold.

Figure 10 shows the detection results of the baseline category. For the highway sequence, our method produces more scattered false positives than ViBe in the dark areas of the waving trees and their shadows, but detects more complete cars in the top right corner. For the office sequence, a man stands still for some time while reading a book, and Fig. 10(d) shows that IViBe detects more true positives in the man's legs than ViBe. For the pedestrians sequence, both methods yield similar results with evident shadow areas. For the PETS2006 sequence, a man and his bag remain still for a while, and IViBe detects them more completely.

Quantitative Comparisons

To objectively assess the detection results, we employ four metrics recommended by CDnet,26,39 i.e., recall, precision, F1, and the percentage of wrong classification (PWC), to judge the performance of the BS methods at the pixel level. Let TP be the number of true positives, TN the number of true negatives, FP the number of false positives, and FN the number of false negatives. These raw counts are summed over all frames with ground-truth references in a video. For a video v in a category a, the metrics are defined as

$$\mathrm{recall}_{v,a} = \frac{TP_{v,a}}{TP_{v,a} + FN_{v,a}}, \tag{12}$$

$$\mathrm{precision}_{v,a} = \frac{TP_{v,a}}{TP_{v,a} + FP_{v,a}}, \tag{13}$$

$$F1_{v,a} = \frac{2\,\mathrm{recall}_{v,a}\cdot\mathrm{precision}_{v,a}}{\mathrm{recall}_{v,a} + \mathrm{precision}_{v,a}}, \tag{14}$$

$$\mathrm{PWC}_{v,a} = \frac{100\,(FN_{v,a} + FP_{v,a})}{TP_{v,a} + FN_{v,a} + FP_{v,a} + TN_{v,a}}. \tag{15}$$

The average metrics of category a are then calculated as

$$\mathrm{recall}_a = \frac{1}{N_a}\sum_{v=1}^{N_a} \mathrm{recall}_{v,a}, \tag{16}$$

$$\mathrm{precision}_a = \frac{1}{N_a}\sum_{v=1}^{N_a} \mathrm{precision}_{v,a}, \tag{17}$$

$$F1_a = \frac{1}{N_a}\sum_{v=1}^{N_a} F1_{v,a}, \tag{18}$$

$$\mathrm{PWC}_a = \frac{1}{N_a}\sum_{v=1}^{N_a} \mathrm{PWC}_{v,a}, \tag{19}$$

where Na is the number of videos in category a. These metrics are called category-average metrics.

Generally, recall (also known as the detection rate) is used in conjunction with precision (also known as positive prediction), and a method is considered good if it reaches high recall without sacrificing precision.27 Since recall and precision often conflict, the overall indicators F1 and PWC, which combine false positives and false negatives in a single measure, are used to further compare the results. The first three metrics lie in the range [0,1], and higher values indicate better detection results; the PWC lies in [0,100], and lower is better.
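The per-video metrics of Eqs. (12)–(15) and the category averages of Eqs. (16)–(19) follow directly from the summed counts; a minimal sketch (the function names are ours):

```python
def video_metrics(tp, tn, fp, fn):
    """Recall, precision, F1, and PWC for one video, per Eqs. (12)-(15)."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * recall * precision / (recall + precision)
    pwc = 100.0 * (fn + fp) / (tp + fn + fp + tn)
    return recall, precision, f1, pwc

def category_average(per_video):
    """Average each of the four metrics over the N_a videos of a category,
    per Eqs. (16)-(19). `per_video` is a list of (recall, precision, F1, PWC)."""
    n = len(per_video)
    return tuple(sum(m[i] for m in per_video) / n for i in range(4))
```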

Tables 1 to 4 show these four metrics for the lakeSide, blizzard, library, and sofa sequences using ViBe and our method. These metrics are calculated using all the available ground-truth references, i.e., frames 1000 to 6500 of the lakeSide sequence, frames 900 to 7000 of the blizzard sequence, frames 600 to 4900 of the library sequence, and frames 500 to 2750 of the sofa sequence.

Table 1: Comparison of metrics for the lakeSide sequence.
Table 2: Comparison of metrics for the blizzard sequence.
Table 3: Comparison of metrics for the library sequence.
Table 4: Comparison of metrics for the sofa sequence.

As illustrated in Table 1, for the lakeSide sequence, the precision of our method decreases slightly compared to that of ViBe; however, the recall of our IViBe increases remarkably. With regard to the overall indicators (F1 and PWC), our method exhibits an impressive improvement over ViBe. As seen in Table 2, for the blizzard sequence, our precision decreases by 0.01, but our recall increases by 0.06; for F1 and PWC, our method achieves a moderate improvement. As shown in Table 3, for the library sequence, our proposed IViBe produces better results than ViBe on all metrics. Table 4 shows that, for the sofa sequence, our precision decreases by 0.13, while our recall increases by 0.25; for F1 and PWC, our method obtains a remarkable improvement. These experimental results demonstrate that, in scenarios containing camouflaged foreground objects, our IViBe significantly reduces false negatives in the detection results; in environments where foreground objects are static for some frames, our IViBe slows the rate at which those still foreground objects are absorbed into the background.

To calculate the category-average metrics of the baseline category, we also use all the available ground-truth references, i.e., frames 470 to 1700 of the highway sequence, frames 570 to 2050 of the office sequence, frames 300 to 1099 of the pedestrians sequence, and frames 300 to 1200 of the PETS2006 sequence. Table 5 shows the category-average metrics for the baseline category using ViBe and our method. As can be seen in Table 5, our method produces a larger recall and a smaller precision; however, the overall indicators (F1 and PWC) of the two methods are quite similar.

Table 5: Comparison of category-average metrics for the baseline category.

In general, the quantitative analysis shows that our IViBe outperforms ViBe when dealing with camouflaged and still foreground objects, and performs similarly to ViBe on ordinary videos with mild challenges.

Conclusion

According to the perception characteristics of the HVS concerning minimum intensity changes under certain background illuminations, we have proposed an improved ViBe method that uses an adaptive distance threshold for each background sample in accordance with its intensity. Experimental results demonstrate that our IViBe effectively improves the ability to deal with camouflaged foreground objects. Since camouflaged foreground objects are common in real-world video sequences, our IViBe has practical value for smart video surveillance systems. Moreover, owing to this capacity, our IViBe not only reduces the misclassification of foreground pixels as background, but also suppresses the propagation of such misclassification, especially for pixels inside still foreground objects. Experimental results also show that our method outperforms ViBe in scenarios where foreground objects remain static for several frames.

Acknowledgments

The authors would like to thank the anonymous reviewers for their insightful comments that helped to improve the quality of this paper. This work is supported by the National Natural Science Foundation of China under Grant No. 61374097, the Fundamental Research Funds for the Central Universities (N130423006), the Natural Science Foundation of Hebei Province under Grant No. F2012501001, and the Foundation of Northeastern University at Qinhuangdao (XNK201403).

References

1. L. Maddalena and A. Petrosino, "The 3DSOBS+ algorithm for moving object detection," Comput. Vis. Image Und. 122, 65–73 (2014).
2. L. Tong et al., "Encoder combined video moving object detection," Neurocomputing 139, 150–162 (2014).
3. O. Oreifej, X. Li, and M. Shah, "Simultaneous video stabilization and moving object detection in turbulence," IEEE Trans. Pattern Anal. Mach. Intell. 35(2), 450–462 (2013).
4. J. Guo et al., "Fast background subtraction based on a multilayer codebook model for moving object detection," IEEE Trans. Circuits Syst. Video Technol. 23(10), 1809–1821 (2013).
5. C. Cuevas and N. García, "Improved background modeling for real-time spatio-temporal non-parametric moving object detection strategies," Image Vis. Comput. 31(9), 616–630 (2013).
6. C. Cuevas, R. Mohedano, and N. García, "Kernel bandwidth estimation for moving object detection in non-stabilized cameras," Opt. Eng. 51(4), 040501 (2012).
7. P. Chiranjeevi and S. Sengupta, "Moving object detection in the presence of dynamic backgrounds using intensity and textural features," J. Electron. Imaging 20(4), 043009 (2011).
8. O. Barnich and M. Van Droogenbroeck, "ViBe: a universal background subtraction algorithm for video sequences," IEEE Trans. Image Process. 20(6), 1709–1724 (2011).
9. S. Brutzer, B. Höferlin, and G. Heidemann, "Evaluation of background subtraction techniques for video surveillance," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, pp. 1937–1944, IEEE, Piscataway, NJ (2011).
10. Y. Benezeth et al., "Comparative study of background subtraction algorithms," J. Electron. Imaging 19(3), 033003 (2010).
11. A. Sobral and A. Vacavant, "A comprehensive review of background subtraction algorithms evaluated with synthetic and real videos," Comput. Vis. Image Und. 122, 4–21 (2014).
12. T. Bouwmans and E. H. Zahzah, "Robust PCA via principal component pursuit: a review for a comparative evaluation in video surveillance," Comput. Vis. Image Und. 122, 22–34 (2014).
13. T. Bouwmans, "Traditional and recent approaches in background modeling for foreground detection: an overview," Comput. Sci. Rev. 11–12, 31–66 (2014).
14. A. Vacavant et al., "Special section on background models comparison," Comput. Vis. Image Und. 122, 1–3 (2014).
15. T. Bouwmans et al., "Special issue on background modeling for foreground detection in real-world dynamic scenes," Mach. Vis. Appl. 25(5), 1101–1103 (2014).
16. C. Stauffer and W. E. L. Grimson, "Learning patterns of activity using real-time tracking," IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 747–757 (2000).
17. D.-S. Lee, "Effective Gaussian mixture learning for video background subtraction," IEEE Trans. Pattern Anal. Mach. Intell. 27(5), 827–832 (2005).
18. Z. Zivkovic and F. Van Der Heijden, "Efficient adaptive density estimation per image pixel for the task of background subtraction," Pattern Recognit. Lett. 27(7), 773–780 (2006).
19. W. Zhang et al., "Spatiotemporal Gaussian mixture model to detect moving objects in dynamic scenes," J. Electron. Imaging 16(2), 023013 (2007).
20. H. Bhaskar, L. Mihaylova, and A. Achim, "Video foreground detection based on symmetric alpha-stable mixture models," IEEE Trans. Circuits Syst. Video Technol. 20(8), 1133–1138 (2010).
21. T. Elguebaly and N. Bouguila, "Background subtraction using finite mixtures of asymmetric Gaussian distributions and shadow detection," Mach. Vis. Appl. 25(5), 1145–1162 (2014).
22. T. S. F. Haines and T. Xiang, "Background subtraction with Dirichlet process mixture models," IEEE Trans. Pattern Anal. Mach. Intell. 36(4), 670–683 (2014).
23. A. Elgammal et al., "Background and foreground modeling using nonparametric kernel density estimation for visual surveillance," Proc. IEEE 90(7), 1151–1163 (2002).
24. Y. Sheikh and M. Shah, "Bayesian modeling of dynamic scenes for object detection," IEEE Trans. Pattern Anal. Mach. Intell. 27(11), 1778–1792 (2005).
25. K. Kim et al., "Real-time foreground-background segmentation using codebook model," Real-Time Imaging 11(3), 172–185 (2005).
26. N. Goyette et al., "Changedetection.net: a new change detection benchmark dataset," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition Workshops, pp. 1–8, IEEE, Piscataway, NJ (2012).
27. L. Maddalena and A. Petrosino, "A self-organizing approach to background subtraction for visual surveillance applications," IEEE Trans. Image Process. 17(7), 1168–1177 (2008).
28. L. Maddalena and A. Petrosino, "The SOBS algorithm: what are the limits?," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition Workshops, pp. 21–26, IEEE, Piscataway, NJ (2012).
29. L. Maddalena and A. Petrosino, "Stopped object detection by learning foreground model in videos," IEEE Trans. Neural Netw. Learn. Syst. 24(5), 723–735 (2013).
30. M. Van Droogenbroeck and O. Barnich, "Background subtraction: experiments and improvements for ViBe," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition Workshops, pp. 32–37, IEEE, Piscataway, NJ (2012).
31. M. Van Droogenbroeck and O. Barnich, "ViBe: a disruptive method for background subtraction," Chapter 7 in Background Modeling and Foreground Detection for Video Surveillance, T. Bouwmans, F. Porikli, B. Hoferlin, and A. Vacavant, Eds., pp. 7-1–7-23, Chapman and Hall/CRC, London (2014).
32. N. Mould and J. P. Havlicek, "A conservative scene model update policy," in Proc. IEEE Southwest Symp. on Image Analysis and Interpretation, pp. 145–148, IEEE, Piscataway, NJ (2012).
33. C. Liu, P. C. Yuen, and G. Qiu, "Object motion detection using information theoretic spatio-temporal saliency," Pattern Recognit. 42(11), 2897–2906 (2009).
34. V. Mahadevan and N. Vasconcelos, "Spatiotemporal saliency in dynamic scenes," IEEE Trans. Pattern Anal. Mach. Intell. 32(1), 171–177 (2010).
35. A. K. Jain, Fundamentals of Digital Image Processing, Prentice Hall, Upper Saddle River, NJ (1989).
36. R. C. Gonzalez and R. E. Woods, Digital Image Processing, 3rd ed., Prentice Hall, Upper Saddle River, NJ (2008).
37. R. C. Gonzalez and P. Wintz, Digital Image Processing, Addison-Wesley, Reading, MA (1977).
38. D. Salomon, Data Compression: The Complete Reference, 4th ed., Springer, Berlin, Germany (2007).
39. Y. Wang et al., "CDnet 2014: an expanded change detection benchmark dataset," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition Workshops, pp. 387–394, IEEE, Piscataway, NJ (2014).

Guang Han is a lecturer at Northeastern University at Qinhuangdao, China. He received his B.Eng. and M.Eng. degrees from the School of Electronic and Information Engineering, Beihang University, Beijing, China, in 2005 and 2008, respectively. Now he is a PhD candidate at the College of Information Science and Engineering, Northeastern University, Shenyang, China. His current research interests include object detection and object tracking in video sequences.

Jinkuan Wang is a professor at Northeastern University at Qinhuangdao, China. He received his B.Eng. and M.Eng. degrees from Northeastern University, China, in 1982 and 1985, respectively, and his PhD degree from the University of Electro-Communications, Japan, in 1993. His current research interests include wireless sensor networks, multiple antenna array communication systems, and adaptive signal processing.

Xi Cai is an associate professor at Northeastern University at Qinhuangdao, China. She received her B.Eng. and Ph.D. degrees from the School of Electronic and Information Engineering, Beihang University, Beijing, China, in 2005 and 2011, respectively. Her research interests include image fusion, image registration, object detection, and object tracking.

© The Authors. Published by SPIE under a Creative Commons Attribution 3.0 Unported License. Distribution or reproduction of this work in whole or in part requires full attribution of the original publication, including its DOI.

Citation

Guang Han, Jinkuan Wang, and Xi Cai, "Improved visual background extractor using an adaptive distance threshold," J. Electron. Imaging 23(6), 063005 (Nov 06, 2014). http://dx.doi.org/10.1117/1.JEI.23.6.063005



