Discriminative correlation filters (DCFs) have shown excellent performance in visual tracking. DCF replaces the sliding-window sampling strategy of traditional tracking methods with circular shifts of the context area. By projecting filter learning into the frequency domain, DCF achieves satisfactory performance and speed. The size of the context area influences the performance of correlation filters: a small context area limits the CF's ability to handle fast motion and partial occlusion, whereas a large context area makes the CF suffer from boundary effects. To exploit a large context area while alleviating the accompanying drift risk, we propose a mask-constrained context correlation filter for object tracking. We first analyze the traditional window strategy via a Taylor series expansion and design a spatial mask that can be covered by a larger context area. Furthermore, the shape of the mask adapts to target variation. Extensive experimental results on the OTB-2015, VOT-2014, and VOT-2016 datasets demonstrate that this mask-constrained operation improves CF tracker performance by a large margin.
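The frequency-domain learning that the abstract describes can be illustrated with a minimal single-channel sketch in the style of a basic correlation filter (ridge regression against a Gaussian label, MOSSE-style). This is a generic CF baseline under simplifying assumptions, not the proposed mask-constrained filter; `train_cf` and `detect` are illustrative names.

```python
import numpy as np

def train_cf(x, y, lam=1e-2):
    """Learn a correlation filter in the frequency domain (ridge regression).

    x: context patch (2-D array); y: desired Gaussian response, same shape.
    Returns the (conjugated) filter in the Fourier domain.
    """
    X = np.fft.fft2(x)
    Y = np.fft.fft2(y)
    # Closed-form per-frequency solution: H = Y * conj(X) / (X * conj(X) + lam)
    return (Y * np.conj(X)) / (X * np.conj(X) + lam)

def detect(h_hat, z):
    """Correlate the filter with a new patch z; the response peak gives the shift."""
    resp = np.real(np.fft.ifft2(h_hat * np.fft.fft2(z)))
    return np.unravel_index(np.argmax(resp), resp.shape)
```

Because training uses circular shifts, a circularly shifted test patch moves the response peak by exactly that shift; the boundary effect the abstract mentions arises because real translations are not circular.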
Harris corner detection in checkerboard images for camera calibration often suffers from uneven illumination. The key to camera calibration lies in how to robustly detect corners in the degraded images. To this end, an image processing method is proposed to deal with nonuniform illumination. Experiments show that the stability of Harris corner detection under uneven illumination is markedly improved.
Camera calibration is a key step in three-dimensional (3-D) reconstruction; however, calibration accuracy and stability often suffer from complicated illumination in real applications. The camera calibration method using the checkerboard pattern is improved for high-precision calibration under complicated illumination. First, an improved Harris corner detector based on color constancy and a subpixel optimization method based on prior knowledge of the checkerboard are proposed to overcome the influence of poor lighting. Second, a checkerboard pattern with a central circle point is designed for more reliable reference-point location under uneven illumination. The checkerboard identification by Delaunay triangulation is also improved to handle discontinuities caused by oversaturation. Finally, the camera parameter optimization process is improved to reduce the calibration error produced by complicated illumination. Experimental results show that these improvements achieve more accurate and stable calibration results under complicated lighting conditions than traditional methods.
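The detector that both abstracts build on can be sketched in its textbook form: the Harris response R = det(M) - k*trace(M)^2 computed from windowed gradient products. This is the standard Harris operator, not the improved color-constancy detector of the paper, and the 3x3 box window (Gaussian weighting in practice) is a simplification.

```python
import numpy as np

def box3(a):
    """3x3 box filter via circular shifts (border handling kept simple)."""
    return sum(np.roll(np.roll(a, dy, 0), dx, 1)
               for dy in (-1, 0, 1) for dx in (-1, 0, 1))

def harris_response(img, k=0.05):
    """Harris corner response: positive at corners, negative on edges, zero in flat areas."""
    Iy, Ix = np.gradient(img.astype(float))       # central-difference gradients
    # Windowed second-moment matrix entries
    Sxx, Syy, Sxy = box3(Ix * Ix), box3(Iy * Iy), box3(Ix * Iy)
    det = Sxx * Syy - Sxy ** 2
    tr = Sxx + Syy
    return det - k * tr ** 2
```

On a checkerboard, local maxima of this response mark the corner candidates that the subpixel optimization then refines.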
The phase-shifting algorithm is widely used for noncontact three-dimensional (3-D) reconstruction and traditionally leverages gray values to recover phase angles, which makes it difficult to reconstruct the shape of an object with a specular or dark surface. We propose an algorithm for accurate phase recovery based on high-dynamic-range imaging. Unlike most existing methods, which use gray values to calculate the phase angle, the proposed method uses E values, the irradiance recovered from low-dynamic-range images, to calculate the phase angle. Experiments show that the proposed method improves phase recovery accuracy and achieves good results for the 3-D reconstruction of specular or dark objects.
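The underlying N-step phase-shifting recovery is the same arctangent formula whether it is fed gray values or, as here, recovered irradiance; a generic sketch (frame n assumed to carry a shift of 2*pi*n/N):

```python
import numpy as np

def recover_phase(images):
    """Recover the wrapped phase from N equally phase-shifted fringe images.

    images: array of shape (N, H, W), where frame n has phase shift 2*pi*n/N.
    For I_n = A + B*cos(phi + 2*pi*n/N), the sums below isolate B*sin(phi)
    and B*cos(phi), so phi = atan2(-sum(I_n*sin), sum(I_n*cos)).
    Returns the wrapped phase in (-pi, pi].
    """
    images = np.asarray(images, dtype=float)
    n = images.shape[0]
    deltas = 2 * np.pi * np.arange(n) / n
    s = np.tensordot(np.sin(deltas), images, axes=1)
    c = np.tensordot(np.cos(deltas), images, axes=1)
    return np.arctan2(-s, c)
```

The paper's contribution is what goes into `images`: replacing saturated or underexposed gray values with irradiance fused from multiple exposures before this recovery step.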
Saliency detection has been applied to target acquisition. This paper proposes a two-dimensional hidden Markov model (2D-HMM) that exploits the hidden semantic information of an image to detect its salient regions. A spatial pyramid of histogram-of-oriented-gradients descriptors is used to extract features. After encoding the image with a learned dictionary, the 2D Viterbi algorithm is applied to infer the saliency map. The model can predict fixations on targets and produces robust and effective depictions of changes in target posture and viewpoint. To align the model with the human visual search mechanism, two eye-tracking experiments are employed to train the model directly from eye movement data. The results show that our model outperforms conventional visual attention models; moreover, they indicate the plausibility of utilizing eye-tracking data to identify targets.
Inspired by unsupervised feature learning (UFL) within the self-taught learning framework, we propose a method based on UFL, convolutional representation, and part-based dimensionality reduction to handle facial age and gender classification, two challenging problems under unconstrained circumstances. First, UFL is introduced to learn selective receptive fields (filters) automatically by applying a whitening transformation and spherical k-means to random patches collected from unlabeled data. The learning process is fast and has no hyperparameters to tune. Then, the input image is convolved with these filters to obtain filtering responses, on which local contrast normalization is applied. Average pooling and feature concatenation are then used to form the global face representation. Finally, linear discriminant analysis with a part-based strategy is presented to reduce the dimensionality of the global representation and further improve classification performance. Experiments on three challenging databases, namely Labeled Faces in the Wild, Gallagher group photos, and Adience, demonstrate the effectiveness of the proposed method relative to state-of-the-art approaches.
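The whitening-plus-spherical-k-means filter learning step can be sketched as follows. This is a minimal illustration of that one step (ZCA whitening, then k-means with cosine similarity and unit-norm centroids), not the full pipeline; patch sizes, iteration counts, and regularizers are assumptions.

```python
import numpy as np

def spherical_kmeans(patches, k, iters=10, seed=0):
    """Learn k unit-norm filters from flattened patches (one patch per row)."""
    rng = np.random.default_rng(seed)
    # ZCA whitening: decorrelate patch dimensions before clustering
    X = patches - patches.mean(axis=0)
    cov = X.T @ X / len(X)
    vals, vecs = np.linalg.eigh(cov)
    X = X @ (vecs @ np.diag(1.0 / np.sqrt(vals + 1e-5)) @ vecs.T)
    X /= np.linalg.norm(X, axis=1, keepdims=True) + 1e-12
    # Initialize centroids from random patches, then alternate assign/update
    D = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        assign = np.argmax(X @ D.T, axis=1)        # nearest centroid by cosine
        for j in range(k):
            members = X[assign == j]
            if len(members):
                d = members.sum(axis=0)
                D[j] = d / (np.linalg.norm(d) + 1e-12)  # renormalize to the sphere
    return D
```

The learned rows of `D` serve as the convolution filters applied to the input image in the next stage.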
We propose a local feature representation based on two types of linear filtering, feature pooling, and nonlinear divisive normalization for remote sensing image classification. First, images are decomposed using a bank of log-Gabor and Gaussian derivative filters to obtain filtering responses that are robust to changes in lighting conditions. Second, the filtering responses computed with the same filter at nearby locations are pooled together to enhance position invariance and compactness of representation. Third, divisive normalization with a channel-wise strategy, in which each pooled feature is divided by a common factor plus the sum of its neighboring features to reduce dependencies among nearby locations, is introduced to extract divisive normalization features (DNFs). A power-law transformation and principal component analysis are applied to make DNFs more discriminative, followed by feature fusion to enhance the local description. Finally, feature encoding is used to aggregate DNFs into a global representation. Experiments on the 21-class land-use and 19-class satellite scene datasets demonstrate that channel-wise divisive normalization improves classification accuracy compared with standard normalization across channels, and that fusing the two types of linear filtering helps further. The experiments also show that the proposed method is competitive with state-of-the-art approaches.
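The channel-wise divisive normalization described above can be sketched directly from its definition: each pooled value is divided by a constant plus the sum of its spatial neighbors in the same channel. The neighborhood radius, the constant `c`, and circular border handling are simplifying assumptions here.

```python
import numpy as np

def divisive_normalize(pooled, c=1.0, radius=1):
    """Channel-wise divisive normalization of a pooled feature map.

    pooled: array of shape (H, W, C). Each value is divided by c plus the
    sum of its spatial neighbors within `radius` in the SAME channel.
    """
    shifts = [(dy, dx) for dy in range(-radius, radius + 1)
                       for dx in range(-radius, radius + 1)
                       if (dy, dx) != (0, 0)]
    neigh = sum(np.roll(np.roll(pooled, dy, 0), dx, 1) for dy, dx in shifts)
    return pooled / (c + neigh)
```

The contrast with "standard normalization across channels" is the axis of the sum: here neighbors are spatial within one channel, rather than across all channels at one location.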
Texture information plays an important role in rendering objects realistically, especially with the wide application of image-based three-dimensional (3-D) reconstruction and 3-D laser scanning. This paper proposes a seamless texture mapping algorithm to achieve a high-quality visual effect in 3-D reconstruction. First, a series of image sets is produced by analyzing the visibility of triangular facets; the image sets are then clustered and segmented into a number of optimal reference texture patches. Second, the generated texture patches are sequenced to create a rough texture map, and a weighting process is adopted to reduce the color discrepancies between adjacent patches. Finally, a multiresolution decomposition and fusion technique is used to generate the transition sections and eliminate boundary effects. Experiments show that the proposed algorithm is effective and practical for obtaining high-quality 3-D texture mapping in 3-D reconstruction. Compared with traditional methods, it maintains texture clarity while eliminating color seams; in addition, it supports 3-D texture mapping for big-data applications.
A novel shadow detection method for color remotely sensed images that satisfies requirements for both high accuracy and wide adaptability in applications is presented. The method builds on previously reported work investigating shadow properties in both the red/green/blue (RGB) and hue/saturation/value (HSV) color spaces. It integrates several shadow features for modeling and uses a region growing (RG) algorithm and a perceptron (PM) of a neural network (NN) to identify shadows. To ensure effective parameter estimation, the proposed method first uses a small number of shadow samples manually selected from an input image to automatically estimate the necessary parameters. Then, the method binarizes the hue map of the input image with the estimated threshold to obtain possible shadow seeds and applies the RG algorithm to the intensity channel to produce a candidate shadow map. Subsequently, the hue, saturation, and intensity maps restricted to the candidate shadow map are each filtered with a corresponding band-pass filter, and the filtered results are input to the PM algorithm for the final shadow segmentation. Experiments indicate that the proposed algorithm performs better in multiple cases, providing a new and practical shadow detection method.
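The region-growing step can be sketched as a breadth-first flood fill from the shadow seeds on the intensity channel. The 4-connectivity and the simple neighbor-to-neighbor tolerance criterion are assumptions; the paper's actual growth criterion may differ.

```python
from collections import deque
import numpy as np

def region_grow(intensity, seeds, tol=10):
    """Grow candidate shadow regions from seed pixels on an intensity map.

    A 4-connected neighbor joins the region when its intensity differs from
    the pixel it was reached from by at most `tol`.
    """
    h, w = intensity.shape
    mask = np.zeros((h, w), dtype=bool)
    q = deque(seeds)
    for y, x in seeds:
        mask[y, x] = True
    while q:
        y, x = q.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if (0 <= ny < h and 0 <= nx < w and not mask[ny, nx]
                    and abs(int(intensity[ny, nx]) - int(intensity[y, x])) <= tol):
                mask[ny, nx] = True
                q.append((ny, nx))
    return mask
```

The resulting mask plays the role of the candidate shadow map that the band-pass filtering and perceptron stages then refine.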
Detecting vehicles to obtain traffic information at nighttime is difficult. This study proposes a vehicle detection algorithm, called the headlight extraction, pairing, and tracking (HLEPT) algorithm, which can acquire traffic information in rain at nighttime by identifying vehicles through the location of their headlights and other indicative lights. A knowledge-based connected-component procedure, in which vehicles are located by grouping their lights and estimating their features, is proposed. The features of a complex nighttime traffic scene are also analyzed. The HLEPT algorithm includes a headlight extraction algorithm, rules for pairing and grouping lights, and light tracking using a Kanade-Lucas-Tomasi tracker to measure traffic flow and velocity. Experimental results demonstrate the feasibility and effectiveness of the proposed approach for vehicle detection in rain at nighttime.
The histogram of oriented gradients has proven to be a successful method for object detection, especially pedestrian detection in images and videos. However, the question of how to make maximal use of color information for gradient calculation has not been thoroughly investigated. We propose a simple yet effective adaptation that uses a combination of grayscale-based and color-invariant-based gradients (after Geusebroek et al.) to replace the original gradient definition. Our experiments show that this combination achieves a 30% reduction in miss rate under the same experimental setting and evaluation criteria as Dalal et al. We have also measured the trade-off between performance and computational cost when using a more sophisticated quadratic kernel instead of a linear kernel: while it reduces the miss rate by a further 10% to 20%, a quadratic kernel can take as much as 70 times more running time on the original (Dalal et al. 2006) dataset.
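A common way multi-channel gradients enter HOG is to compute the gradient per channel and keep, at each pixel, the channel with the largest magnitude; the paper's combination of grayscale and color-invariant gradients can slot into the same mechanism by treating each gradient definition as a channel. The sketch below shows only this per-pixel selection, with illustrative names.

```python
import numpy as np

def dominant_channel_gradient(img):
    """Per-pixel gradient taken from the channel with the largest magnitude.

    img: (H, W, C) array. Returns (magnitude, orientation in radians).
    """
    gy, gx = np.gradient(img.astype(float), axis=(0, 1))
    mag = np.hypot(gx, gy)                 # (H, W, C) gradient magnitudes
    best = mag.argmax(axis=2)              # winning channel at each pixel
    ii = np.indices(best.shape)
    m = mag[ii[0], ii[1], best]
    ori = np.arctan2(gy[ii[0], ii[1], best], gx[ii[0], ii[1], best])
    return m, ori
```

The magnitude and orientation maps then feed the usual orientation binning and block normalization of the HOG descriptor.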
In Itti's model, one of the representative saliency models, proposed in 1998, a Gaussian pyramid is used to analyze color information in scene images and to generate a color conspicuity map. In this conspicuity map, some important objects can be located by salient areas, but their contours cannot be described clearly and completely. In this work, a wavelet low-pass pyramid is used to generate the color conspicuity map, and the contours of important objects pop out clearly from the salient areas. Experimental results validate the superiority of the proposed method.
In this paper, we propose a novel learning algorithm, the nearest reduced convex hull (NRCH), to tackle the issue of limited training information in practical remote sensing. Before classification, the "underlying" prototypes in the training subspace of each class are approximated by constrained convex combinations of the "existing" ones of that class. In this way, the original training set of each class is expanded to a reduced convex hull (RCH) manifold, through which the representational capacity of the training set is greatly enlarged. During this process, good separation between different classes is maintained by the reduction factor. Based on these RCHs, the nearest-neighbor decision rule is then used to classify a query sample. Experimental results, obtained on different kinds of data (synthetic data and real multispectral images), show the potential of NRCH for remote sensing classification in comparison with well-known traditional classifiers, including the maximum likelihood classifier (MLC), the back-propagation neural network (BP), and the support vector machine (SVM).
In recent years, data mining has achieved a great deal in business-oriented fields. Consequently, the research emphasis in data mining and knowledge discovery has inevitably shifted from non-spatial data to spatial data. Spatial data mining is a key technology of "3S" as well as an important research topic of the "Digital Earth". The first issue to address when implementing the relevant algorithms is deciding which tuples to process. In this paper, a new kind of tuple, named the "mesh-shaped" tuple, is proposed. This kind of tuple has distinctive characteristics: it has fields similar to those of a spatial entity and attributes similar to those of image pixels. To test the feasibility of the new tuple, a practical case study is designed. The whole operation demonstrates how to use the "mesh-shaped" tuple, and the results indicate that the proposed technical notion is feasible and successful. The new tuple provides a novel technical means for handling spatial data and enriches the approaches of spatial data mining. The research notion presented in this paper is significant for both science and applications.
Spectral similarity measures play an important role in hyperspectral remote sensing (RS) information processing, and they can also be applied effectively to content-based hyperspectral RS image retrieval. Oriented to the demands of massive information management, the application of spectral features to RS image retrieval is discussed, taking hyperspectral RS images as examples. We propose that spectral-feature-based image retrieval has two modes: retrieval based on a point template and retrieval based on a facial (area) template. A point template, such as a spectral curve or a pixel vector in a hyperspectral RS image, is the more common case; in retrieval based on a facial template, one or more regions (blocks with an area shape) are given as examples. The most important issues in image retrieval are spectral feature extraction and spectral similarity measurement. The spectral vector can be used for retrieval directly, and the spectral angle and spectral information divergence (SID) are more effective than the Euclidean distance and correlation coefficient for similarity measurement and image retrieval. Both point and pure-area templates can be transformed into spectral vectors for spectral similarity measurement. In addition, the local maxima and minima of the reflectance spectral curve, corresponding to reflection peaks and absorption valleys, can also be used for retrieval: the width, height, symmetry, and power of each peak or valley can be used to encode spectral features. Comparing three approaches for matching spectral absorption and reflection features, we find that these features are not very effective for hyperspectral RS image retrieval. Finally, a prototype system is designed, which shows that the proposed hyperspectral RS image retrieval based on spectral similarity measures is effective and that similarity measures including the spectral angle, SID, and the encoding measure are suitable for image retrieval in practice.
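The two measures the abstract singles out, the spectral angle and SID, have simple standard definitions; a minimal sketch for two spectra given as nonnegative band vectors:

```python
import numpy as np

def spectral_angle(x, y):
    """Spectral angle (radians) between two spectra; smaller = more similar.

    Invariant to overall scaling, so illumination differences matter less.
    """
    cos = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
    return np.arccos(np.clip(cos, -1.0, 1.0))

def spectral_information_divergence(x, y, eps=1e-12):
    """SID: symmetric KL divergence between spectra normalized to sum to 1."""
    p = x / x.sum() + eps
    q = y / y.sum() + eps
    return np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p))
```

Both compare spectral shape rather than absolute magnitude, which is one reason the abstract finds them more effective than Euclidean distance for retrieval.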
KEYWORDS: Data mining, Monte Carlo methods, Expectation maximization algorithms, Mining, Computer simulations, Databases, Data modeling, Fuzzy logic, Information science, Information technology
Based on an analysis of the uncertainties in spatial data mining (SDM), and in view of the limits of traditional spatial data mining, a framework for uncertain spatial data mining is established. Within this framework, four key problems are analyzed: uncertainty simulation of spatial data with the Monte Carlo method, measurement of spatial autocorrelation based on uncertain spatial positional data, discretization of continuous data based on the neighborhood EM algorithm, and quality assessment of the results. The corresponding experiments are performed using geospatial data from 37 representative cities in China.
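The Monte Carlo simulation of positional uncertainty can be illustrated with a toy propagation example: perturb coordinates with Gaussian noise and observe the resulting spread of a derived quantity. This is a generic sketch of the technique, not the paper's experiment; the Gaussian error model, `sigma`, and the distance statistic are assumptions.

```python
import numpy as np

def mc_distance_uncertainty(p, q, sigma, n=10000, seed=0):
    """Monte Carlo propagation of positional error into a derived quantity.

    Each coordinate of the two points p and q receives i.i.d. Gaussian noise
    of standard deviation `sigma`; returns the mean and standard deviation of
    the resulting point-to-point distances over n simulations.
    """
    rng = np.random.default_rng(seed)
    ps = np.asarray(p) + sigma * rng.standard_normal((n, 2))
    qs = np.asarray(q) + sigma * rng.standard_normal((n, 2))
    d = np.linalg.norm(ps - qs, axis=1)
    return d.mean(), d.std()
```

The same resampling idea extends to any SDM statistic computed from uncertain positions: recompute the statistic per simulated realization and report its empirical distribution.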
The feature contrast model (FCM), the simplest form of the matching function in Tversky's set-theoretic similarity, is a well-known similarity model in the psychology community. Although FCM can be used to explain similarity with both semantic and perceptual features, it is very difficult for FCM to measure natural image similarity with semantic features, because all features must be binary and the mechanism that transforms semantic features into binary features is complex. The fuzzy feature contrast model (FFCM) is an extension of FCM that replaces this complex feature representation mechanism with a proper fuzzy membership function. With this fuzzy logic, visual features in the FFCM can be represented as multidimensional points instead of an expansible feature set and used to measure visual similarity between two images. Based on an analysis of the distinction between the two feature structures (i.e., the expansible feature set and the multidimensional vector), we propose a ratio model that expresses the similarity between two images as a ratio of the measure of the semantic feature set to that of the multidimensional visual features. Experimental results on real-world image collections show that our model addresses the distinction between semantic and visual feature structures to some extent. In particular, our model suits the case in which semantic features are implicitly obtained from interaction with users while the visual features are transparent to users, for example, relevance feedback in interactive image retrieval.
The Landsat TM image is the most popular and universal RS information source and is widely used in fields such as resource investigation, environmental monitoring, urban planning, and disaster prevention. Although TM images have found wide application, their use in mining areas is still at an early, experimental stage, because a mining area is a special and complex kind of geographic region. One of the most important issues is to study the information characteristics and determine the most effective band combinations for a given region and task. In this paper, the Xuzhou mining area, located in northern Jiangsu Province, is taken as the study area, and terrestrial surface evolution (TSE) as the study task. According to the specific conditions of the study area, the information characteristics of each TM band and the relations between different bands are analyzed by selecting different sampling areas, and corresponding rules are given. Band combination is then discussed, with information content used as the judging criterion. Because more bands require more computing resources and incur higher time and cost, three-band combinations are widely used; among all three-band combination schemes, the combination of Bands 3, 4, and 5 is found to be the most effective. Finally, a genetic algorithm (GA) is applied to band selection in multiband RS images, and the results show that GA is an effective method for determining the optimal band combination, especially for multispectral and hyperspectral RS information sources, as well as a good optimization algorithm in geoscience.