Object detection is a technique used to localize and classify objects in an image or a video sequence, and it is an active topic of research in the field of computer vision. However, detections in a video are affected by sensor-specific challenges. A convolutional neural network-based You Only Look Once, version 3 (YOLOv3) object detection algorithm was used to obtain a favorable trade-off between computation time and accuracy. In the proposed methodology, the YOLOv3 architecture extracted significant features from both the visible and thermal imaging domains, and an adaptive fusion of both domains was performed to determine the dominant imaging domain and provide robust detections. The resulting YOLOv3 detections included the bounding box coordinates, confidence score, and class from each imaging domain, which were fused implicitly into a single plane. The sensor domain having the maximum number of object detections was chosen as the reference against which the other domain was compared in the adaptive fusion process. After fusion, the algorithm removed redundancy using adaptive intersection over union thresholding. The mean average precision obtained from the fusion algorithm was 44.25%. A comparative study was also carried out between pre-trained Common Objects in Context (COCO) weights and custom CAMEL dataset weights; it showed the significance of using adaptive fusion in challenging situations such as nighttime, shadow, varying illumination, a moving camera, and crowds.
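The fusion step described above can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: it assumes detections are already available as dictionaries with a `box` key (corner-format coordinates), takes the domain with more detections as the reference, and uses a fixed IoU threshold (the paper's thresholding is adaptive) to drop redundant boxes from the other domain.

```python
def iou(a, b):
    # Intersection over union for boxes in (x1, y1, x2, y2) corner format.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def adaptive_fuse(vis_dets, ir_dets, iou_thresh=0.5):
    # The domain with more detections serves as the reference plane.
    if len(vis_dets) >= len(ir_dets):
        ref, other = vis_dets, ir_dets
    else:
        ref, other = ir_dets, vis_dets
    fused = list(ref)
    for det in other:
        # Keep a detection from the other domain only if it is not
        # redundant (high IoU) with anything already in the fused set.
        if all(iou(det["box"], kept["box"]) < iou_thresh for kept in fused):
            fused.append(det)
    return fused
```

A visible-domain box that overlaps a thermal-domain box above the threshold is treated as the same object and merged into one detection; non-overlapping boxes from both domains survive, which is what makes the fused output robust when one sensor misses an object.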
Object tracking is a widely used technique in computer vision and image processing applications. It fails when using only visible sensors in situations with illumination variations, occlusion, camouflage, and adverse climatic circumstances. Thermal sensors are fairly illumination invariant. However, when the temperature difference between the foreground and the background is small, thermal sensors might not provide the best results, as everything may be coded in a single color; visible sensors are known to perform better in these conditions. Thus, the joint use of visible and thermal sensors is most beneficial in building a robust object tracking system under such challenging conditions. The proposed method performs object tracking by fusing bimodal information using a particle filter, with the structural similarity index measure (SSIM) as a cue as well as a correlation metric for modality selection. The key idea of this method is to adaptively choose the most discriminating modality with respect to the object being tracked. This is done by calculating the SSIM between reference and successive frames in both modalities. The method was evaluated on a variety of extremely challenging video sequences in both imaging modalities and proved to perform better than single-modality tracking.
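The modality-selection idea can be illustrated with a small sketch. This is an assumption-laden simplification of the method: it computes a global (single-window) SSIM between a reference template and the current candidate patch in each modality, using the standard SSIM constants for 8-bit data, and picks the modality whose appearance has stayed most similar to its reference. The function names are illustrative, not from the paper.

```python
import numpy as np

def ssim(x, y, c1=6.5025, c2=58.5225):
    # Global SSIM between two equal-size grayscale patches
    # (values assumed in [0, 255]; c1, c2 are the usual 8-bit constants).
    mx, my = x.mean(), y.mean()
    cov = ((x - mx) * (y - my)).mean()
    return (((2 * mx * my + c1) * (2 * cov + c2))
            / ((mx**2 + my**2 + c1) * (x.var() + y.var() + c2)))

def select_modality(ref_vis, cur_vis, ref_ir, cur_ir):
    # Higher SSIM against the reference template means the object's
    # appearance is more stable (discriminable) in that modality.
    s_vis = ssim(ref_vis, cur_vis)
    s_ir = ssim(ref_ir, cur_ir)
    return ("visible", s_vis) if s_vis >= s_ir else ("thermal", s_ir)
```

In a full tracker, this selection would run per frame inside the particle filter's weighting step, so the filter always scores particles in whichever modality currently discriminates the target best.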
Object tracking plays a vital role in many computer vision applications, such as video surveillance, robotics, 3-D image reconstruction, medical imaging, and human-computer interfaces. Among proposed approaches, feature-based object tracking is widely used due to its accuracy. Feature extraction and feature correspondence are its two main components. In our proposed method, we use gray level co-occurrence matrix (GLCM) features for object tracking in thermal imagery. Because the spatial resolution of a thermal sensor is fairly coarse, neighboring temperature values are close but not exactly the same, which implies the presence of mutually related pixels or groups of pixels. GLCM texture analysis is based on the assumption that the texture information of an image is an average spatial relationship between the gray tones in the image. This similarity in spatial properties makes GLCM features suitable for object tracking in thermal infrared imagery. Initially, the target blobs to be tracked are provided by the object detection stage. Then, for a given target blob in a frame, we first calculate GLCM feature points and then find the corresponding features in the next frame. The sum of squared differences between the two feature point sets is calculated to establish feature correspondence between frames. Simultaneously, a codebook of blob centers is maintained to predict the target candidate region in the next frame, providing robust tracking under occlusions. GLCM-based object tracking in thermal imagery outperforms the color- or LBP-based mean-shift approaches. The algorithm is also able to track objects under split-and-merge conditions. Its accuracy also depends on the object detection stage, such as Kalman tracking.
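The GLCM feature extraction and SSD matching described above can be sketched as follows. This is a minimal illustration under stated assumptions: gray values are quantized to a small number of levels, a single co-occurrence offset (one pixel to the right) is used, and only three Haralick-style descriptors (contrast, energy, homogeneity) are computed; the paper's actual feature set and offsets may differ.

```python
import numpy as np

def glcm_features(patch, levels=8, dx=1, dy=0):
    # Quantize the patch to `levels` gray tones, then accumulate the
    # co-occurrence matrix for the pixel offset (dx, dy).
    q = (patch.astype(float) / 256 * levels).astype(int).clip(0, levels - 1)
    glcm = np.zeros((levels, levels))
    h, w = q.shape
    for i in range(h - dy):
        for j in range(w - dx):
            glcm[q[i, j], q[i + dy, j + dx]] += 1
    p = glcm / glcm.sum()  # normalize to a joint probability
    idx = np.arange(levels)
    di = idx[:, None] - idx[None, :]  # gray-tone differences
    contrast = (p * di**2).sum()
    energy = (p**2).sum()
    homogeneity = (p / (1 + np.abs(di))).sum()
    return np.array([contrast, energy, homogeneity])

def ssd(f1, f2):
    # Sum of squared differences between two feature vectors,
    # used to score correspondence between consecutive frames.
    return float(((f1 - f2) ** 2).sum())
```

For correspondence, the candidate patch in the next frame with the lowest SSD against the target's GLCM features would be accepted as the match.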
Moving object detection is one of the most promising research areas and is required in applications such as video monitoring and surveillance systems, human activity recognition, vehicle counting, and anomaly detection. Various methods for object detection using a single sensor, and a few using multimodal techniques, have been reported in the literature. However, such systems fail to handle adverse or challenging atmospheric conditions, such as illumination variations, scale and appearance changes of targets, occlusions, and camouflage. We present an approach for the detection of moving objects using the structural similarity index measure (SSIM) and a Gaussian mixture model (GMM). SSIM is used to compute the similarity between a reference mean background frame and the current foreground frame, independently for the visible spectrum (VIS) and thermal infrared (IR). The similarity measure is computed in the image spatial domain. The thresholded SSIM results are fused using different pixel-level fusion methods, such as logical "OR," the discrete wavelet transform, and principal component analysis. Temporal analysis with a GMM is then performed on the fused results to eliminate noise and false positives (unwanted background regions). We compared the results with recent methods for different complex scenarios and found that the F-measure increases to approximately 80%. Hence, the proposed method proves to be a robust moving object detection technique in the multimodality domain.
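The SSIM thresholding and logical-OR fusion steps can be sketched compactly. This is an illustrative simplification, not the authors' code: SSIM is computed block-wise (here on non-overlapping 8x8 blocks with the standard 8-bit constants), low similarity to the background is taken as candidate foreground, and the two modality masks are combined with a pixel-level OR; the GMM-based temporal cleanup stage is omitted.

```python
import numpy as np

def local_ssim_map(ref, cur, win=8, c1=6.5025, c2=58.5225):
    # Block-wise SSIM between the mean background frame and the
    # current frame; returns one SSIM value per non-overlapping block.
    h, w = ref.shape
    out = np.ones((h // win, w // win))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            a = ref[i*win:(i+1)*win, j*win:(j+1)*win].astype(float)
            b = cur[i*win:(i+1)*win, j*win:(j+1)*win].astype(float)
            ma, mb = a.mean(), b.mean()
            cov = ((a - ma) * (b - mb)).mean()
            out[i, j] = (((2 * ma * mb + c1) * (2 * cov + c2))
                         / ((ma**2 + mb**2 + c1) * (a.var() + b.var() + c2)))
    return out

def fuse_or(ssim_vis, ssim_ir, thresh=0.8):
    # Low SSIM means dissimilar to the background, i.e. candidate
    # foreground; the two masks are combined by logical OR.
    return (ssim_vis < thresh) | (ssim_ir < thresh)
```

The OR fusion keeps a region if either modality flags it, which is why a target visible only in IR (e.g., at night) or only in VIS (e.g., thermal camouflage) still survives into the GMM stage.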