This PDF file contains the front matter associated with SPIE Proceedings Volume 11395, including the Title Page, Copyright information and Table of Contents
Optimum sparse array design and sensor placement are effective tools for improving target detection and estimation capabilities, particularly when system resources are limited. In this paper, we propose a novel, computationally efficient approach to maximizing the signal-to-interference-plus-noise ratio (MaxSINR) for receive beamforming applications. The proposed approach blends the concept of a sparse array windowing function with the DFT, so that the sparse array design is primarily conceived in the transformed domain. We show that the proposed objective function in the transform domain ranks the favorable sparse array configurations with respect to average SINR performance. The proposed approach therefore facilitates a greedy sensor placement scheme that permits sequential sensor selection. Coupling the transformed-domain design with the sequential sensor placement allows an efficient implementation.
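As a rough, illustrative sketch of the sequential selection idea (not the authors' transform-domain algorithm), the following Python snippet greedily adds sensors from a candidate grid so as to maximize the MVDR output SINR, assuming a hypothetical interference-plus-noise covariance R and desired-source steering vector s:

# Greedy sensor placement maximizing MVDR output SINR (illustrative sketch only).
# Assumes R (interference-plus-noise covariance) and s (desired steering vector)
# over N candidate grid positions are given; both are hypothetical inputs.
import numpy as np

def greedy_max_sinr(R, s, k):
    """Pick k of the N candidate sensor positions sequentially."""
    selected = []
    remaining = list(range(len(s)))
    for _ in range(k):
        best_idx, best_sinr = None, -np.inf
        for i in remaining:
            idx = selected + [i]
            Rs = R[np.ix_(idx, idx)]
            ss = s[idx]
            # MVDR output SINR for this sub-array is proportional to s^H R^{-1} s
            sinr = np.real(ss.conj() @ np.linalg.solve(Rs, ss))
            if sinr > best_sinr:
                best_idx, best_sinr = i, sinr
        selected.append(best_idx)
        remaining.remove(best_idx)
    return sorted(selected), best_sinr

# Toy example: 12 candidate positions, one desired source, one interferer.
N, k = 12, 6
pos = np.arange(N)                                       # half-wavelength grid
a = lambda u: np.exp(1j * np.pi * pos * u)
s = a(0.3)                                               # desired source
R = 10 * np.outer(a(-0.5), a(-0.5).conj()) + np.eye(N)   # interferer + noise
print(greedy_max_sinr(R, s, k))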
Over the last decades, criminal activities have progressively expanded into the information technology (IT) world, adding to "traditional" crime while ignoring political boundaries and legal jurisdictions. Building upon the possibilities of technologies such as Big Data analytics, representational models, machine learning, semantic reasoning, and augmented intelligence, the work presented in this paper, performed within the collaborative research project MAGNETO (Technologies for prevention, investigation, and mitigation in the context of the fight against crime and terrorism), co-funded by the European Commission within the Horizon 2020 programme, supports LEAs in their critical need to exploit all available resources and to handle the large amount of diversified media modalities in order to effectively carry out criminal investigations. This paper focuses on the application of machine learning solutions for information fusion and classification tools intended to support LEAs' investigations. The Person Fusion Tool is responsible for finding, in an underlying knowledge graph, different person instances that refer to the same person and fusing these instances. The general approach, the similarity metrics, the architecture of the tool, design choices, and measures to improve the efficiency of the tool are presented. The tool for classifying money transfer transactions uses decision trees, a choice driven by the requirement of easily explainable classification results demanded by the ethical and legal perspective of the MAGNETO project. The design of the tool, the selected implementation, and an evaluation based on anonymized financial data records are presented.
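To make the explainability argument concrete, here is a minimal sketch of a shallow decision tree on entirely synthetic transaction features (amount, hour, cross-border flag, all invented for illustration and unrelated to the MAGNETO feature set), with the learned rules printed in human-readable form:

# Illustrative sketch only: a shallow, explainable decision tree for flagging
# transactions. Features and data are synthetic stand-ins, not MAGNETO's.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
n = 1000
amount = rng.lognormal(mean=6, sigma=1.5, size=n)            # transfer amount
hour = rng.integers(0, 24, size=n)                           # hour of day
cross_border = rng.integers(0, 2, size=n)                    # 0/1 flag
# Synthetic label: large night-time cross-border transfers are "suspicious".
y = ((amount > 5000) & (cross_border == 1) & ((hour < 6) | (hour > 22))).astype(int)

X = np.column_stack([amount, hour, cross_border])
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# The tree can be rendered as human-readable rules, which is the property
# the abstract cites as the reason for choosing decision trees.
print(export_text(clf, feature_names=["amount", "hour", "cross_border"]))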
Three-dimensional (3-D) radar imaging is an important task underpinning many applications that seek target detection, localization, and classification. However, traditional 3-D radar imaging is computationally cumbersome and needs a large memory space to store the cubic data matrix. These challenges have hindered the development of 3-D radar imaging and limited its applications. Owing to the sparsity of the radar image of typical targets, the data size can be significantly reduced by exploiting compressive sensing and sparse reconstruction techniques. These techniques also prove important in mitigating the sidelobe levels that arise from successive Fourier-based processing. Towards this end, we first perform polar reformatting of the 2-D data matrix in azimuth and range. Then, a series of images over different values of the vertical variable z is generated using different focusing filters. Sparse optimization is afterwards applied to the 3-D data cube to produce a high-resolution 3-D image with significantly reduced sidelobes. Compared with conventional 3-D radar imaging methods, the proposed method, in addition to producing high-fidelity images, requires fewer data measurements and offers computationally efficient processing. Numerical simulations are provided to evaluate the effectiveness of the proposed method.
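The final sparse-optimization step can be pictured with a generic L1-regularized solver. The sketch below runs ISTA on a random stand-in measurement operator; the paper's polar-reformatted focusing operators are not reproduced here.

# Generic sketch of the final sparse-recovery step: solve
#   min_x 0.5 * ||A x - y||_2^2 + lam * ||x||_1
# with ISTA. Here A is a stand-in measurement operator (random), not the
# paper's polar-reformatted Fourier/focusing operator.
import numpy as np

def ista(A, y, lam, n_iter=200):
    L = np.linalg.norm(A, 2) ** 2            # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)
        z = x - grad / L
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)   # soft threshold
    return x

# Toy example: recover a sparse "reflectivity" vector from few measurements.
rng = np.random.default_rng(1)
n, m, k = 400, 120, 8                        # unknowns, measurements, nonzeros
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
y = A @ x_true + 0.01 * rng.standard_normal(m)
x_hat = ista(A, y, lam=0.02)
print("relative error:", np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))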
Target detection is an important problem in remote sensing, with crucial applications in law enforcement, military and security surveillance, search-and-rescue operations, and air traffic control, among others. Owing to the recently increased availability of computational resources, deep-learning-based methods have demonstrated state-of-the-art performance in target detection from unimodal aerial imagery. In addition, owing to the availability of remote-sensing data from various imaging modalities, such as RGB, infrared, hyperspectral, multispectral, synthetic aperture radar, and lidar, researchers have focused on leveraging the complementary information offered by these modalities. Over the past few years, deep-learning methods have demonstrated enhanced performance using multi-modal data. In this work, we propose a method for vehicle detection from multi-modal aerial imagery by means of a modified YOLOv3 deep neural network that conducts mid-level fusion. To the best of our knowledge, the proposed mid-level fusion architecture is the first of its kind to be used for vehicle detection from multi-modal aerial imagery using a hierarchical object detection network. Our experimental studies corroborate the advantages of the proposed method.
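The following PyTorch fragment sketches the general mid-level fusion pattern: two modality-specific backbones whose feature maps are concatenated, reduced by a 1x1 convolution, and passed to a shared detection head. The layer sizes, channel counts, and fusion point are placeholders, not the modified YOLOv3 architecture itself.

# Generic mid-level fusion sketch (not the exact modified-YOLOv3 layout).
# Two modality branches -> concatenate feature maps -> 1x1 conv -> shared head.
import torch
import torch.nn as nn

def small_backbone(in_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.LeakyReLU(0.1),
        nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.LeakyReLU(0.1),
    )

class MidLevelFusionDetector(nn.Module):
    def __init__(self, num_outputs=5 * 3):       # e.g. 3 anchors x (4 box + 1 obj)
        super().__init__()
        self.rgb_branch = small_backbone(3)       # RGB modality
        self.ir_branch = small_backbone(1)        # infrared modality (assumed 1-channel)
        self.fuse = nn.Conv2d(64 + 64, 64, kernel_size=1)   # mid-level fusion
        self.head = nn.Conv2d(64, num_outputs, kernel_size=1)

    def forward(self, rgb, ir):
        f = torch.cat([self.rgb_branch(rgb), self.ir_branch(ir)], dim=1)
        return self.head(torch.relu(self.fuse(f)))

model = MidLevelFusionDetector()
out = model(torch.randn(2, 3, 256, 256), torch.randn(2, 1, 256, 256))
print(out.shape)   # (2, 15, 64, 64) grid of raw detection outputs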
While machine learning-based image restoration techniques have been the focus of much recent work, these algorithms are not adequate to address the effects of a degraded visual environment. An algorithm that successfully mitigates these issues is proposed. The algorithm is built upon the state-of-the-art DeblurGAN algorithm but overcomes several of its deficiencies. The key contributions of the proposed techniques include: 1) development of an effective framework to generate training datasets typical of a degraded visual environment; 2) adoption of a correntropy-based loss function integrated with the original VGG16-based perceptual loss function and an L1 loss function; and 3) substantial experiments on images from the artificial training datasets demonstrating the effectiveness of the proposed algorithm.
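For reference, a correntropy-induced loss term of the kind mentioned in contribution 2 can be written as follows; the kernel width sigma and the weighting against the perceptual and L1 terms are placeholders rather than the paper's tuned values.

# Sketch of a correntropy-induced loss term (kernel width and weights are
# placeholders, not the paper's tuned values).
import torch

def correntropy_loss(pred, target, sigma=0.5):
    """1 - mean Gaussian-kernel similarity between prediction and target."""
    err = pred - target
    return torch.mean(1.0 - torch.exp(-(err ** 2) / (2.0 * sigma ** 2)))

# Example of combining it with an L1 term, as the abstract describes doing
# alongside the VGG16 perceptual loss (omitted here).
pred, target = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
total = correntropy_loss(pred, target) + 0.1 * torch.mean(torch.abs(pred - target))
print(float(total))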
Single image super-resolution (SR) refers to the task of obtaining a high-resolution (HR) image from a low-resolution (LR) image. During the last two decades, numerous techniques that infer the missing HR data from an LR image have been developed. Recently, powerful deep learning algorithms have been employed for this task and have achieved state-of-the-art performance. On the other hand, during the last decade, several techniques have been developed that generate an HR image from a compressively sensed exposure. In this paper we discuss the differences between these two types of SR imaging approaches.
Recent advances in deep learning have achieved great success in fundamental computer vision tasks such as classification, detection, and segmentation. Nevertheless, research on deep learning-based video coding is still in its infancy. State-of-the-art deep video coding networks explore temporal correlations by means of frame-level motion estimation and motion compensation, which incurs high computational complexity due to the frame size, while existing block-level inter-frame prediction schemes use only the co-located blocks in preceding frames and therefore do not account for object motion. In this work, we propose a novel motion-aware deep video coding network in which inter-frame correlations are effectively explored via a block-level motion compensation network. Experimental results demonstrate that the proposed inter-frame deep video coding model significantly improves decoding quality at the same compression ratio.
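To make the block-level idea concrete, the sketch below performs classical block-matching motion compensation with an exhaustive search; the paper replaces this hand-crafted search with a learned motion-compensation network, so this is only a stand-in for the operation being learned.

# Illustration of block-level motion compensation: for each block in the
# current frame, find the best-matching block in the reference frame within
# a small search window. The residual (current minus prediction) is what a
# codec would then compress.
import numpy as np

def motion_compensate(ref, cur, block=8, search=4):
    H, W = cur.shape
    pred = np.zeros_like(cur)
    for by in range(0, H, block):
        for bx in range(0, W, block):
            cur_blk = cur[by:by + block, bx:bx + block]
            best, best_err = None, np.inf
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y, x = by + dy, bx + dx
                    if 0 <= y and y + block <= H and 0 <= x and x + block <= W:
                        cand = ref[y:y + block, x:x + block]
                        err = np.sum((cand - cur_blk) ** 2)
                        if err < best_err:
                            best, best_err = cand, err
            pred[by:by + block, bx:bx + block] = best
    return pred

ref = np.random.rand(64, 64)
cur = np.roll(ref, shift=(2, 1), axis=(0, 1))       # simple global motion
pred = motion_compensate(ref, cur)
print("residual energy:", float(np.sum((cur - pred) ** 2)))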
A critical limitation in the application of deep learning to radar signal classification is the lack of sufficient data to train very deep neural networks. The depth of a neural network is one of the more significant network parameters affecting achievable classification accuracy. One way to overcome this challenge is to generate synthetic samples for training deep neural networks (DNNs). In prior work of the authors, two methods have been developed: 1) diversified micro-Doppler signature generation via transformations of the underlying skeletal model derived from video motion capture (MOCAP) data, and 2) auxiliary conditional generative adversarial networks (ACGANs) with kinematic sifting. While diversified MOCAP has the advantage of greater accuracy in generating signatures that span the probable target space of expected human motion for different body sizes, speeds, and individualized gaits, the method cannot capture data artifacts due to sensor imperfections or clutter. In contrast, adversarial learning has been shown to capture non-target-related artifacts; however, ACGANs can also generate misleading signatures that are kinematically impossible. This paper provides an in-depth performance comparison of the two methods on a through-the-wall radar data set of human activities of daily living (ADL) in the presence of clutter and sensor artifacts.
We consider the problem of detecting a change in an arbitrary vector process by examining the evolution of calculated data subspaces. In our developments, both the data subspaces and the change identification criterion are novel and founded in the theory of L1-norm principal-component analysis (PCA). The outcome is highly accurate, rapid detection of change in streaming data that vastly outperforms conventional eigenvector subspace methods (L2-norm PCA). In this paper, illustrations are offered in the context of artificial data and real electroencephalography (EEG) and electromyography (EMG) data sequences.
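A minimal sketch of the underlying mechanics: for a single component, the exact L1 principal component of a data window X is q = Xb/||Xb||, where b is the +/-1 sign vector maximizing ||Xb|| (found here by exhaustive search, so only tiny windows are practical). The window size, scan stride, and drift score below are placeholders, not the paper's detection criterion.

# Sketch of L1-norm-PCA-based change detection on a toy stream.
import itertools
import numpy as np

def l1_pc(X):
    """X is D x N (N samples). Returns the leading L1 principal component."""
    D, N = X.shape
    best_q, best_val = None, -1.0
    for signs in itertools.product([-1.0, 1.0], repeat=N):
        v = X @ np.array(signs)
        nrm = np.linalg.norm(v)
        if nrm > best_val:
            best_val, best_q = nrm, v / nrm
    return best_q

rng = np.random.default_rng(0)
N = 10                                    # small window so the 2^N search is cheap
before = rng.standard_normal((3, 200)) * np.array([[3.0], [1.0], [0.2]])
after = before[::-1, :]                   # change: the dominant direction rotates
stream = np.concatenate([before, after], axis=1)

q_ref = l1_pc(stream[:, :N])              # reference subspace from the first window
for t in range(N, stream.shape[1] - N, 50):
    q_t = l1_pc(stream[:, t:t + N])
    drift = 1.0 - abs(q_ref @ q_t)        # ~0 when subspaces align, ~1 when not
    print(t, round(float(drift), 3))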
The problem of accurately inferring the firing times of neurons from fluorescent microscopy measurements such as calcium imaging is crucial for deciphering the dynamics of neural spiking activity during different tasks. The fluorescence signal from calcium imaging can be modeled as the convolution of the neural spikes (action potentials) with an exponentially decaying kernel whose decay is governed by the type of calcium indicator used. Calcium signals exhibit excellent spatial resolution, and it is possible to record individual neural activity from a large population of neurons. However, the main drawback of calcium imaging is its poor temporal resolution, due to the slow dynamics of calcium indicators and the scanning limitations of existing microscopes. Existing spike deconvolution algorithms obtain a representation of spiking activity at a rate identical to the acquisition rate of the calcium signals (typically <60 Hz). However, this does not accurately capture the true spiking activity, as typical neural spike separation can be well below 5 ms. In this paper, we show that measurements from multiple neurons can be combined with accurate modeling of spiking activity to overcome these limitations.
The main idea is to exploit the inherent multichannel structure of the problem. Calcium traces from different neurons are treated as the outputs of the same unknown filter excited by different inputs corresponding to the spiking activity of each neuron. We develop a sparse reconstruction algorithm that solves this multichannel blind deconvolution problem from subsampled measurements and simultaneously recovers the sparse neural activity at a rate representative of the true neural activity.
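The forward model described above can be written down directly; the sketch below only generates data according to that model (shared exponential kernel, sparse spikes, camera-rate subsampling) and does not reproduce the proposed multichannel blind-deconvolution solver. All rates and constants are made up for illustration.

# Forward model only: each neuron's fluorescence is its sparse spike train
# convolved with a shared exponentially decaying calcium kernel, then
# subsampled at the (slow) frame rate.
import numpy as np

rng = np.random.default_rng(0)
fs_true, fs_cam = 1000, 50                 # spike-time grid (Hz) vs. frame rate (Hz)
T, tau, n_neurons = 2.0, 0.3, 4            # seconds, kernel decay, channels

n = int(T * fs_true)
t = np.arange(n) / fs_true
kernel = np.exp(-t / tau)                  # the shared unknown filter in the paper

traces = []
for _ in range(n_neurons):
    spikes = np.zeros(n)
    spikes[rng.choice(n, size=6, replace=False)] = 1.0       # sparse activity
    fluor = np.convolve(spikes, kernel)[:n]                   # calcium transient
    sub = fluor[:: fs_true // fs_cam]                         # camera subsampling
    traces.append(sub + 0.01 * rng.standard_normal(sub.size))

traces = np.stack(traces)                  # n_neurons x (T * fs_cam) measurements
print(traces.shape)                        # these are the inputs to the solver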
Analyzing breathing data to classify respiratory states has various applications, including drug overdose detection and the medical diagnosis of several other respiratory conditions. Tracheal sounds have been shown to provide accurate breathing data with a high signal-to-noise ratio for measuring air flow. The heartbeat signal is a source of interference for tracheal sound measurements and is therefore often filtered out prior to analysis. Filtering out the heartbeat signal, however, removes part of the tracheal sound data along with its energy and statistical information. We propose an algorithm to classify respiratory states from tracheal sound data despite the presence of heartbeat signals. The algorithm uses the data histogram as well as the data autocorrelation function (ACF) for feature extraction, and we show that these features, when used by a softmax classifier, can properly discriminate among breathing conditions with classification rates exceeding 97%.
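A minimal sketch of the described feature pipeline, histogram plus normalized autocorrelation per segment feeding a softmax (multinomial logistic regression) classifier, is shown below on synthetic stand-in signals; segment lengths, bin counts, and classes are placeholders, not the paper's settings.

# Sketch of the pipeline: histogram + autocorrelation features per segment,
# classified with a softmax (multinomial logistic) classifier. The signals
# here are synthetic stand-ins for tracheal-sound segments.
import numpy as np
from sklearn.linear_model import LogisticRegression

def features(x, n_bins=16, n_lags=20):
    hist, _ = np.histogram(x, bins=n_bins, density=True)
    xc = x - x.mean()
    acf_full = np.correlate(xc, xc, mode="full")[xc.size - 1:]
    acf = acf_full[:n_lags] / acf_full[0]            # normalized ACF, first lags
    return np.concatenate([hist, acf])

rng = np.random.default_rng(0)
def synth_segment(state):
    # Hypothetical stand-ins: each "breathing state" gets different spectral
    # content; real tracheal sounds are obviously richer than this.
    t = np.arange(2000) / 2000
    freq = {0: 5, 1: 15, 2: 40}[state]
    return np.sin(2 * np.pi * freq * t) + 0.3 * rng.standard_normal(t.size)

X, y = [], []
for state in (0, 1, 2):
    for _ in range(100):
        X.append(features(synth_segment(state)))
        y.append(state)
clf = LogisticRegression(max_iter=1000).fit(np.array(X), y)   # softmax classifier
print("train accuracy:", clf.score(np.array(X), y))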
Stroke is a leading cause of long-term disability in survivors, imposing functional limitations such as impairments in mobility, speech, and comprehension, as well as paralysis. The outcomes of stroke on function and mobility may vary from complete paralysis of one side of the body to one-sided weakness. This forces individuals to use multiple types of assistive technologies (AT) for mobility and balance. The use of AT, combined with variations in functional recovery post-stroke, creates complex mobility modes and patterns that are hard to detect. Existing clinic- and community-based post-stroke rehab interventions rely on measurements of physical activity, rehab, and health outcomes using validated clinical tools, such as questionnaires and self-reports. These tools, however, suffer from participant bias, recall bias, and social acceptability bias. To address some of the limitations of self-report, research on the use of body sensors for detecting and quantifying mobility in individuals with stroke has gained increased interest. In this paper, we consider a body-plus-assistive-device sensor network and identify dominant sensors for the classification of complex mobility modes, such as walking with a cane or a walker, or other mobility activity, influenced by functional limitations and AT usage.
Gene microarray data generally consist of high-dimensional, small-sample datasets prone to noise. Analyzing these data using supervised and unsupervised learning algorithms is extremely useful for gene characterization, disease diagnosis, and genetic therapy in the medical field. For many years, principal component analysis (PCA) has been used as a tool in algorithms for gene expression classification. Previous solutions utilize L2-norm-based PCA; however, with its superior resistance to outlier data, L1-norm PCA offers improved results. Both methods are compared using support vector machines (SVMs) to classify genetic mutations and co-regulation in several publicly available datasets. Methods utilizing L1 PCA result in improved accuracy compared to L2 PCA when used as a pre-processing step to SVM classification for gene microarray data.
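The comparison pipeline can be sketched as follows with scikit-learn on synthetic microarray-like data; scikit-learn's PCA is the conventional L2 variant, and the paper's L1-norm PCA would simply replace the projection step.

# Sketch of the PCA -> SVM pipeline on synthetic "microarray-like" data
# (high dimension, few samples, a few outliers motivating the L1 variant).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_samples, n_genes = 60, 2000
y = np.repeat([0, 1], n_samples // 2)
X = rng.standard_normal((n_samples, n_genes))
X[y == 1, :25] += 1.5                     # a small block of informative "genes"
X[rng.integers(0, n_samples, 3)] += 8.0   # a few outlier samples

pipe = make_pipeline(StandardScaler(), PCA(n_components=10), SVC(kernel="linear"))
scores = cross_val_score(pipe, X, y, cv=5)
print("mean CV accuracy:", scores.mean())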
Obtaining the effective contour from an image taken with a fisheye lens is important for subsequent processing. Many studies have tried to develop suitable methods to obtain accurate contours of fisheye images. The traditional level set method (the Chan-Vese, or CV, model) struggles to satisfy the requirement that the final segmentation region be a circle. Therefore, we redesign the preprocessing of fisheye images and improve the traditional level set method to obtain a final circular segmentation, which may also be suitable for other applications. In this paper, we use a local entropy method to even out the pixel values inside the effective circular region, a thresholding step to remove the hole(s), and finally an explicit circular level set method to obtain the final segmentation. Experimental results show that the segmentation is effective.
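The first two steps can be illustrated with scikit-image on a synthetic stand-in frame; the explicit circular level-set stage is not reproduced here, and the entropy disk radius and threshold choice are placeholders.

# Sketch of the preprocessing steps only: local entropy, then thresholding and
# hole filling. The input is a synthetic stand-in for a fisheye frame.
import numpy as np
from scipy import ndimage
from skimage.filters import threshold_otsu
from skimage.filters.rank import entropy
from skimage.morphology import disk

# Synthetic "fisheye" image: textured circular field of view on a dark border.
rng = np.random.default_rng(0)
h = w = 256
yy, xx = np.mgrid[:h, :w]
inside = (yy - h / 2) ** 2 + (xx - w / 2) ** 2 < (0.45 * h) ** 2
img = np.where(inside, rng.integers(40, 220, (h, w)), rng.integers(0, 10, (h, w)))
img = img.astype(np.uint8)

ent = entropy(img, disk(7))                       # local entropy evens the interior
mask = ent > threshold_otsu(ent)                  # global threshold
mask = ndimage.binary_fill_holes(mask)            # remove interior hole(s)
print("estimated circular-region area fraction:", mask.mean())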
Convolutional neural networks (CNNs) have become very useful tools in classification and feature extraction applications. In this research, we present a comparative study of several commonly used CNNs in terms of performance. Recently developed CNNs are selected for our study, including NASNet-Large, Inception-ResNet-v2, DenseNet201, NASNet-Mobile, and MobileNet-v2, as well as the well-known ResNet50 and VGG19 for comparison. Our classification experiments involve eight different geometrical shape classes, each of which includes 486 to 620 computer-generated images. Two basic shapes, triangle and square, appear in solid or hollow variants, with or without overlapping three-disk distractors. The CNNs can be trained and tested directly on the shape images, in the same manner as experiments conducted on ImageNet. Alternatively, we can use the CNNs pretrained on ImageNet to extract features and then train a multiclass support vector machine (SVM) for classification. Training images may include four shapes or two categories (solid or hollow), while testing images are four shapes or two categories with distractors. CNN performance is measured by classification accuracy and by training and testing time costs. The experimental results provide guidance in selecting CNN models.
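The second evaluation mode, an ImageNet-pretrained CNN used as a fixed feature extractor followed by a multiclass SVM, can be sketched as below. ResNet50 stands in for the compared models, random tensors stand in for the authors' shape images, and a recent torchvision with the weights enum API is assumed.

# Sketch of the "pretrained CNN as feature extractor + multiclass SVM" mode.
import torch
import torchvision.models as models
from sklearn.svm import SVC

backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
backbone.fc = torch.nn.Identity()          # drop the 1000-way ImageNet head
backbone.eval()

def extract(batch):
    with torch.no_grad():
        return backbone(batch).numpy()     # 2048-D feature per image

# Stand-in "shape images": 4 classes x 20 images of 224x224 RGB noise.
X, y = [], []
for cls in range(4):
    feats = extract(torch.rand(20, 3, 224, 224))
    X.extend(feats)
    y.extend([cls] * 20)

svm = SVC(kernel="linear").fit(X, y)        # multiclass SVM (one-vs-one)
print("train accuracy:", svm.score(X, y))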
A great amount of work has been done using deep learning strategies for sequence modeling, such as the encoder-decoder LSTM. Most solutions for this type of problem involve recurrent neural networks. This approach, while often yielding superior results, has its downsides. Training time can sometimes be prohibitively long, and the only remedy is more computing power. Additionally, these models are extremely difficult to interpret, regardless of their performance. The deeper and more complex the network, the harder it is to make sense of it. Neural networks lack a simple representation of the knowledge they learn. Rule-based learners, however, are the opposite in this regard. They represent the knowledge they learn through relational rules, which are easily digested by a human. While most rule-based learners are designed with the intent of discovering dependencies between variables, as in association rule learning, rule-based learners can sometimes be implemented in a supervised setting. For sequence modeling, though, rule-based solutions are scarce, especially in the case where inputs and targets are variable-length sequences. We propose a type of architecture that utilizes binary trees and evolutionary algorithms to discover rules. These models make inferences through sequential actions, similarly to how reinforcement learning agents progress through tasks by choosing actions based on the state of the environment. These new learners can predict on various data types, including multi-output data and variable-length sequences. We consider a simple problem where the inputs and targets are variable-length sequences. Our strategy perfectly learns the rules from a synthetic dataset. Lastly, we discuss how to apply the strategy more generally.
Big data has been driving professional sports over the last decade. In our data-driven world, it becomes important to find additional methods for the analysis of both games and athletes. There is an abundance of video footage from professional and amateur sports, and player datasets can be created from it using computer vision techniques. We propose a novel autonomous masking algorithm that can receive live or previously recorded video footage of sporting events and identify graphical overlays, optimizing further processing by tracking and text recognition algorithms for real-time analysis.
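The abstract does not spell out the masking algorithm, so the snippet below is only a hypothetical illustration of how such an overlay mask could be formed: pixels whose values stay nearly constant across frames (static score bugs and graphics) are flagged.

# Hypothetical illustration only (not the paper's procedure): flag pixels
# whose values stay nearly constant across frames, since static graphical
# overlays such as score bugs behave this way.
import numpy as np

def overlay_mask(frames, var_thresh=5.0):
    """frames: (T, H, W) grayscale stack; returns a boolean overlay mask."""
    temporal_var = frames.var(axis=0)
    return temporal_var < var_thresh

# Toy stack: moving random content with a constant "scoreboard" strip on top.
rng = np.random.default_rng(0)
T, H, W = 30, 120, 160
frames = rng.integers(0, 255, (T, H, W)).astype(float)
frames[:, :20, :60] = 200.0                     # static overlay region
mask = overlay_mask(frames)
print("flagged fraction:", mask.mean())         # roughly (20*60)/(120*160)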
The cost of data movement is one of the fundamental issues with modern compute systems processing Big Data workloads. One approach to moving the computation closer to the data is to equip the storage or memory devices with processing power. This notion of moving computation to data is known as Near Data Processing (NDP). In this work, we re-examine the idea of reducing data movement by processing data directly in the storage devices. We evaluate ASTOR, a compute framework on an Active Storage platform, which incorporates a software stack and a dedicated multi-core processor for in-storage processing. ASTOR utilizes the processing power of storage devices by using an array of Active Drive™ devices to significantly reduce the bandwidth requirement on the network. We evaluate the performance and scalability of ASTOR for distributed processing of Big Data workloads, and conclude with a comparative study of other existing data-centric approaches.
This paper aims to develop a robust dissolved oxygen (DO) prediction model of water quality to support the Hybrid Aerial Underwater Robotics System (HAUCS) project. Many challenges arise in developing such a model from the fish farm data collected, such as a small dataset containing missing data and noisy measurements taken at irregular intervals. An attempt to deal with these issues and obtain a robust prediction is discussed. Machine learning techniques, such as Long Short-Term Memory (LSTM) and Phased LSTM (PLSTM), are presented and motivated for dealing with the problem. The performance of LSTM and PLSTM on a larger and less problematic water quality dataset is first investigated. Attempts to transfer the knowledge of the models trained on this large dataset to fish farm DO prediction through transfer learning are then reported. To mitigate the noisy measurement data, a loss function that can better deal with Gaussian noise, the correntropy loss, is adopted. Long-range prediction results using this transfer learning technique and the correntropy loss function are presented.
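A high-level sketch of the freeze-and-fine-tune recipe with a correntropy-style loss is given below; the data, window lengths, network sizes, and hyperparameters are placeholders, and only the general pattern is illustrated, not the paper's PLSTM models or tuned settings.

# Sketch of the transfer-learning pattern: pretrain an LSTM forecaster on the
# large dataset (omitted), then freeze the recurrent layers and fine-tune only
# the output head on the small, noisy fish-farm DO series.
import torch
import torch.nn as nn

class DOForecaster(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)
    def forward(self, x):                        # x: (batch, time, 1)
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :])          # predict the next DO value

def correntropy_loss(pred, target, sigma=1.0):
    return torch.mean(1 - torch.exp(-((pred - target) ** 2) / (2 * sigma ** 2)))

model = DOForecaster()
# ... pretrain `model` on the larger water-quality dataset here ...

for p in model.lstm.parameters():                # transfer step: freeze the LSTM
    p.requires_grad = False
opt = torch.optim.Adam(model.head.parameters(), lr=1e-3)

x, y = torch.randn(16, 48, 1), torch.randn(16, 1)   # placeholder fine-tune batch
for _ in range(10):
    opt.zero_grad()
    loss = correntropy_loss(model(x), y)
    loss.backward()
    opt.step()
print(float(loss))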
The extraction of useful information from large, disparate, and heterogeneous data sets requires a good set of theoretical and computational tools. Methods based on the ideas of Information Geometry (IG) offer an understanding of the hidden patterns inherent in the data and help in their visualization. Fisher Information is such a tool. It has been used widely in many areas of social and economic research, leading to an improved understanding of trends and hidden patterns. Here we outline its usefulness in understanding large and disparate data sets.
The U.S. Food and Drug Administration (FDA) has approved two digital pathology systems for primary diagnosis. These systems produce and consume whole slide images (WSIs) constructed from glass slides using advanced digital slide scanners. WSIs can greatly improve the workflow of pathologists through the development of novel image analytics software for automatic detection of cellular and morphological features and disease diagnosis using histopathology slides. However, the gigabyte size of a WSI poses a serious challenge for storage and retrieval of millions of WSIs. In this paper, we propose a system for scalable storage of WSIs and fast retrieval of image tiles using DRAM. A WSI is partitioned into tiles and sub-tiles using a combination of a space-filling curve, recursive partitioning, and Dewey numbering. They are then stored as a collection of key-value pairs in DRAM. During retrieval, a tile is fetched using key-value lookups from DRAM. Through performance evaluation on a 24-node cluster using 100 WSIs, we observed that, compared to Apache Spark, our system was three times faster to store the 100 WSIs and 1,000 times faster to access a single tile, achieving millisecond latency. Such fast access to tiles is highly desirable when developing deep learning-based image analytics solutions on millions of WSIs.
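The tile-keying idea can be sketched as follows: each tile's grid position is mapped to a Z-order (space-filling-curve) index and stored as a key-value pair in an in-memory dictionary that stands in for the DRAM store. The Dewey sub-tile numbering and the 24-node distributed setup are not reproduced.

# Sketch of the key-value tiling idea on a fake "WSI" array.
import numpy as np

def z_order(row, col, bits=16):
    """Interleave the bits of (row, col) into a single Z-order index."""
    key = 0
    for i in range(bits):
        key |= ((row >> i) & 1) << (2 * i + 1)
        key |= ((col >> i) & 1) << (2 * i)
    return key

store = {}                                    # stand-in for the DRAM key-value store
tile_size = 256
wsi = np.random.randint(0, 255, (4096, 4096), dtype=np.uint8)   # fake slide data

# Ingest: cut the slide into tiles and store them under Z-order keys.
for r in range(0, wsi.shape[0], tile_size):
    for c in range(0, wsi.shape[1], tile_size):
        key = ("slide_001", z_order(r // tile_size, c // tile_size))
        store[key] = wsi[r:r + tile_size, c:c + tile_size].tobytes()

# Retrieval: a single key-value lookup returns the requested tile.
tile = np.frombuffer(store[("slide_001", z_order(3, 7))], dtype=np.uint8)
print(tile.reshape(tile_size, tile_size).shape)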