Until recently, evaluating the quality of unsupervised learning was too slow and expensive, a major hurdle for edge-enabled AI and for any application where computational cost is a significant constraint. Adaptive Resonance Theory has been part of the solution because it can self-correct based on unsupervised category mismatch detection and reset. This advantage can be further leveraged by the development of incremental cluster validity indices. Validity indices provide various quality measures for unsupervised learning, and converting them to incremental versions yields an approach that dominates prior methods, particularly for real-time or edge computing applications. Integrating incremental measures into the machine learning architecture further enhances these cost and speed advantages.
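As an illustrative sketch (not the paper's specific indices), a compactness measure such as per-cluster within-cluster sum of squares can be maintained incrementally with a Welford-style update, so each arriving sample costs O(d) rather than a full pass over all past data:

```python
import numpy as np

class IncrementalWSS:
    """Incrementally track per-cluster within-cluster sum of squares as
    samples arrive one at a time, avoiding recomputation over past data."""

    def __init__(self):
        self.counts = {}  # cluster label -> sample count
        self.means = {}   # cluster label -> running mean vector
        self.wss = {}     # cluster label -> running sum of squared deviations

    def update(self, x, label):
        x = np.asarray(x, dtype=float)
        if label not in self.counts:
            self.counts[label] = 0
            self.means[label] = np.zeros_like(x)
            self.wss[label] = 0.0
        self.counts[label] += 1
        delta = x - self.means[label]
        self.means[label] += delta / self.counts[label]
        # Welford update: uses the deviation before and after the mean moves
        self.wss[label] += float(delta @ (x - self.means[label]))

    def total_wss(self):
        return sum(self.wss.values())
```

The incremental result matches a batch recomputation to machine precision while touching only one sample per update.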
We present results comparing black-box and physics-guided neural network architectures for hyperspectral target identification. Specifically, our physics-guided neural networks operate on at-sensor overhead long-wave infrared hyperspectral imaging radiances to predict not only the material class, but also physically-meaningful quantities of interest, such as the atmospheric transmission factor, the temperature, and the underlying material emissivity. In this way, our models are decoupled from traditional preprocessing routines and provide independently verifiable and interpretable quantities alongside the class predictions. We compare our physics-guided models to more traditional black-box models with respect to classification accuracy and representational similarity, and assess performance in predicting physical quantities across a variety of training schemes.
We have developed a system that applies deep-learning-based super-resolution (SR) to multispectral and hyperspectral geospatial satellite imagery, deducing higher-resolution images from lower-resolution images while maintaining the original color of the lower-resolution pixels. The super-resolution model, built on deep convolutional neural networks (DCNNs), is trained using individual image bands, a large crop (tile) size of 512 × 512 pixels, and a de-noising algorithm. Maintaining the original color of the image bands improves the quality of the super-resolution images as measured by peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM). One of the most important applications of satellite imagery is the automatic detection of small objects such as vehicles and small boats. With super-resolution images generated by our system, object detection accuracy (recall and precision) improved by 20% on Planet® multispectral satellite images.
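For reference, PSNR (one of the two quality metrics used above) can be computed directly from the mean squared error; SSIM is typically taken from a library such as scikit-image:

```python
import numpy as np

def psnr(reference, test, data_range=255.0):
    """Peak signal-to-noise ratio in dB between a reference image and a
    reconstructed (e.g. super-resolved) image of the same shape."""
    mse = np.mean((reference.astype(float) - test.astype(float)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(data_range ** 2 / mse)
```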
The Northern elephant seal, reduced to as few as 100 individuals in the 1890s, has recovered to a population exceeding 250,000 thanks to successful conservation efforts, and accurate population monitoring remains crucial. Counting seals from the ground, especially on remote islands where most breed, is difficult and dangerous, and manually counting from aerial photos is time-consuming and error-prone. This research proposes an automated method of counting elephant seals using machine learning. Drone images were collected from Año Nuevo Reserve, California, US, during the 2022 and 2023 winter breeding seasons. The system automatically created orthophotos from drone images, made predictions using a single-stage object detection model from tiles, detected and removed duplicate predictions, classified the seals into males, females, and pups, and mapped the predictions back to the orthophotos as labeled bounding boxes. An optional active learning component allowed human reviewers to make corrections in a UI, with edits automatically turned into new training data to improve future surveys. In an examination of the largest aggregation on the mainland, the model found 99.4% of females, 97.8% of males, and 97.0% of pups. The whole pipeline, including model training, can be run on a laptop, and it can be utilized in remote field sites where there is no internet access.
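Duplicate predictions arise where tiles overlap; a standard way to remove them (our assumption of a typical approach, not the paper's exact code) is greedy IoU-based non-maximum suppression:

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def remove_duplicates(boxes, scores, thresh=0.5):
    """Greedy non-maximum suppression: keep the highest-scoring box, drop
    any remaining box overlapping a kept box by more than `thresh` IoU."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= thresh for j in keep):
            keep.append(i)
    return keep
```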
Longwave Infrared hyperspectral images (LWIR HSI) are a powerful data source for various applications in national security and environmental monitoring. A promising area for applying machine learning to LWIR HSI data is for gas plume identification from remote sensing platforms. However, a significant practical difficulty in using HSI for this task is the ability to estimate and remove the background spectra underlying a detected gas plume. Typically, one estimates a covariance matrix and a mean spectrum using all pixels from an image to whiten the pixels of interest before substance identification. We propose using image segmentation to define local regions to perform this whitening. We investigate both local and global estimation of the covariance and mean spectrum, and find that using the global covariance and local mean increases prediction confidence using our deep learning classification model. Using an airborne LWIR capture of the Los Angeles basin, we investigate performance increases by generating an ensemble of random marker-based Watershed segmentations. The ensemble of segmentations provides nuanced mean estimates for each pixel in the gas plume, leading to increased machine learning classification confidence. This method shows significant promise for improving machine learning classification applied to real-world HSI collects.
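The whitening step described above can be sketched as a ZCA-style transform: subtract the mean spectrum and multiply by the inverse matrix square root of the covariance (whether the mean and covariance are estimated globally or per-segment is exactly what the paper varies):

```python
import numpy as np

def whiten_pixels(pixels, mean, cov, eps=1e-6):
    """Whiten HSI pixel spectra: subtract a mean spectrum and decorrelate
    with the inverse matrix square root of a covariance estimate."""
    # Regularize, then form cov^{-1/2} via eigendecomposition
    vals, vecs = np.linalg.eigh(cov + eps * np.eye(cov.shape[0]))
    inv_sqrt = vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T
    return (pixels - mean) @ inv_sqrt
```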
Recent advances in deep learning, large-scale cloud computing, and open access to decades of Earth observation from public satellite constellations have enabled a breakthrough in automated mapping and monitoring at global scale in near real time. We report on work generating a sequence of annual, global land use and land cover maps at 10 m spatial resolution for years 2017 through 2022, publicly available as an open science product. Each map required processing over 2 million Copernicus Sentinel-2 scenes (approximately 0.6 petabytes). Each map was completed in approximately one week using commercial cloud computing resources. We report our map accuracy and recent work to stabilize the maps across time for monitoring changes across years.
Robustness to image quality degradations is critical for developing Deep Neural Networks (DNNs) for real-world image classification. Prior work explored how various optical aberrations degrade image classification performance [1]. This paper extends this discussion to include optical scatter, which is fundamental to the stray light control of imaging systems and enables further discussion of DNN performance in the context of hardware design. In this paper, multiple state-of-the-art DNN models are evaluated for their image classification performance with imagery that has been degraded by optical scatter.
Convolutional neural networks yield activations from spatial frequency content in an image, allowing them to learn and recognize features of classification targets. This paper explores the spatial frequency response of CNNs in the context of an imaging system's modulation transfer function. Deriving the relationship between CNN design and imaging system design is a fundamental first step in optimizing these systems at the system level.
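On the imaging-system side of this relationship, the MTF is simply the normalized magnitude of the Fourier transform of the point spread function; a minimal 1-D sketch:

```python
import numpy as np

def mtf_from_psf(psf):
    """Modulation transfer function as the normalized magnitude of the
    Fourier transform of a (1-D) point spread function."""
    otf = np.fft.rfft(psf / psf.sum())  # normalize so MTF(0) = 1
    return np.abs(otf)
```

A delta-function PSF passes all spatial frequencies (MTF identically 1), while any broader PSF attenuates high frequencies, which is the degradation a CNN's learned filters must tolerate.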
Images captured in hazy and smoky environments suffer from reduced visibility, posing a challenge when monitoring infrastructure and hindering emergency services during critical situations. The proposed work investigates the use of deep learning models to enhance the automatic, machine-based readability of gauges in smoky environments, with accurate gauge data interpretation serving as a valuable tool for first responders. The study utilizes two deep learning architectures, FFA-Net and AECR-Net, to improve the visibility of gauge images corrupted with light to dense haze and smoke. Since benchmark datasets of analog gauge images are unavailable, a new synthetic dataset, containing over 14,000 images, was generated using the Unreal Engine. The models were trained with an 80% train, 10% validation, and 10% test split for the haze and smoke datasets, respectively. For the synthetic haze dataset, the SSIM and PSNR metrics are about 0.98 and 43 dB, respectively, comparing well to state-of-the-art results. Additionally, AECR-Net produces more robust results than FFA-Net. Although the results on the synthetic smoke dataset are poorer, the trained models still achieve promising results. In general, images captured in the presence of smoke are more difficult to enhance given its inhomogeneity and high density; moreover, FFA-Net and AECR-Net are designed to dehaze rather than desmoke images. This work shows that deep learning architectures can substantially improve the quality of analog gauge images captured in smoke and haze scenes. Finally, the enhanced output images can be successfully post-processed for autonomous reading of gauges.
We present a study on the accuracy of three neural network architectures, namely fully-connected neural networks, recurrent neural networks, and attention-based neural networks, in predicting the coupling response of broadband microresonator frequency combs. These frequency combs are crucial for technologies like optical atomic clocks. Optimizing their spectral features, especially the dispersion in coupling to an access waveguide, can be computationally demanding due to the large number of parameters and wide spectral bandwidths involved. To address this challenge, we employ machine learning algorithms to estimate the coupling response at wavelengths not present in the input training data. Our findings demonstrate that when trained with data sets encompassing the upper and lower limits of each design feature, attention mechanisms achieve over 90% accuracy in predicting the coupling rate for spectral ranges six times wider than those used in training. This significantly reduces the computational burden for numerical optimization in ring resonator design, potentially leading to a six-fold reduction in compute time. Moreover, devices with strong correlations between design features and performance metrics may experience even greater acceleration.
Traditional image segmentation methods employed with X-ray imaging detectors aboard X-ray space telescopes consist of two stages: first, a low energy threshold is applied; groups of activated pixels are then classified according to their shapes and identified as valid X-ray events or rejected as being possibly induced by cosmic rays. This method is fast and removes up to 98% of the cosmic ray-induced background. However, these traditional methods fail to address two important problems: first, they struggle to recover the true energies of, and sometimes fail to detect entirely, low-energy photons (photon energies less than 0.5 keV); second, they consider only the shape of the active pixel regions, ignoring the longer-range context within the image frames. This limits their sensitivity to a specific type of cosmic ray signal: "islands" created by secondary particles produced by cosmic rays hitting the body of the telescope (the shapes of which are often indistinguishable from X-ray photon signals). Together, these limitations hinder investigations of faint, diffuse targets, such as the outskirts of galaxies and galaxy clusters, and of "low energy" targets such as individual stars, galaxies and high redshift systems. Both limitations can, however, be addressed with machine learning (ML) models. This work is part of our effort to develop fast and efficient background reduction methods for future astronomical X-ray missions using ML methods. We highlight several significant improvements in the classification and semantic segmentation of our background filtering pipeline. Our more realistic training and test data now incorporate the effects of readout noise and charge diffusion. In the presence of charge diffusion, our model is able to obtain an 80% relative improvement in lost signal recovery compared to the traditional background reduction techniques. We identify several directions for further development of the model.
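The traditional two-stage pipeline described above can be sketched as a threshold followed by connected-component shape classification (the size cut below is an assumed stand-in for real event-grade tables):

```python
import numpy as np
from scipy import ndimage

def extract_events(frame, threshold):
    """Two-stage X-ray event extraction sketch: apply a low-energy
    threshold, then group activated pixels into connected regions whose
    shape (here, just pixel count) decides valid event vs. cosmic-ray
    candidate."""
    active = frame > threshold
    labeled, n = ndimage.label(active)  # 4-connected components
    events = []
    for i in range(1, n + 1):
        mask = labeled == i
        events.append({
            "n_pixels": int(mask.sum()),
            "energy": float(frame[mask].sum()),
            "valid": bool(mask.sum() <= 4),  # assumed size cut for X-ray grades
        })
    return events
```

Note how this logic sees only the local pixel group: an "island" with a compact, X-ray-like shape passes the cut, which is exactly the blind spot the ML models target.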
Quasar absorption lines (QALs), created by the light of celestial objects billions of light-years away, can be used to trace gas components from distant galaxies and thus are crucial to the study of galaxy evolution. Ca II QALs, in particular, are important for studying both star formation and recent galaxies because they are among the dustiest QALs and are located at lower redshifts. However, Ca II QALs are quite difficult to detect, so the number of known Ca II QALs is extremely low, leaving many important models and theories unconfirmed. In this work, we developed an accurate and efficient approach to search for Ca II QALs using deep learning. We created a large amount of simulated data for our training set, while we used an existing Ca II QAL catalog for our test set. We also designed a novel preprocessing method aimed at discovering weak Ca II absorption lines. Our solution achieved an accuracy of 96% on the test dataset and runs thousands of times faster than traditional methods. Our trained neural network model was applied to quasar spectra from the Sloan Digital Sky Survey's Data Releases 7, 12, and 14, and discovered 542 new Ca II QALs. This is currently the largest catalog of Ca II QALs ever discovered, which will play a significant role in creating new theories and confirming existing ones. Furthermore, our approach can be applied to searches for virtually any other type of QAL, opening up opportunities for ground-breaking research on galaxy evolution.
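A common preprocessing step for absorption-line searches (sketched here as an assumed illustration, not the paper's novel method) is continuum normalization: dividing the spectrum by a broad running median so weak lines appear as dips below 1:

```python
import numpy as np
from scipy.ndimage import median_filter

def continuum_normalize(flux, window=101):
    """Divide a spectrum by a broad running-median continuum estimate so
    that absorption features stand out as dips below 1."""
    continuum = median_filter(flux, size=window, mode="nearest")
    return flux / continuum
```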
We report a generalizable computational approach to dramatically reduce biomolecular and chemical sensor response time for applications including medical diagnostics. To compare the performance of different models, we use experimental data to train ensembles of both traditional recurrent neural network (RNN) and long short-term memory (LSTM) networks to accurately predict equilibrium sensor response from data measured over a short time span. This approach is particularly advantageous for sensor platforms with long response times due to poor mass transport, including porous silicon optical biosensors, which we use to validate this methodology through exposure to various concentrations of protein solution and subsequent analysis.
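As a classical baseline for the learned RNN/LSTM predictors (our illustration, not the paper's method), the equilibrium response can be extrapolated from a short early transient by fitting a first-order exponential approach to equilibrium:

```python
import numpy as np
from scipy.optimize import curve_fit

def predict_equilibrium(t, y):
    """Estimate a sensor's equilibrium response from a short early
    transient by fitting y(t) = A * (1 - exp(-t / tau)) and returning
    the fitted asymptote A."""
    model = lambda t, A, tau: A * (1.0 - np.exp(-t / tau))
    (A, tau), _ = curve_fit(model, t, y, p0=[y[-1] * 2.0, t[-1]])
    return A
```

The neural-network approach generalizes this idea to response curves that are not well described by a single exponential.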
Modern face ID systems are often plagued by loss of privacy. To address this, some face ID systems incorporate image transformations in the detection pipeline. In particular, we consider transforms that convert human face images to non-face images (such as landscape images) to mask sensitive and bias-prone facial features and preserve privacy, while maintaining identifiability.
We propose two metrics that study the effectiveness of face image transformations used in privacy-preserving face ID systems. These metrics measure the invertibility of the transformations to ensure that the metadata of the face (e.g., race, sex, age) cannot be inferred from the transformed image.
Surround-view fisheye cameras are commonly used for near-field sensing in automated driving scenarios, including urban driving and auto valet parking. Four fisheye cameras, one on each side, are sufficient to cover 360° around the vehicle, capturing the entire near-field region. In recent years there has been much research on surround-view parking slot detection, mainly focused on occupancy status, but little work on whether a free slot is compatible with the mission of the ego vehicle; for instance, some spots are reserved for handicapped drivers or electric vehicles. In this paper, we tackle parking spot classification based on the surround-view camera system. We adapt the object detection neural network YOLOv4 with a novel polygon bounding box model that is well-suited for variously shaped parking spaces, such as slanted parking slots. To the best of our knowledge, we present the first detailed study on parking spot detection and classification on fisheye cameras for auto valet parking scenarios. The results show that our proposed classification approach effectively distinguishes between regular, electric vehicle, and handicap parking spots.
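The appeal of a polygon bounding box over an axis-aligned one is that a slanted slot is represented exactly by its corner points; basic geometry on such polygons reduces to the shoelace formula (a minimal sketch, independent of the paper's network):

```python
def polygon_area(points):
    """Shoelace area of a polygon given as [(x, y), ...] corner points,
    e.g. the four corners of a slanted parking slot, in order."""
    n = len(points)
    s = 0.0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0
```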
For nearly twenty years, a multitude of Compressive Imaging (CI) techniques have been under development. Modern approaches to CI leverage the capabilities of Deep Learning (DL) tools in order to enhance both the sensing model and the reconstruction algorithm. Unfortunately, most of these DL-based CI methods have been developed by simulating the sensing process while overlooking limitations associated with the optical realization of the optimized sensing model. This article presents an outline of the foremost DL-based CI methods from a practitioner's standpoint. We conduct a comparative analysis of their performances, with a particular emphasis on practical considerations like the feasibility of the sensing matrices and resistance to noise in measurements.
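As one concrete classical baseline (our illustration, not any specific method from the surveyed works), sparse reconstruction from compressive measurements y = Φx can be performed with iterative soft thresholding (ISTA):

```python
import numpy as np

def ista(Phi, y, lam=0.05, n_iter=1000):
    """Minimal ISTA sparse reconstruction for compressive measurements
    y = Phi @ x: gradient steps on the data term interleaved with
    soft thresholding, which enforces sparsity."""
    L = np.linalg.norm(Phi, 2) ** 2  # Lipschitz constant of the gradient
    x = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        grad = Phi.T @ (Phi @ x - y)
        z = x - grad / L
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return x
```

DL-based CI methods replace either the fixed sensing matrix Φ, the hand-crafted reconstruction, or both with learned components; the practical question raised above is whether the learned Φ remains optically realizable.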
In this paper, we create mix-and-matched generative networks to address privacy and bias concerns in face recognition systems. There has been a rise in bias based on religion, gender, and race. To preserve the robustness of face ID systems while masking these bias-inducing facial features, we map the faces to neutral natural landscape images. This still leaves the possibility of estimating facial features from the landscape images. We address this issue through decorrelation shuffling functions between the latent spaces of the encoder and the generator networks, as a way of decorrelating facial and landscape features and preventing inversion of the mapping.
Retinopathy is a common complication of diabetes that can cause severe vision loss if not detected and managed promptly. In this study, we propose a comprehensive approach that leverages image processing techniques to analyze fundus images of patients with diabetic retinopathy. Our primary focus is on vein extraction and hemorrhage detection, with exudate detection performed only on specific images to showcase advancements in the current prototype algorithm. The dataset used in this project consists of images obtained from Mexican ophthalmology institutes, ensuring its relevance and applicability to the local population. By extracting veins and hemorrhages, we aim to capture crucial features indicative of the severity of retinopathy. These generated images, along with the original dataset, are utilized to train convolutional neural network (CNN) models, enabling accurate classification of the disease's severity into three categories. The significance of this project lies in its potential to serve as an auxiliary tool in diagnosing diabetic retinopathy. By automating the analysis of fundus images and providing objective classification results, our algorithm aims to assist healthcare professionals in making informed decisions regarding treatment and management options. The proposed method can potentially enhance the efficiency and precision of diabetic retinopathy (DR) diagnosis, improving health outcomes in Mexico.
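A classical vein-extraction step (sketched here as an assumed illustration of the kind of image processing involved, not the paper's exact prototype) uses morphology: dark vessels in the fundus image become bright under a black top-hat transform:

```python
import numpy as np
from scipy import ndimage

def vessel_map(channel, struct_size=5):
    """Morphological vein-extraction sketch: dark, thin vessels become
    bright responses under a black top-hat (morphological closing minus
    the original image)."""
    return ndimage.black_tophat(channel.astype(float), size=struct_size)
```

In fundus photography the green channel is usually chosen as the input because vessels show the highest contrast there; the response map can then be thresholded to a binary vessel mask.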
Sparse coding has long been thought of as a model of the biological visual system, yet previous approaches have not employed it as a method to model the activity of individual neurons in response to arbitrary images. Here, we present a novel model of primary cortical neurons based on a biologically-plausible sparse coding model termed the locally-competitive algorithm (LCA). Our hybrid LCA-CNN model, or LCANet, is trained on a self-supervised objective using a standard image dataset and regression models are trained to predict neural activity based on a modern neurophysiological dataset containing the responses of hundreds of neurons to natural image stimuli. Our novel sparse coding model better represents the computations performed by biological neurons and is significantly more interpretable than previous models.
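A minimal numpy sketch of the LCA dynamics underlying the model (assumed parameter values; LCANet embeds these dynamics in a CNN rather than using a flat dictionary):

```python
import numpy as np

def lca(D, x, lam=0.1, tau=0.1, n_steps=300):
    """Locally-competitive algorithm sketch: membrane potentials u evolve
    by leaky integration with lateral inhibition between dictionary
    elements; activations a (soft-thresholded potentials) form a sparse
    code for x under dictionary D (columns = features)."""
    soft = lambda u: np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)
    G = D.T @ D - np.eye(D.shape[1])  # lateral inhibition: Gram matrix minus self-connections
    b = D.T @ x                       # feed-forward drive
    u = np.zeros(D.shape[1])
    for _ in range(n_steps):
        u = u + tau * (b - u - G @ soft(u))
    return soft(u)
```

The thresholded leaky-integrator units are what make LCA a biologically plausible competitor to feed-forward CNN activations.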
In bioinformatics, batch effect detection is a challenging task for which clustering approaches have typically been explored. In this study, we propose a novel approach to identify and visualize batch effects with unsupervised analysis methods. We selected the most significant gene subsets of 500, 1500, and 2500 genes (out of 35,238) from a human-liver RNA-seq dataset by applying standard deviation (SD). The skmeans and kmeans methods were explored on the selected gene subsets. Then, principal component analysis (PCA) was used for embedding into a 10-dimensional subspace. Finally, Uniform Manifold Approximation and Projection (UMAP) was applied to cluster and visualize the outputs. The experimental results demonstrate robust representations, with the best clustering and visualization achieved for features extracted from 1500 genes. These findings are useful not only for batch effect detection and removal tasks but also for labeling new samples to train supervised machine learning methods.
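The SD-based gene filtering step at the front of this pipeline can be sketched as follows (a minimal illustration; the clustering and embedding stages would follow on the returned subset):

```python
import numpy as np

def select_by_sd(expr, k):
    """Keep the k most variable genes (columns of a samples-by-genes
    expression matrix) ranked by standard deviation, mirroring the
    SD-based gene filtering applied before clustering."""
    sd = expr.std(axis=0)
    top = np.sort(np.argsort(sd)[::-1][:k])  # top-k columns, original order
    return expr[:, top], top
```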
Alexithymia describes a psychological state where individuals struggle with feeling and expressing their emotions. Individuals with alexithymia may also have a more difficult time understanding the emotions of others and may express atypical attention to the eyes when recognizing emotions. This is known to affect individuals with Autism Spectrum Disorder (ASD) differently than neurotypical (NT) individuals. Using a public data set of eye-tracking data from seventy individuals with and without autism who have been assessed for alexithymia, we train multiple traditional machine learning models for alexithymia classification including support vector machines, logistic regression, decision trees, random forest, and multilayer perceptron. To correct for class imbalance, we evaluate four different oversampling strategies: no oversampling, random oversampling, SMOTE, and ADASYN. We consider three different groups of data: ASD, NT, and combined ASD+NT. We use a nested leave-one-out cross validation strategy to perform hyperparameter selection and evaluate model performance. We achieve F1 scores of 90.00% and 51.85% using decision trees for ASD and NT groups, respectively, and 72.41% using SVM for the combined ASD+NT group. Splitting the data into ASD and NT groups improves recall for both groups compared to the combined model.
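A methodological point worth making explicit is that oversampling must be applied only to the training split inside each cross-validation fold, never to the held-out sample. A minimal sketch with random oversampling and a nearest-centroid classifier standing in for the paper's models:

```python
import numpy as np

def loocv_with_oversampling(X, y, rng=None):
    """Leave-one-out CV where random oversampling of minority classes is
    applied to the training split only (the held-out sample is untouched),
    with a nearest-centroid classifier as a simple stand-in model."""
    if rng is None:
        rng = np.random.default_rng(0)
    correct = 0
    for i in range(len(X)):
        mask = np.arange(len(X)) != i
        Xtr, ytr = X[mask], y[mask]
        # Random oversampling: duplicate minority-class samples to balance
        labels, counts = np.unique(ytr, return_counts=True)
        n_max = counts.max()
        parts_X, parts_y = [], []
        for lab, cnt in zip(labels, counts):
            idx = np.where(ytr == lab)[0]
            extra = (rng.choice(idx, size=n_max - cnt, replace=True)
                     if cnt < n_max else np.empty(0, dtype=int))
            keep = np.concatenate([idx, extra])
            parts_X.append(Xtr[keep])
            parts_y.append(ytr[keep])
        Xb, yb = np.vstack(parts_X), np.concatenate(parts_y)
        # Nearest-centroid prediction for the held-out sample
        centroids = {lab: Xb[yb == lab].mean(axis=0) for lab in labels}
        pred = min(centroids, key=lambda lab: np.linalg.norm(X[i] - centroids[lab]))
        correct += int(pred == y[i])
    return correct / len(X)
```

SMOTE and ADASYN replace the duplication step with synthetic interpolated samples, but the fold structure is identical.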
Artificial intelligence (AI) based analysis accelerates clinical diagnosis from pathological images efficiently and accurately. Due to the high dimensionality of pathological images, extracting meaningful feature representations of the pixels from high-dimensional images is essential. This can be used for further analysis to obtain better insights. This study used Deep Convolutional Neural Network (DCNN) and end-to-end Deep Convolutional Auto-Encoder (DCAE) models for feature extraction. K-means and K-Nearest Neighbors (KNN) methods were then used for clustering and classification and achieved 95% testing accuracy with these unsupervised classification methods. In addition, t-distributed Stochastic Neighbor Embedding (t-SNE) and Uniform Manifold Approximation and Projection (UMAP) were applied for clustering and visualization of different tissue types and demonstrated promising representations for histopathological image clustering of 20 different tissue types. Within-Cluster Sum of Squares (WSS) errors were used to determine the optimal number of classes in cluster space with the t-SNE and UMAP methods. Most importantly, the proposed system relies on class probability and a visual interpretation that conveys the relationships among the 20 different pathological tissue types. The proposed pipeline is potentially applicable to pathological image classification and clustering tasks, offering better insight into digital pathology applications.
Unmanned aerial vehicles (UAVs) have enjoyed a meteoric rise in both capability and accessibility, a trend that shows no signs of slowing. This has led to a growing need for detect-and-avoid technologies and has spurred the development of a number of UAV detection methods, most of which are based on radar, acoustic, visual, passive radio-frequency, or lidar detection technology. On the software side, many of these UAV detection systems have begun to implement machine learning (ML) to improve detection and classification capabilities. In this work, we detail a new lidar- and ML-based propeller rotation analysis and classification method using a wingbeat-modulation lidar system. This system has the potential to sense characteristics, such as propeller speed and pitch, that other systems struggle to detect. This paper explores the preliminary development of our method and its potential capabilities and limitations. Using this method, propeller speed could be detected with a worst-case percent error of approximately 3.7% and an average percent error of approximately 2% when the beam was positioned on the propeller. Furthermore, Wide Neural Networks were able to accurately detect and characterize propeller signals when trained to determine either beam position or propeller orientation.
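To make the percent-error figure concrete, one simple way to estimate a propeller's modulation rate from a lidar return is to locate the dominant peak in the signal's spectrum. The sketch below is a hedged illustration on a simulated intensity-modulated return; the sample rate, modulation depth, and 120 Hz blade-passage frequency are illustrative assumptions, not values from the paper.

```python
import numpy as np

fs = 5000.0                       # sample rate (Hz), assumed
t = np.arange(0, 1.0, 1.0 / fs)   # one second of data
true_hz = 120.0                   # simulated blade-passage frequency
signal = 1.0 + 0.3 * np.sin(2 * np.pi * true_hz * t)

# Remove DC, then find the strongest spectral peak.
spectrum = np.abs(np.fft.rfft(signal - signal.mean()))
freqs = np.fft.rfftfreq(len(t), d=1.0 / fs)
est_hz = freqs[np.argmax(spectrum)]

pct_error = 100.0 * abs(est_hz - true_hz) / true_hz
print(f"estimated {est_hz:.1f} Hz, percent error {pct_error:.2f}%")
```

With a 1 s window the frequency resolution is 1 Hz, which bounds the error of this simple estimator; a real system would contend with noise, beam position, and partial illumination of the blades.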
Model cards organize and present information about a model's training data, hyperparameters, and behavior, such as predictive accuracy and bias, for machine learning (ML) algorithms or models. Consumers use these cards to determine whether a model is well suited to a particular use case. However, the current design of model cards does not include resource utilization metrics, which are important when models are being optimized for specific hardware. The main objective of this study was to determine which set of hardware-centric performance metrics provides the most value to users when comparing algorithms for Internet of Things (IoT) edge devices. The Department of Defense (DoD) can utilize these metrics to select the best model for key mission tasks within Command and Control (C2) missions that require AI/ML deployment on edge devices. Our study focused on finding correlations between resource availability and machine learning models' computational footprint using three key metrics: energy consumption, execution time, and memory utilization. The data was gathered from simulated environments hosted by Docker containers (i.e., lightweight, stand-alone, executable software packages that include everything needed to run an application) with various memory sizes. Two series of experiments were performed. The first simulated environments with 256 MB to 8 GB of available memory (in powers of 2). For each memory size, we deployed 50 different containers, calculated the metrics from each container, and recorded the average result of each metric. Two ML models from Microsoft's EdgeML library, designed to run efficiently on edge devices, were used: Bonsai, a decision-tree-based model, and ProtoNN, a KNN-based model.
Using Spearman’s correlation, we found that none of the metrics showed promising correlations for the Bonsai model, while the ProtoNN model demonstrated a correlation of 0.5 for its computational time. To investigate this lack of adequate results, we ran a second series of experiments in a similar fashion but with smaller memory sizes, from 250 MB to 950 MB in steps of 50 MB. For the Bonsai model with smaller memory sizes, RAM energy consumption showed the most promising correlation value of 0.75, while for the ProtoNN model, both RAM energy consumption and total energy consumption showed promising correlation values of 0.63 and 0.71, respectively.
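The correlation analysis above reduces to ranking each metric against the configured memory size. A minimal sketch with SciPy's `spearmanr`, using synthetic stand-in values for the per-container averages (the linear trend and noise level are illustrative assumptions):

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
mem_mb = np.arange(250, 1000, 50)     # 250 MB .. 950 MB in 50 MB steps
# Stand-in for averaged RAM energy per memory size: a trend plus noise.
ram_energy = 0.02 * mem_mb + rng.normal(0.0, 1.0, mem_mb.size)

rho, p = spearmanr(mem_mb, ram_energy)
print(f"Spearman rho = {rho:.2f} (p = {p:.3g})")
```

Spearman's coefficient is rank-based, so it captures any monotone relationship between memory size and a metric, not just a linear one, which suits resource curves that saturate.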
providing insight into how the model makes predictions, as opposed to a black-box approach. The predicted values were strongly correlated with measurements taken from a Scintec BLS 900.
Achieving carbon neutrality has become the United Nations’ most urgent mission, but the lack of data, evaluation criteria, and associated techniques presents a challenge. Moreover, the energy crisis in 2022 unexpectedly complicated carbon dioxide (CO2) data, and existing research focuses primarily on absolute CO2 emissions. Policymakers have established milestones on the carbon-reduction roadmap but have failed to meet them. We therefore adopt the new CO2 emission and sink data released in November 2022. Our approach leverages a Time-Varying Parameter Vector Autoregression (TVP-VAR) model and Monte Carlo simulation to monitor the dynamics of the net-zero emission roadmap. This approach provides insights into the global pathway towards the goals of the United Nations Framework Convention on Climate Change (UNFCCC).
Convolutional Neural Networks (CNNs) are frequently used in a wide range of applications, including speech recognition, image recognition, and natural language processing. However, due to the computational complexity of CNNs, deploying these networks on resource-limited edge devices has become a significant challenge. Sparse CNNs exploit the sparsity in the networks' weight matrices to minimize computation while maintaining accuracy. By storing only the nonzero values, the Compressed Sparse Row (CSR) format compresses the sparse matrix, lowering the memory requirement and computational complexity of the network. This work presents a novel approach for accelerating sparse CNNs on Field-Programmable Gate Arrays (FPGAs) using the CSR format and systolic arrays. The proposed method takes advantage of systolic arrays' parallel processing capabilities to perform CSR-based sparse convolutions. Furthermore, an algorithm is presented that optimizes the data layout to maximize data reuse and minimize data movement between the processing elements of the systolic array and external memory. The architecture is evaluated and compared to a state-of-the-art GPU implementation on several benchmark datasets, outperforming the GPU-based implementation in throughput and power efficiency by 1.42x and 22.4x, respectively. The presented approach provides a promising solution for accelerating sparse CNNs on resource-constrained devices and enabling the deployment of these networks in a variety of applications.
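The CSR layout named above stores a sparse matrix as three arrays: the nonzero values, their column indices, and a row-pointer array marking where each row's nonzeros begin. A minimal hand-built sketch (a hardware accelerator would stream exactly these three arrays):

```python
import numpy as np

def to_csr(dense):
    """Return (values, col_idx, row_ptr) for a 2-D dense matrix."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        # row_ptr[i+1] - row_ptr[i] = number of nonzeros in row i
        row_ptr.append(len(values))
    return np.array(values), np.array(col_idx), np.array(row_ptr)

W = np.array([[0, 2, 0],
              [1, 0, 0],
              [0, 0, 3]])
vals, cols, ptr = to_csr(W)
print(vals, cols, ptr)   # [2 1 3] [1 0 2] [0 1 2 3]
```

For a matrix with nnz nonzeros out of m×n entries, CSR needs nnz values, nnz column indices, and m+1 row pointers, which is where the memory savings for sparse CNN weights come from.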
Model-based machine learning methods incorporate domain knowledge from the physical forward model of an inverse problem to reduce the need for training data. In this research, we show how this can be used to address challenging limitations such as occlusion. We pair a convolutional neural network with a novel computational reconstruction method that jointly models source and attenuation distributions in order to account for occlusion. We demonstrate the ability to quickly learn to address reconstruction artifacts and opacity, forming a significantly improved final image of the scene based on as little as a single training image. The algorithm can be implemented efficiently and scaled to large problem sizes.
Capturing data on animal movements, especially for nocturnal animals, is a time-consuming and arduous task for biologists. They typically rely on trail cameras and must manually search through low-quality video feeds to locate the desired footage. Leveraging advancements in artificial intelligence and hardware devices, such as microcontrollers with cameras, can help address this issue; everyday consumers can also use this technology in their home surveillance cameras. In this study, the automation of real-time animal identification using a night-vision camera with a simple microcontroller such as the ESP32 is explored. A motion detection algorithm on the device triggers the camera when motion is detected. The captured image is sent to an image classification model, deployed in the cloud as a REST API service, to identify the presence of an animal and its type. Finally, the predictions are displayed on the device's LCD screen. We constructed two deep learning models, MobileNetV2 and ResNet50, for image classification and evaluated their performance. To test their accuracy, we utilized a validation set of images for three distinct species and a smaller test set for each species. We experimented with various hyperparameters, such as the number of epochs and the learning rate, to determine the best-performing model. The ResNet50 model with 50 epochs and a learning rate of 0.0001 produced the most satisfactory results for our objectives. We then deployed the model as a REST API service for animal detection.
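A motion trigger of the kind described above is commonly implemented as frame differencing: flag motion when enough pixels change between consecutive frames. The sketch below is a hedged illustration of that idea, not the study's on-device code; the pixel and fraction thresholds are assumed values that would need tuning per camera.

```python
import numpy as np

def motion_detected(prev, curr, pixel_thresh=25, frac_thresh=0.02):
    """Flag motion when > frac_thresh of pixels change by > pixel_thresh."""
    diff = np.abs(curr.astype(int) - prev.astype(int))
    changed = (diff > pixel_thresh).mean()   # fraction of changed pixels
    return changed > frac_thresh

rng = np.random.default_rng(0)
frame_a = rng.integers(0, 256, (48, 48), dtype=np.uint8)
frame_b = frame_a.copy()                     # identical frame: no motion
frame_c = frame_a.copy()
frame_c[10:30, 10:30] = 255 - frame_c[10:30, 10:30]  # invert a patch

print(motion_detected(frame_a, frame_b), motion_detected(frame_a, frame_c))
```

On a microcontroller the same logic would run on downsampled grayscale frames, and only a positive trigger would wake the camera and the network call to the classifier.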
We use unsupervised machine learning approaches in a completely data-driven binarization routine for vibration sensors with minimal lag. Gyroscopic vibration sensors are inherently noisy: they report analog signals that must be translated into digital values, in our case whether a pump is running (“on”) or not (“off”). When analyzing data from different pumps, each of which has its own baseline vibration values and magnitude of vibration, manual annotation is not feasible. We have tested multiple unsupervised methods, including k-means clustering, Gaussian naïve Bayes, and ensemble learning, to correctly binarize the analog signals. Comparisons are made on the basis of “blips,” times where the algorithm predicted an incorrect state for a short period before returning to the current state; this provides an objective metric with which to evaluate an algorithm’s success at binarizing the signal. We present results from an experimental design probing the efficacy of different learning methods across data collected from ten vibration sensors deployed on distinct pumps in a water treatment plant. Many of the algorithms use an initial k-means clustering to obtain a first guess of the pump’s on/off state; from there we apply a variety of smoothing techniques, Gaussian naïve Bayes classifiers, and ensemble learning to produce a final classification of pump activity.
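The initial k-means step and the blip metric can both be sketched compactly. Below, a 2-cluster k-means binarizes a simulated vibration trace, the higher-vibration cluster is mapped to "on", and short-lived state flips are counted as blips. The signal parameters and the 3-sample blip threshold are illustrative assumptions, not values from the paper.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
on = rng.normal(5.0, 0.4, 300)     # pump running: high vibration
off = rng.normal(0.5, 0.3, 300)    # pump idle: low vibration
trace = np.concatenate([off, on, off])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(
    trace.reshape(-1, 1))
# Cluster ids are arbitrary: map label 1 to the higher-vibration state.
if trace[labels == 1].mean() < trace[labels == 0].mean():
    labels = 1 - labels

def count_blips(states, max_len=3):
    """Count runs of <= max_len samples sandwiched between other-state runs."""
    blips, run = 0, 1
    for i in range(1, len(states)):
        if states[i] == states[i - 1]:
            run += 1
        else:
            # A run just ended at i-1; count it if short and not at the start.
            if run <= max_len and i - run > 0:
                blips += 1
            run = 1
    return blips

print("blips:", count_blips(labels))
```

Fewer blips means a cleaner binarization, which is what the smoothing and ensemble stages described above aim to minimize.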
Low-light image enhancement has posed a significant challenge in recent years due to the non-uniform luminance of real-world images. Color restoration, luminance mapping, and curve and level estimation are some of the techniques algorithms use to enhance such images. Because real-world images often have non-uniform luminance, they require local enhancement in certain areas rather than global enhancement. To tackle this issue, this paper introduces a novel deep-learning methodology employing two convolutional network architectures. The first classifies the brightness level of the input image, while the second enhances the brightness based on information obtained from the first. To train this model, two datasets commonly used in state-of-the-art research are employed: LOL (Low-Light) and Synthetic Low-light. Both contain pairs of low-light and ground-truth images, which makes it possible to properly map between non-uniform and uniform luminosity. The proposed algorithm is applied to resized images from the UHD-LOL4k dataset, with performance evaluated using the Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index (SSIM), Natural Image Quality Evaluator (NIQE), and Blind/Referenceless Image Spatial Quality Evaluator (BRISQUE) metrics. According to the results, the proposed method outperforms algorithms in the literature with more complex architectures. The double convolutional architecture emphasizes local enhancement in real-world scenes and global enhancement in images with very low luminosity. Overall, this paper makes a significant contribution to low-light image enhancement, offering an effective solution to the challenges posed by non-uniform luminance in real-world images.
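Of the metrics listed, PSNR is the simplest to state: 10·log10(peak²/MSE) between the enhanced image and its ground truth. A minimal NumPy sketch for 8-bit images (the toy noisy image is illustrative):

```python
import numpy as np

def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two same-shaped images."""
    mse = np.mean((reference.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

rng = np.random.default_rng(0)
gt = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
noisy = np.clip(gt.astype(int) + rng.normal(0, 5, gt.shape),
                0, 255).astype(np.uint8)
print(f"PSNR: {psnr(gt, noisy):.1f} dB")
```

SSIM compares local structure rather than raw pixel error, and NIQE/BRISQUE are no-reference metrics, which is why the paper reports all four.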
Keratoconus is a chronic, degenerative disease that results in progressive corneal thinning and steepening, leading to irregular astigmatism and decreased visual acuity; in severe cases it may cause debilitating visual impairment. In recent years, different machine learning methods have been applied to distinguish between normal and keratoconic eyes. These methods utilize both corneal curvature maps and their corresponding numeric indices to perform the classification. The main objective of this study is to evaluate the performance of features extracted with Histograms of Oriented Gradients (HOG) and with Convolutional Neural Networks (CNNs) in the classification of normal and keratoconic eyes, using the axial map of the anterior corneal surface. Two distinct models were trained using the same Multilayer Perceptron (MLP) architecture: one using the HOG features as input and the other using the CNN features. The Topographic Keratoconus Classification index (TKC) provided by Pentacam™ was used as the label, and KC2-labeled maps were defined as keratoconus. Each model was trained on 3,000 images of normal and 3,000 of keratoconic eyes, then validated and tested on 1,000 images of each label. The model trained with HOG features exhibited a sensitivity of 99.1% and a specificity of 98.7%, with an Area Under the Curve (AUC) of 0.999143. The model trained with CNN features showed both sensitivity and specificity of 99.5%, with AUC = 0.999778. The results suggest that the classifier performs similarly with both types of features.
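The core of a HOG descriptor is a magnitude-weighted histogram of gradient orientations. The sketch below shows that core step in plain NumPy on a synthetic stand-in for an axial curvature map; real HOG (e.g. `skimage.feature.hog`) additionally divides the image into cells and block-normalizes, which this simplified version omits.

```python
import numpy as np

def orientation_histogram(img, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations."""
    gy, gx = np.gradient(img.astype(np.float64))
    mag = np.hypot(gx, gy)
    ang = np.degrees(np.arctan2(gy, gx)) % 180.0   # unsigned, 0..180 deg
    hist, _ = np.histogram(ang, bins=bins, range=(0, 180), weights=mag)
    return hist / (hist.sum() + 1e-12)             # L1-normalized feature

# Smooth synthetic surface standing in for an axial curvature map.
curv_map = np.fromfunction(lambda i, j: np.sin(i / 5.0) + np.cos(j / 7.0),
                           (64, 64))
feat = orientation_histogram(curv_map)
print(feat.round(3))
```

The resulting fixed-length vector is what gets fed to the MLP in place of raw pixels.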
Automated navigation of Unmanned Aircraft Systems (UAS) across a broad range of illumination scenarios requires improved, real-time depth estimation and long-distance obstacle detection. We present our lightweight ultra-wide-angle camera optimized for low-light illumination (down to < 1 lux), mounted on a drone, and compare its optical performance with that of other modules found on the market. We also capture images from the drone in flight, test them on monocular depth estimation neural networks, and show that our camera module is suitable for low-light navigation.
Prescribed fires are an important part of forest stewardship in Western North America. Understanding prescribed-burn behavior is important because a burn done incorrectly can result in unintended burned land as well as harm to humans and the environment. We examined ensemble datasets from QUIC-Fire, a fire-atmosphere modeling tool, and compared the effectiveness of various machine learning models at predicting outcome variables such as the area burned inside and outside the control boundary and whether the fire behavior was safe or unsafe. Of the tested machine learning models, random forest performed best at predicting all three outcome variables of interest.
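A random-forest baseline of the kind compared above can be sketched in a few lines. The data here are synthetic stand-ins for QUIC-Fire ensemble inputs and burned-area outcomes (the feature meanings, linear ground truth, and forest size are all illustrative assumptions):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))                          # e.g. wind, moisture, ...
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(0, 0.3, 300)  # burned-area proxy

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print(f"R^2 on held-out ensembles: {rf.score(X_te, y_te):.2f}")
```

For the safe/unsafe outcome, `RandomForestClassifier` would replace the regressor with the same fit/score interface.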
In this work, we utilized deep learning models for depth-image regression to predict chicken weights. The dataset consists of 99,427 annotated depth images obtained from 18,706 chickens standing on a weighing scale during rearing days 21–84. Pretrained models, including MobileNet V2, ResNet50 V2, ResNet101 V2, ResNet152 V2, Inception V3, and Xception, performed regression on the depth-image data. All models produced comparable results in terms of mean absolute error (MAE) and mean relative error (MRE); however, Xception performed best, with an MAE of 17.2 g and an MRE of 2.52% on the test dataset relative to the reference weight. Based on these results, chicken weight estimation using depth images and deep learning is a promising technique for daily growth-rate monitoring in the poultry industry.
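The MAE and MRE figures reported above are straightforward to compute. A minimal sketch with illustrative predicted vs. reference weights in grams (not values from the study):

```python
import numpy as np

ref = np.array([650.0, 720.0, 810.0, 905.0])     # scale reference weights (g)
pred = np.array([663.0, 704.0, 826.0, 893.0])    # model predictions (g)

mae = np.mean(np.abs(pred - ref))                # mean absolute error, grams
mre = np.mean(np.abs(pred - ref) / ref) * 100.0  # mean relative error, percent
print(f"MAE = {mae:.2f} g, MRE = {mre:.2f}%")
```

MRE normalizes by each bird's reference weight, which makes errors comparable across the 21–84 day growth range where absolute weights differ greatly.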
The convolutional neural network (CNN) performs spatial learning on two-dimensional data (e.g., images), using filters to learn features from the images. It therefore requires many images with highly discriminant spatial and longitudinal features, within and between classes, for comprehensive learning. When this requirement is not met, CNN models suffer from a data paucity problem that leads to limited learning and poor classification performance. The segmentation and detection of birds from RGB videos to study the behavior of backyard birds is one application that suffers from this problem. This paper first presents a new backyard-bird dataset, extracted from RGB videos and consisting of images of a cardinal and a sparrow, and uses it to develop an artificial neural network (ANN) model with a frequency-driven feature learning approach. We observed that the images of these birds and their discriminant textures are geometrically distorted by the birds' rapid movements and changing postures. These geometric distortions bury the true representations of the main and side lobes of the frequency spectrum of the bird images. To extract these latent features at different frequency bands and construct feature vectors for training an ANN model, a Kaiser–Bessel window is used in the frequency domain along with the fast Fourier transform. Simulations show that by carefully selecting the ANN model's parameters and the simulation parameters, we can achieve segmentation and detection of the cardinal and sparrow images with about 98% training and 96% testing accuracy, respectively.
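In the spirit of the approach above, the sketch below tapers a 1-D image slice with a Kaiser–Bessel window (`np.kaiser`) before the FFT to reduce spectral leakage, then averages the magnitude spectrum into frequency bands to form a feature vector. The window beta, band count, and the synthetic row are illustrative assumptions, not the paper's actual pipeline.

```python
import numpy as np

def band_features(row, beta=8.6, n_bands=8):
    """FFT-band magnitude features from a Kaiser-windowed 1-D signal."""
    win = np.kaiser(len(row), beta)          # Kaiser-Bessel taper
    spec = np.abs(np.fft.rfft(row * win))    # magnitude spectrum
    bands = np.array_split(spec, n_bands)    # group bins into bands
    return np.array([b.mean() for b in bands])

# Stand-in for one row of a bird image: a low-frequency texture component.
row = np.sin(2 * np.pi * 12 * np.linspace(0, 1, 256))
feat = band_features(row)
print(feat.round(2))
```

Stacking such band vectors over rows (or 2-D FFT tiles) yields fixed-length inputs for the ANN, sidestepping the geometric distortions that hamper raw-pixel CNN training.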