Machine learning algorithms require datasets that are both massive and varied in order to train and generalize effectively. However, preparing semantically labeled real-world datasets is a time-consuming and cumbersome task, and training with low-volume datasets can lead to compromised performance and poor generalization. This performance and generalization gap caused by limited quantities of real-world data can be narrowed with synthetic datasets generated to reflect real-world features. In this work, a combination of synthetic and real-world datasets is used to demonstrate and assess the performance of simulated-to-real-world transfer learning, in which training is done on synthetic data and testing on real-world data. Performance is further evaluated on mixtures of real and synthetic data. Two simulators were used in this work to generate synthetic images. The first was the Mississippi State University Autonomous Vehicle Simulator (MAVS), a high-fidelity physics-based simulator for autonomous ground vehicles (AGVs) in off-road terrain. MAVS has been used to study machine learning in a variety of applications using both camera and lidar data. In addition to MAVS, Unreal Engine 4 (UE4) was used to generate images. Finally, images spanning a range of synthetic scene fidelities, together with real-world images, were used to train the neural network and evaluate the effectiveness of low-fidelity synthetic data; the network performed well, with high confidence scores for object detection.
The accumulation of falling snow is a complex physical process that involves a variety of environmental factors. While much past work has been done on the rendering of accumulated snow for gaming applications, scientific simulation of snow accumulation has been limited to large-scale mountain ranges and watersheds. These large-scale simulations are not relevant for simulations of autonomous ground vehicle (AGV) performance, for which the relevant length scales are a few meters to a few hundred meters. In this work, we present a physics-based simulation of the accumulation of falling snow that is implemented using smoothed-particle hydrodynamics (SPH) to represent snow mass elements. SPH has been used in past work to simulate not only fluids but also deformable and continuous media ranging from concrete to fabric to soil. In this work we show that SPH can be parametrized to have material properties that reasonably approximate the bulk properties of accumulated snow. We present several example simulations in which SPH has been used to calculate the accumulation of fallen snow in an off-road scene. Finally, we show how the SPH simulation output can be combined with a rendering simulation to create realistic synthetic images.
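The abstract above does not give the paper's SPH formulation, but the core of any SPH method is estimating continuum fields by kernel-weighted sums over particles. The following is a minimal sketch of the density-summation step using the standard 3D cubic-spline smoothing kernel; the smoothing length and masses are illustrative, not the paper's snow parametrization.

```python
import numpy as np

def cubic_spline_kernel(r, h):
    """Standard 3D cubic-spline SPH smoothing kernel W(r, h)."""
    sigma = 8.0 / (np.pi * h**3)  # 3D normalization constant
    q = r / h
    w = np.zeros_like(q)
    inner = q <= 0.5
    mid = (q > 0.5) & (q <= 1.0)
    w[inner] = 6.0 * (q[inner]**3 - q[inner]**2) + 1.0
    w[mid] = 2.0 * (1.0 - q[mid])**3
    return sigma * w

def sph_density(positions, masses, h):
    """Density at each particle via SPH summation:
    rho_i = sum_j m_j * W(|x_i - x_j|, h)."""
    diff = positions[:, None, :] - positions[None, :, :]
    r = np.linalg.norm(diff, axis=-1)
    return (masses[None, :] * cubic_spline_kernel(r, h)).sum(axis=1)
```

A full snow simulation would add momentum and constitutive-model terms on top of this density estimate; the O(N²) pairwise sum shown here would also normally be replaced by a neighbor search.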
Detecting and localizing large obstacles, particularly trees in unpaved regions, holds significant importance in autonomy, navigation, and related fields. Equally crucial is the extraction of detailed physical information from sensor captures. Accurately estimating the physical parameters of trees, such as the tree diameter at breast height (DBH), is particularly valuable for commercial and research purposes, especially in forestry and ecological studies. This estimation also plays a pivotal role in navigational tasks within densely vegetated regions, where overcoming obstacles becomes essential to achieving objectives. Achieving the required accuracy often entails labor-intensive processes such as manual data collection, segmentation, or shape reconstruction through mapping. In this context, we propose a particle swarm optimization (PSO)-assisted Hough Transform (HT) algorithm for tree DBH estimation, utilizing solely the spatial information in a readily available LiDAR point cloud. As a point of comparison, a straightforward circular HT-based method is also implemented. Our proposed approach surpasses the base HT method, demonstrating superior performance with an average error of 5.60 cm and RMSE of 6.57 cm, all while maintaining low time costs. These results reveal promising implications for this research direction in real-world applications, particularly in push-through navigation scenarios.
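The paper's PSO-assisted HT algorithm is not specified in the abstract, but the underlying idea of fitting a circle to a trunk cross-section with a swarm-based search can be sketched as follows. This is an illustrative basic PSO over circle parameters (cx, cy, r), not the authors' method; the swarm coefficients and bounds are assumptions.

```python
import numpy as np

def circle_cost(params, pts):
    """Mean absolute deviation of 2D points from the circle (cx, cy, r)."""
    cx, cy, r = params
    d = np.hypot(pts[:, 0] - cx, pts[:, 1] - cy)
    return np.abs(d - r).mean()

def pso_circle_fit(pts, n_particles=40, iters=150, seed=0):
    """Fit a circle to a trunk cross-section slice with a basic PSO.
    The DBH estimate is then 2 * r."""
    rng = np.random.default_rng(seed)
    lo = np.array([pts[:, 0].min(), pts[:, 1].min(), 0.01])
    hi = np.array([pts[:, 0].max(), pts[:, 1].max(), np.ptp(pts, axis=0).max()])
    x = rng.uniform(lo, hi, size=(n_particles, 3))
    v = np.zeros_like(x)
    pbest, pcost = x.copy(), np.array([circle_cost(p, pts) for p in x])
    gbest = pbest[pcost.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, 1))
        # inertia + cognitive + social terms (coefficients are illustrative)
        v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (gbest - x)
        x = x + v
        cost = np.array([circle_cost(p, pts) for p in x])
        improved = cost < pcost
        pbest[improved], pcost[improved] = x[improved], cost[improved]
        gbest = pbest[pcost.argmin()].copy()
    return gbest  # (cx, cy, r)
```

In practice the breast-height slice would first be extracted from the 3D point cloud, and a Hough accumulator could seed the swarm instead of the uniform initialization used here.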
Simulation has become an important enabler in the development and testing of autonomous ground vehicles (AGV), with simulation being used both to generate training data for AI/ML-based segmentation and classification algorithms and to enable in-the-loop testing of the AGV systems that use those algorithms. Furthermore, digital twins of physical test areas provide a safe, repeatable way to conduct critical safety and performance testing of these AI/ML algorithms and their performance on AGV systems. For both these digital twins and the sensor models that use them to generate synthetic data, it is important to understand the relationship between the fidelity of the scene/model and the accuracy of the resulting synthetic sensor data. This work presents a quantitative evaluation of the relationship between digital scene fidelity, sensor model fidelity, and the quality of the resulting synthetic sensor data, with a focus on camera data typically used on AGV to enable autonomous navigation.
Object detection in aerial images is a challenging task as some objects are only a few pixels wide, some objects are occluded, and some are in shade. With the cost of drones decreasing, there is a surge in the amount of aerial data, so it will be useful if models can extract valuable features from the aerial data. Convolutional neural networks (CNN) are a useful tool for object detection and machine learning applications. However, machine learning requires labeled data to train and test the CNN models. In this work, we used a simulator to automatically generate labeled synthetic aerial imagery for use in the training and testing of machine learning algorithms. The synthetic aerial data used in this work was developed using a physics-based software tool called the Mississippi State University Autonomous Vehicle Simulator (MAVS). We generated a dataset of 871 aerial images at 640x480 resolution and implemented the Keras-RetinaNet framework with ResNet-50 as the backbone for object detection. Keras-RetinaNet is one of the popular object detection models used with aerial imagery. As a preliminary task, we detected buildings in the synthetic aerial imagery, and our results show a high mAP (mean Average Precision) of 77.99% using the state-of-the-art RetinaNet model.
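For context, the mAP metric reported above rests on matching predicted boxes to ground truth by intersection-over-union (IoU); a detection counts as a true positive when its IoU with a ground-truth box exceeds a threshold. A minimal sketch of the IoU computation (the exact matching and averaging protocol varies by benchmark):

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)
```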
LiDAR-based 3D semantic segmentation is one of the most widely used perception methods to support scene understanding of self-driving vehicles. Most publicly available LiDAR datasets for driving scene segmentation, such as SemanticKITTI, nuScenes, and SemanticPOSS, provide only a single type of LiDAR configuration. Therefore, testing a trained model with a different channel configuration than the training dataset is sometimes inevitable in real-world applications. Despite the significance of this LiDAR channel mismatch problem in the machine learning pipeline, little research has focused on investigating the impact of the LiDAR configuration shift on a model's test performance. This paper aims to provide referenceable baseline experiments for LiDAR configuration shifts. We explore the effect of using different LiDAR channels when training and testing a 3D LiDAR point cloud semantic segmentation model, utilizing Cylinder3D for the experiments. A Cylinder3D model is trained and tested on simulated 3D LiDAR point cloud datasets created using the Mississippi State University Autonomous Vehicle Simulator (MAVS) and on the 32- and 64-channel 3D LiDAR point clouds of the RELLIS-3D dataset collected in a real-world off-road environment. Our experimental results demonstrate that sensor and spatial domain shifts significantly impact the performance of LiDAR-based semantic segmentation models. In the absence of spatial domain changes between training and testing, models trained and tested on the same sensor type generally exhibited better performance. Moreover, higher-resolution sensors showed improved performance compared to lower-resolution ones. However, results varied when spatial domain changes were present. In some cases, the advantage of a sensor's higher resolution led to better performance both with and without sensor domain shifts. In other instances, the higher resolution resulted in overfitting within a specific domain, causing a lack of generalization capability and decreased performance when tested on data with different sensor configurations.
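A common way to emulate a channel-configuration shift of the kind studied above is to subsample the beam rings of a higher-channel point cloud, e.g. deriving a 32-channel-like cloud from a 64-channel scan. The sketch below assumes each point carries an integer ring index; note that real lower-channel sensors generally have different beam angles, so this is only an approximation.

```python
import numpy as np

def downsample_channels(points, rings, keep_every=2):
    """Approximate a lower-channel LiDAR by keeping every k-th beam ring.

    points: (N, 3) array of xyz coordinates.
    rings:  (N,) integer ring/channel index per point.
    Returns the surviving points and their re-indexed ring numbers.
    """
    mask = (rings % keep_every) == 0
    return points[mask], rings[mask] // keep_every
```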
Failures by autonomous ground vehicles (AGVs) may be caused by many different factors in hardware, software, or integration. Effective safety and reliability testing for AGVs is complicated by the fact that failures are not only infrequent but also difficult to diagnose. In this work, we will discuss the results of a three-phase project to develop a simulation-based approach to AGV architecture design, test implementation, and simulation integration. This approach features a modular AGV architecture, reliability testing with a physics-based simulator (the MSU Autonomous Vehicle Simulator, or MAVS), and validation with a limited number of field trials.
Autonomous driving in off-road environments is challenging because the terrain lacks a definite structure. Assessment of terrain traversability is the main factor in determining the autonomous driving capability of a ground vehicle. Traversability in off-road environments is defined as the drivable track on the trails for the different vehicles used in autonomous driving. It is crucial for an autonomous ground vehicle (AGV) to avoid obstacles such as trees and boulders while traversing the trails. This research has three main objectives: a) collection of 2D camera data in off-road/unstructured environments, b) annotation of the 2D camera data according to the vehicle's ability to drive through the trails, and c) application of a semantic segmentation algorithm to the labeled dataset to predict the trajectory based on the type of ground vehicle. Our models and labeled datasets will be publicly available.
Autonomous navigation (also known as self-driving) has rapidly advanced in the last decade for on-road vehicles. In contrast, off-road vehicles still lag in autonomous navigation capability. Sensing and perception strategies used successfully in on-road driving fail in the off-road environment. This is because on-road environments can often be neatly categorized both semantically and geometrically into regions like driving lane, road shoulder, and passing lane and into objects like stop sign or vehicle. The off-road environment is neither semantically nor geometrically tidy, leading to not only difficulty in developing perception algorithms that can distinguish between drivable and non-drivable regions, but also difficulty in the determination of what constitutes "drivable" for a given vehicle. In this work, the factors affecting traversability are discussed, and an algorithm for assessing the traversability of off-road terrain in real time is developed and presented. The predicted traversability is compared to ground-truth traversability metrics in simulation. Finally, we show how this traversability metric can be automatically calculated by using physics-based simulation with the MSU Autonomous Vehicle Simulator (MAVS). A simulated off-road autonomous navigation task using a real-time implementation of the traversability metric is presented, highlighting the utility of this approach.
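The abstract above does not define its traversability metric, but a common minimal formulation scores a local terrain patch from slope and roughness of a height map. The sketch below is illustrative only; the thresholds are hypothetical and would be vehicle-specific, and the paper's metric may combine different factors.

```python
import numpy as np

def traversability(heights, cell_size, max_slope_deg=20.0, max_rough=0.05):
    """Score a local height-map patch in [0, 1]: 1 = easily drivable, 0 = not.

    Combines mean slope (gradient magnitude) with roughness (height std after
    removing the best-fit plane). Thresholds are illustrative assumptions.
    """
    gy, gx = np.gradient(heights, cell_size)
    slope_deg = np.degrees(np.arctan(np.hypot(gx, gy))).mean()
    # roughness: residual standard deviation about the best-fit plane
    ys, xs = np.mgrid[0:heights.shape[0], 0:heights.shape[1]]
    A = np.c_[xs.ravel(), ys.ravel(), np.ones(heights.size)]
    coef, *_ = np.linalg.lstsq(A, heights.ravel(), rcond=None)
    rough = np.std(heights.ravel() - A @ coef)
    s = max(0.0, 1.0 - slope_deg / max_slope_deg)
    r = max(0.0, 1.0 - rough / max_rough)
    return s * r
```

A flat patch scores 1.0, while a patch steeper than the slope threshold scores 0.0 regardless of its roughness; a real-time implementation would evaluate this over a rolling grid around the vehicle.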
Semantic segmentation using convolutional neural networks is a trending technique in scene understanding. Because these techniques are data-intensive, several devices struggle to store and process even a small batch of images at a time. Moreover, since the volume of data required by the training algorithms is very high, it can be practical to store these datasets in compressed form; likewise, to accommodate the limited bandwidth of the transmission network, images may be compressed before being sent to their destination. Joint Photographic Experts Group (JPEG) is a widely used image compression standard; however, JPEG introduces several unwanted artifacts into the compressed images. In this paper, we explore the effect of JPEG compression on the performance of several deep-learning-based semantic segmentation techniques on both synthetic and real-world datasets at various compression levels. For several established architectures trained on compressed synthetic and real-world datasets, we observed performance equivalent to (and sometimes better than) training on the uncompressed datasets, with a substantial reduction in storage space. We also analyzed the effect of combining the original dataset with copies compressed at different JPEG quality levels and observed a performance improvement over the baseline. Our evaluation and analysis indicate that a segmentation network trained on a compressed dataset can be the better option in terms of performance. We also illustrate that JPEG compression acts as a data augmentation technique, improving the performance of semantic segmentation algorithms.
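The augmentation idea above, training on copies of each image round-tripped through JPEG at several quality levels, can be sketched in a few lines with Pillow. The quality levels below are illustrative, not the levels evaluated in the paper.

```python
import io

import numpy as np
from PIL import Image

def jpeg_recompress(img_array, quality):
    """Round-trip an RGB uint8 image array through JPEG at the given quality,
    introducing the same compression artifacts the network would see."""
    buf = io.BytesIO()
    Image.fromarray(img_array).save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return np.asarray(Image.open(buf).convert("RGB"))

def augment_with_jpeg(images, qualities=(90, 50, 10)):
    """Return the original images plus one JPEG-compressed copy per quality."""
    out = list(images)
    for q in qualities:
        out.extend(jpeg_recompress(im, q) for im in images)
    return out
```

Note that for segmentation the label masks are left untouched: JPEG artifacts change pixel values, not the underlying class of each pixel.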
For autonomous vehicles, 3D rotating LiDAR sensors are often critically important to the vehicle's ability to sense its environment. Generally, these sensors scan their surroundings with multiple laser beams to gather information about the range and the intensity of the reflection from an object. LiDAR capabilities have evolved such that some autonomous systems employ multiple rotating LiDARs to gather greater amounts of data about the vehicle's surroundings. For these multi-LiDAR systems, the placement of the sensors determines the density of the combined point cloud. We perform preliminary research on optimal LiDAR placement strategies for an off-road autonomous vehicle known as the Halo project. We use the Mississippi State University Autonomous Vehicle Simulator (MAVS) to generate large amounts of labeled LiDAR data that can be used to train and evaluate a neural network that processes LiDAR data on the vehicle. The trained networks are evaluated, and their performance metrics are then used to characterize the performance of each sensor pose. Data generation, training, and evaluation were performed iteratively to conduct a parametric analysis of the effectiveness of various LiDAR poses in the multi-LiDAR system. We also describe and evaluate intrinsic and extrinsic calibration methods applied in the multi-LiDAR system. In conclusion, we found that our simulations are an effective way to evaluate the efficacy of various LiDAR placements based on the performance of the neural network used to process the data and the density of the point cloud in areas of interest.
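One of the two criteria above, point-cloud density in areas of interest, is straightforward to compute once the clouds from all sensors have been merged into a common frame. A minimal sketch for an axis-aligned region of interest (the paper's exact density measure is not specified in the abstract):

```python
import numpy as np

def roi_density(points, roi_min, roi_max):
    """Points per unit volume inside an axis-aligned region of interest,
    a simple proxy for comparing candidate LiDAR placements.

    points: (N, 3) merged point cloud in a common vehicle frame.
    roi_min, roi_max: opposite corners of the ROI box.
    """
    roi_min = np.asarray(roi_min, dtype=float)
    roi_max = np.asarray(roi_max, dtype=float)
    inside = np.all((points >= roi_min) & (points <= roi_max), axis=1)
    volume = np.prod(roi_max - roi_min)
    return inside.sum() / volume
```

Sweeping candidate sensor poses and comparing this density (alongside the downstream network's metrics) gives the kind of parametric comparison the study describes.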