KEYWORDS: 3D modeling, Image segmentation, 3D image processing, 3D image reconstruction, Visual process modeling, Systems modeling, 3D metrology, Performance modeling, Clouds, Neural networks
Simultaneous 3D scene reconstruction and semantic segmentation are required in many applications such as autonomous driving, robotics, and optical metrology. Classic 3D reconstruction methods usually perform these operations in two steps. First, a 3D scanner or laser scanner acquires a point cloud. Second, semantic segmentation of the point cloud is performed. Recently, a new kind of 3D model representation was proposed that uses trapezium-shaped voxels aligned with the camera’s frustum and pixels [1]. Frustum voxel models proved to be effective for 3D scene reconstruction and segmentation from monocular images [2]. Still, many existing 3D scanning systems already provide stereo cameras. The performance of frustum voxel model-based methods with stereo input remains an open question. This paper is focused on the evaluation of the 3D reconstruction quality of a volumetric neural network with monocular and stereo input. We leverage the SSZ [2] volumetric neural network as a starting point for our research. We develop a modified version, termed Stereo-SSZ, that receives a stereo pair as input. We compare the performance of the original SSZ model and our Stereo-SSZ model on different real and synthetic 3D shape datasets. Specifically, we generate a stereo version of the SemanticVoxels [2] dataset and capture stereo pairs of multiple real objects using a structured light scanner. The results of our experiments are encouraging and demonstrate that the model with stereo input outperforms the original monocular SSZ network. Specifically, the frustum voxel models generated by our Stereo-SSZ model have lower surface distance errors and show finer details in the reconstructed 3D models.
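One way to picture such a stereo extension is an early-fusion variant in which the left and right images are concatenated along the channel axis before a small network predicts a class label for every frustum voxel. The sketch below follows this idea; the fusion scheme, layer sizes, and grid resolution are illustrative assumptions and not the published Stereo-SSZ configuration.

```python
# Minimal sketch (PyTorch), assuming early fusion of the stereo pair by channel
# concatenation; layer sizes and the output frustum grid are illustrative
# assumptions, not the published Stereo-SSZ architecture.
import torch
import torch.nn as nn

class StereoFrustumNet(nn.Module):
    def __init__(self, num_classes=5, depth_bins=32):
        super().__init__()
        self.num_classes = num_classes
        self.depth_bins = depth_bins
        # 6 input channels: RGB left + RGB right
        self.encoder = nn.Sequential(
            nn.Conv2d(6, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Per-pixel class scores for every frustum depth bin
        self.head = nn.Conv2d(64, num_classes * depth_bins, 1)

    def forward(self, left, right):
        x = torch.cat([left, right], dim=1)       # early fusion of the stereo pair
        logits = self.head(self.encoder(x))       # (B, C*D, H/4, W/4)
        b, _, h, w = logits.shape
        # Reshape into a frustum voxel grid aligned with image pixels
        return logits.view(b, self.num_classes, self.depth_bins, h, w)

# A 256x256 stereo pair yields a 5-class, 32-bin frustum volume
left, right = torch.randn(1, 3, 256, 256), torch.randn(1, 3, 256, 256)
print(StereoFrustumNet()(left, right).shape)  # torch.Size([1, 5, 32, 64, 64])
```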
KEYWORDS: 3D modeling, Unmanned aerial vehicles, Image segmentation, Object recognition, 3D image processing, Neural networks, Motion models, Data modeling, Data processing, Computing systems
Impressive progress in the technical characteristics of modern unmanned aerial vehicles (UAVs) provides new opportunities to employ them in applications and missions that were impossible earlier. The growing applicability of UAVs is based on the high performance of modern computers and the latest advances in sensor data processing techniques.
In recent decades, convolutional neural network (CNN) models have demonstrated state-of-the-art performance in many computer vision problems that previously seemed solvable only by a human. This study is aimed at developing deep learning techniques for UAV autonomous navigation in complex environments in obstacle avoidance mode. Such navigation is required for cargo delivery or rescue missions in urban, industrial, or forest environments where a global positioning system can be unavailable.
To navigate in a complex environment, a UAV has to recognize the objects of the observed scene and estimate the distance to possible obstacles. The proposed technique solves these tasks using a deep learning approach for image segmentation and depth map estimation from an image of the observed scene.
A convolutional neural network model is developed that is capable of predicting the depth map of the observed scene along with scene segmentation according to predefined object classes. The proposed neural network architecture is based on a generative adversarial model whose generative part translates an input color image into an output voxel model. The aim of the discriminative part is to estimate how close the output is to real data and to penalize false outputs. Both the generative and discriminative parts are trained simultaneously on a specially prepared dataset.
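The sketch below illustrates this simultaneous training of the two parts, assuming a standard binary adversarial loss; the actual architectures, loss terms, and optimizer settings of the paper are not reproduced here, and the function and variable names are hypothetical.

```python
# Minimal sketch (PyTorch) of simultaneous generator/discriminator training with
# a standard binary adversarial loss; architectures and loss weights are assumptions.
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()

def train_step(generator, discriminator, g_opt, d_opt, image, real_voxels):
    # --- Discriminator step: distinguish real voxel models from generated ones ---
    d_opt.zero_grad()
    fake_voxels = generator(image).detach()
    real_logits = discriminator(real_voxels)
    fake_logits = discriminator(fake_voxels)
    d_loss = bce(real_logits, torch.ones_like(real_logits)) + \
             bce(fake_logits, torch.zeros_like(fake_logits))
    d_loss.backward()
    d_opt.step()

    # --- Generator step: translate the image so the discriminator is fooled ---
    g_opt.zero_grad()
    gen_logits = discriminator(generator(image))
    g_loss = bce(gen_logits, torch.ones_like(gen_logits))
    g_loss.backward()
    g_opt.step()
    return d_loss.item(), g_loss.item()
```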
Evaluation on the test part of the prepared dataset demonstrated the ability of the developed neural network model to segment previously unseen complex scenes containing several objects and to estimate the depth map for such scenes. The proposed neural network architecture provides high generalization ability for new scenes.
KEYWORDS: 3D modeling, 3D image processing, Sensors, Data modeling, 3D image reconstruction, Reconstruction algorithms, Convolutional neural networks, Detection and tracking algorithms
Objects without prominent textures pose challenges for automatic 3D model reconstruction and feature point matching. Such objects are common in many industrial applications such as metal defect detection, archaeological applications of photogrammetry, and 3D object reconstruction from infrared imagery. Most common feature point descriptors fail to match local patches in featureless regions of an object. Different kinds of textures require different feature descriptors for high-quality image matching. Hence, automatic low-textured 3D object reconstruction using Structure from Motion (SfM) methods is challenging. Nevertheless, such reconstruction is possible with the aid of a human operator. Deep learning-based descriptors have recently outperformed most common feature point descriptors. This paper is focused on the development of a new conditional generative adversarial auto-encoder (GANcoder) based on deep learning. We use a coder-decoder architecture with four convolutional and four deconvolutional layers as a starting point for our research. Our main contribution is a generative adversarial framework, GANcoder, for training the auto-encoder on textureless data. Traditional training approaches using an L1 norm tend to converge to the mean image on low-textured images. In contrast, we use an adversarial discriminator to provide an additional loss function focused on distinguishing real images from the training dataset from the auto-encoder reconstructions. We collected a large GANPatches dataset of feature points from nearly textureless objects to train and evaluate our model and the baselines. The dataset includes 16k pairs of image patches. We evaluated our GANcoder and the baselines on two tasks. First, we compare the matching score of our GANcoder and the baselines. Second, we evaluate the accuracy of 3D reconstruction of low-textured objects using an SfM pipeline with stereo matching provided by our GANcoder. The results of the evaluation are encouraging and demonstrate that our model achieves and surpasses the state of the art in feature matching on low-textured objects.
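The sketch below shows a coder-decoder with four convolutional and four deconvolutional layers of the kind described above; channel counts, kernel sizes, and the patch resolution are illustrative assumptions rather than the exact GANcoder configuration, and the adversarial discriminator is omitted.

```python
# Minimal sketch (PyTorch): a four-convolution / four-deconvolution auto-encoder
# for grayscale patches; channel counts and the 64x64 patch size are assumptions.
import torch
import torch.nn as nn

class PatchAutoEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.ReLU(),    # 64 -> 32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),   # 32 -> 16
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),  # 16 -> 8
            nn.Conv2d(128, 256, 4, stride=2, padding=1), nn.ReLU(), # 8 -> 4
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, patch):
        # The bottleneck activations can serve as a patch descriptor; an adversarial
        # discriminator (not shown) would score the reconstruction instead of an L1 loss.
        return self.decoder(self.encoder(patch))

patch = torch.rand(1, 1, 64, 64)
print(PatchAutoEncoder()(patch).shape)  # torch.Size([1, 1, 64, 64])
```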
Automatic dense labeling of multispectral satellite images facilitates a faster map update process. Water objects are essential elements of a geographic map. While modern dense labeling methods perform robust segmentation of objects such as roads, buildings, and vegetation, dense labeling of hydrographic regions remains a challenging problem. Water objects change their surface albedo, color, and reflectance under different weather conditions and in different seasons. Moreover, rivers and lakes can change their boundaries after floods or droughts. Robust documentation of such seasonal changes is an essential task in the analysis of satellite imagery. Due to the high variance in water object appearance, their segmentation is usually performed manually by a human operator. Recent advances in machine learning have made possible robust segmentation of static objects such as buildings and roads. To the best of our knowledge, there is little research in the modern literature regarding dense labeling of water regions. This paper is focused on the development of a deep-learning-based method for dense labeling of hydrographic objects in aerial and satellite imagery. We use the GeoGAN framework and MobileNetV2 as the starting point for our research. The GeoGAN framework uses an aerial image as an input to generate pixel-level annotations of five object classes: building, low vegetation, high vegetation, road, and car. The GeoGAN framework leverages two deep learning approaches to ensure robust labeling: a generator with skip connections and Generative Adversarial Networks. A generator with skip connections performs image→label translation using feed-forward connections between convolutional and deconvolutional layers of the same depth. A GAN framework consists of two competing networks: a generator and a discriminator. The adversarial loss improves the quality of the resulting dense labeling. We made the following contributions to the GeoGAN framework: (1) a new MobileNetV2-based generator and (2) an adversarial loss function. We term the resulting framework HydroGAN. We evaluate our HydroGAN model using a new HydroViews dataset focused on dense labeling of areas that are subject to severe flooding during the spring season. The evaluation results are encouraging and demonstrate that our HydroGAN model competes with the state-of-the-art models for dense labeling of aerial and satellite imagery. The evaluation demonstrates that our model can generalize from the training data to previously unseen samples. The developed HydroGAN model is capable of performing dense labeling of water objects in different seasons. We made our model publicly available.
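As a rough picture of a MobileNetV2-based generator, the sketch below uses the torchvision MobileNetV2 feature extractor with a simple upsampling head to produce per-pixel labels; the actual HydroGAN decoder, its skip connections, and the number of classes are assumptions not taken from the paper.

```python
# Minimal sketch (PyTorch/torchvision): MobileNetV2 features with a 1x1
# classification head and bilinear upsampling; the real HydroGAN generator and
# its skip connections are not reproduced here.
import torch
import torch.nn as nn
from torchvision.models import mobilenet_v2

class MobileNetV2Generator(nn.Module):
    def __init__(self, num_classes=2):  # e.g. water / background (assumed)
        super().__init__()
        self.backbone = mobilenet_v2().features      # 1280-channel feature map
        self.head = nn.Conv2d(1280, num_classes, kernel_size=1)

    def forward(self, image):
        logits = self.head(self.backbone(image))
        # Upsample back to the input resolution for dense labeling
        return nn.functional.interpolate(
            logits, size=image.shape[-2:], mode="bilinear", align_corners=False)

image = torch.randn(1, 3, 256, 256)
print(MobileNetV2Generator()(image).shape)  # torch.Size([1, 2, 256, 256])
```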
KEYWORDS: 3D modeling, Structured light, 3D scanning, Cameras, Laser scanners, Scanners, Projection systems, 3D acquisition, 3D image processing, Data modeling, Convolutional neural networks
Precise and robust 3D model reconstruction is required in various outdoor scenarios such as aircraft inspection, rapid prototyping, and documentation of an archaeological site. This paper is focused on the development of a mobile structured light 3D scanner for online reconstruction. We use structured light projection for fast data acquisition. We use deep learning-based pattern matching to improve the matching accuracy up to 0.1 pixel. Our FringeMatchNet is based on the U-Net architecture. The network produces an estimated shift heatmap with subpixel accuracy. Each pixel (x, y) of the heatmap represents the probability that the shift between the two image patches is equal to (x, y). We generated a large dataset that combines synthetic and real image patches to train our FringeMatchNet. We compare the accuracy of our FringeMatchNet with other stereo matching algorithms, both hand-crafted (SGM, LSM) and modern deep learning-based ones. The evaluation demonstrates that our network outperforms hand-crafted methods and competes with modern state-of-the-art deep learning-based algorithms. Our scanner provides a measuring volume of 400 × 300 × 200 mm at an object distance of 600–700 mm. It combines portability with an object point resolution between 0.1 and 0.2 mm. We made our FringeMatchNet and the training dataset publicly available. Deep learning-based stereo matching using our FringeMatchNet facilitates subpixel registration and allows our scanner to achieve sub-millimeter accuracy in object space.
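To illustrate how a subpixel shift can be read out from such a heatmap, the sketch below applies a probability-weighted centroid (soft-argmax); whether FringeMatchNet uses exactly this readout is an assumption.

```python
# Minimal sketch (NumPy): subpixel shift from a probability heatmap via a
# probability-weighted centroid (soft-argmax). The readout actually used by
# FringeMatchNet may differ.
import numpy as np

def subpixel_shift(heatmap):
    """heatmap[y, x] = probability that the shift between the patches equals (x, y)."""
    prob = heatmap / heatmap.sum()
    ys, xs = np.mgrid[0:heatmap.shape[0], 0:heatmap.shape[1]]
    return (prob * xs).sum(), (prob * ys).sum()   # (shift_x, shift_y)

# A Gaussian bump centered at (12.3, 7.6) is recovered with subpixel accuracy
ys, xs = np.mgrid[0:16, 0:32]
heatmap = np.exp(-((xs - 12.3) ** 2 + (ys - 7.6) ** 2) / 2.0)
print(subpixel_shift(heatmap))  # approximately (12.3, 7.6)
```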
Algorithms for automatic semantic segmentation of satellite images provide an effective approach to the generation of vector maps. Convolutional neural networks (CNNs) have achieved state-of-the-art segmentation quality on the satellite-images-to-semantic-labels task. However, the generalization ability of such methods is not sufficient to process satellite images captured in a different area or during a different season. Recently, Generative Adversarial Networks (GANs) were introduced that can reduce overfitting using an adversarial loss. This paper is focused on the development of a new GAN model for effective semantic segmentation of multispectral satellite images. The pix2pix model is used as the starting point of the research. It is trained in a semi-supervised setting on aligned pairs of images. Perceptual validation demonstrated the high quality of the output labels. The evaluation on an independent test dataset confirmed the robustness of GANs for semantic segmentation of multispectral satellite images.
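For reference, the standard pix2pix objective that such a model builds on combines a conditional adversarial term with an L1 reconstruction term (noise input omitted for brevity); the weight λ used in this particular work is not stated here.

```latex
% Standard pix2pix objective: conditional GAN loss plus a weighted L1 term.
\mathcal{L}_{\mathrm{cGAN}}(G,D) =
  \mathbb{E}_{x,y}\!\left[\log D(x,y)\right] +
  \mathbb{E}_{x}\!\left[\log\bigl(1 - D(x, G(x))\bigr)\right]
\qquad
\mathcal{L}_{L1}(G) = \mathbb{E}_{x,y}\!\left[\lVert y - G(x) \rVert_{1}\right]
\qquad
G^{*} = \arg\min_{G}\max_{D}\;
  \mathcal{L}_{\mathrm{cGAN}}(G,D) + \lambda\,\mathcal{L}_{L1}(G)
```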
Thermal imaging cameras improve the situational awareness of pilots during aircraft operation. Nowadays, thermal sensors are readily available onboard as part of the Enhanced Vision System (EVS). While video synthesized using 3D modeling (Synthetic Vision System, SVS) can be easily displayed on a Head-up Display (HUD) owing to the presence of area segmentation data, the projection of the EVS video on a HUD usually results in an image with large bright areas that partially obscure the cockpit view for the crew. This paper is focused on the development of the ClearHUD algorithm for effective presentation of the EVS video on a HUD using optical flow estimation. The ClearHUD algorithm is based on estimating the optical flow of the video from both the SVS and the EVS. The difference between the two optical flows is used to detect obstacles. The areas of the detected obstacles are projected with high intensity, and the remaining regions are filtered using the segmentation from the SVS.
The ClearHUD algorithm was implemented in prototype software for testing using 3D modeling. The optical flow for the SVS is estimated using ray tracing. The optical flow for the EVS is estimated using the FlowNet 2.0 convolutional neural network (CNN). The evaluation of the ClearHUD algorithm has shown that it provides a significant increase in the brightness of obstacles and reduces the intensity of non-informative areas.
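The sketch below shows the flow-difference idea in its simplest form: pixels where the observed EVS flow disagrees with the expected SVS flow are kept at full intensity, and the rest is attenuated and masked by the SVS segmentation. The threshold, attenuation gain, and grayscale-frame assumption are illustrative and not taken from the paper.

```python
# Minimal sketch (NumPy) of flow-difference obstacle highlighting; the threshold,
# background gain, and grayscale EVS frame are illustrative assumptions.
import numpy as np

def compose_hud_frame(evs_frame, flow_svs, flow_evs, svs_mask,
                      threshold=2.0, background_gain=0.3):
    """evs_frame: (H, W) grayscale image; flow_*: (H, W, 2) optical flow fields;
    svs_mask: (H, W) binary mask of informative regions from the SVS segmentation."""
    # Magnitude of disagreement between expected (SVS) and observed (EVS) flow
    diff = np.linalg.norm(flow_evs - flow_svs, axis=-1)
    obstacle_mask = diff > threshold

    # Attenuate and mask the background, keep detected obstacles at full intensity
    out = evs_frame.astype(np.float32) * background_gain * svs_mask.astype(np.float32)
    out[obstacle_mask] = evs_frame[obstacle_mask]
    return out.clip(0, 255).astype(np.uint8)
```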
The availability of an accurate dataset is the key requirement for the successful development of an optical flow estimation algorithm. A large number of freely available optical flow datasets have been developed in recent years and have given rise to many powerful algorithms. However, most of these datasets include only images captured in the visible spectrum. This paper is focused on the creation of a multispectral optical flow dataset with accurate ground truth. Generating accurate ground truth optical flow is a rather complex problem, as no device for error-free optical flow measurement has been developed to date. Existing methods for ground truth optical flow estimation are based on hidden textures, 3D modeling, or laser scanning. Such techniques either work only with synthetic optical flow or provide only sparse ground truth. In this paper, a new photogrammetric method for generating accurate ground truth optical flow is proposed. The method combines the accuracy and density of synthetic optical flow datasets with the flexibility of laser-scanning-based techniques. A multispectral dataset including various image sequences was generated using the developed method. The dataset is freely available on the accompanying web site.
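A core step of such a photogrammetric approach is projecting known scene geometry through two camera poses and differencing the resulting pixel positions. The sketch below assumes a pinhole camera with intrinsics K and a per-pixel depth map for the first view; the actual pipeline of the paper is more involved.

```python
# Minimal sketch (NumPy): dense ground truth flow from known geometry and camera
# poses, assuming a pinhole camera and a per-pixel depth map for view 1. [R|t]
# maps view-1 camera coordinates to view-2 camera coordinates.
import numpy as np

def ground_truth_flow(depth, K, R, t):
    h, w = depth.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=-1).reshape(-1, 3).T  # 3 x N

    # Back-project the pixels of view 1 into 3D using the depth map
    pts1 = (np.linalg.inv(K) @ pix) * depth.reshape(1, -1)

    # Transform into view 2 and project back onto the image plane
    pts2 = R @ pts1 + t.reshape(3, 1)
    proj = K @ pts2
    proj = proj[:2] / proj[2]

    flow_x = proj[0].reshape(h, w) - xs
    flow_y = proj[1].reshape(h, w) - ys
    return flow_x, flow_y
```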
Accurate egomotion estimation is required for mobile robot navigation. The egomotion is often estimated using optical flow algorithms. For accurate optical flow estimation, most modern algorithms require large memory resources and processor speed. However, the simple single-board computers that control the motion of a robot usually do not provide such resources. On the other hand, most modern single-board computers are equipped with an embedded GPU that can be used in parallel with the CPU to improve the performance of an optical flow estimation algorithm. This paper presents a new Z-flow algorithm for efficient computation of optical flow on an embedded GPU. The algorithm is based on phase correlation optical flow estimation and provides real-time performance on a low-cost embedded GPU. A layered optical flow model is used. Layer segmentation is performed using a graph-cut algorithm with a time-derivative-based energy function. This approach makes the algorithm both fast and robust in low-light and low-texture conditions. The implementation of the algorithm for a Raspberry Pi Model B computer is discussed. For evaluation, the computer was mounted on a Hercules skid-steered mobile robot equipped with a monocular camera. The evaluation was performed using hardware-in-the-loop simulation and experiments with the Hercules mobile robot. The algorithm was also evaluated using the KITTI Optical Flow 2015 dataset. The resulting endpoint error of the optical flow computed with the developed algorithm was low enough for navigation of the robot along the desired trajectory.
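The core of phase correlation, on which such an estimator is built, is a normalized cross-power spectrum whose inverse transform peaks at the displacement between two patches. The sketch below shows this single-shift case on the CPU; the GPU kernels, layered flow model, and graph-cut segmentation of the actual Z-flow algorithm are not shown.

```python
# Minimal sketch (NumPy) of phase correlation between two patches; the actual
# Z-flow algorithm runs on an embedded GPU with a layered flow model.
import numpy as np

def phase_correlation_shift(patch_a, patch_b):
    """Estimate the (dx, dy) translation such that patch_b is patch_a shifted by it."""
    A = np.fft.fft2(patch_a)
    B = np.fft.fft2(patch_b)
    cross_power = np.conj(A) * B
    cross_power /= np.abs(cross_power) + 1e-12     # keep only the phase
    correlation = np.fft.ifft2(cross_power).real
    dy, dx = np.unravel_index(np.argmax(correlation), correlation.shape)
    # Wrap shifts larger than half the patch size to negative values
    h, w = patch_a.shape
    if dy > h // 2:
        dy -= h
    if dx > w // 2:
        dx -= w
    return int(dx), int(dy)

patch_a = np.random.rand(64, 64)
patch_b = np.roll(patch_a, shift=(3, 5), axis=(0, 1))  # shift 3 rows, 5 columns
print(phase_correlation_shift(patch_a, patch_b))       # (5, 3)
```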
Skid-steered robots are widely used as mobile platforms for machine vision systems. However, it is hard to achieve stable motion of such robots along a desired trajectory due to unpredictable wheel slip. It is possible to compensate for the wheel slip and stabilize the motion of the robot using visual odometry. This paper presents a fast optical flow-based algorithm for estimating the instantaneous center of rotation and the angular and longitudinal speed of the robot. The proposed algorithm is based on the Horn–Schunck variational optical flow estimation method. The instantaneous center of rotation and the motion of the robot are estimated by back-projecting the optical flow field onto the ground surface. The developed algorithm was tested on a skid-steered mobile robot. The robot is based on a mobile platform that includes two pairs of differentially driven motors and a motor controller. A monocular visual odometry system consisting of a single-board computer and a low-cost webcam is mounted on the mobile platform. A state-space model of the robot was derived using standard black-box system identification. The input (commands) and the output (motion) were recorded using a dedicated external motion capture system. The obtained model was used to control the robot without visual odometry data. The paper concludes with an estimation of the algorithm quality by comparing the trajectories estimated by the algorithm with the data from the motion capture system.
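Once the flow has been back-projected to the ground plane, the apparent velocity of a static ground point in the robot frame has the simple planar form v_x = -v + ωy, v_y = -ωx, so the longitudinal speed v and angular speed ω follow from a linear least-squares fit, and the instantaneous center of rotation lies at (0, v/ω). The sketch below implements this simplified model; the paper's full estimation pipeline is not reproduced, and the frame convention is an assumption.

```python
# Minimal sketch (NumPy): least-squares fit of longitudinal speed v and angular
# speed w from ground-plane flow, under a simplified planar rigid-motion model
# (x forward, y to the left); the actual pipeline of the paper is more involved.
import numpy as np

def fit_planar_motion(points, velocities):
    """points: (N, 2) ground coordinates in the robot frame;
    velocities: (N, 2) apparent ground velocities of those (static) points."""
    x, y = points[:, 0], points[:, 1]
    n = len(x)

    # Model for static ground points seen from the moving robot:
    #   vx = -v + w * y,   vy = -w * x
    A = np.zeros((2 * n, 2))
    b = np.concatenate([velocities[:, 0], velocities[:, 1]])
    A[:n, 0] = -1.0   # coefficient of v in the vx equations
    A[:n, 1] = y      # coefficient of w in the vx equations
    A[n:, 1] = -x     # coefficient of w in the vy equations

    v, w = np.linalg.lstsq(A, b, rcond=None)[0]
    icr_lateral_offset = v / w if abs(w) > 1e-9 else np.inf
    return v, w, icr_lateral_offset
```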