KEYWORDS: 3D modeling, Image segmentation, 3D image processing, 3D image reconstruction, Visual process modeling, Systems modeling, 3D metrology, Performance modeling, Clouds, Neural networks
Simultaneous 3D scene reconstruction and semantic segmentation are required in many applications such as autonomous driving, robotics, and optical metrology. Classic 3D reconstruction methods usually perform these operations in two stages: first, a 3D scanner or laser scanner acquires a point cloud; second, semantic segmentation is performed on that point cloud. Recently, a new kind of 3D model representation was proposed that uses trapezium-shaped voxels aligned with the camera's frustum and pixels [1]. Frustum voxel models have proved effective for 3D scene reconstruction and segmentation from monocular images [2]. Still, many existing 3D scanning systems readily provide stereo cameras, and the performance of frustum voxel model-based methods with stereo input remains an open question. This paper focuses on evaluating the 3D reconstruction quality of a volumetric neural network with monocular and stereo input. We take the SSZ [2] volumetric neural network as the starting point for our research and develop a modified version, termed Stereo-SSZ, that receives a stereo pair as input. We compare the performance of the original SSZ model and our Stereo-SSZ model on several real and synthetic 3D shape datasets. Specifically, we generate a stereo version of the SemanticVoxels [2] dataset and capture stereo pairs of multiple real objects using a structured light scanner. The results of our experiments are encouraging and demonstrate that the model with stereo input outperforms the original monocular SSZ network. In particular, the frustum voxel models generated by our Stereo-SSZ model have lower surface distance errors and reproduce finer details in the reconstructed 3D models.
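The frustum voxel idea above can be sketched concretely: each voxel is indexed by an image pixel (i, j) and a depth slice k along the camera's viewing frustum, so the voxel grid is pixel-aligned by construction. A minimal illustration, assuming a pinhole camera with hypothetical intrinsics and uniform depth binning (the actual slicing scheme of [1] may differ):

```python
import math

def point_to_fvox(X, Y, Z, fx, fy, cx, cy, z_near, z_far, n_slices):
    """Map a 3D camera-space point to a frustum-voxel index (i, j, k).

    i, j follow the image pixel grid; k is the depth slice.
    Uniform depth binning between z_near and z_far is an assumption
    made for this sketch.
    """
    # Perspective projection onto the pixel grid (pinhole model).
    u = fx * X / Z + cx
    v = fy * Y / Z + cy
    # Uniform depth slicing along the frustum.
    k = (Z - z_near) / (z_far - z_near) * n_slices
    return int(math.floor(u)), int(math.floor(v)), int(math.floor(k))

# A point 2 m in front of a 640x480 camera, slightly right of center.
idx = point_to_fvox(0.1, 0.0, 2.0, fx=500, fy=500, cx=320, cy=240,
                    z_near=0.5, z_far=4.5, n_slices=64)
print(idx)  # (345, 240, 24)
```

Because the voxel grid shares the image's pixel layout, a network predicting such a grid can keep a per-pixel correspondence between input images and output occupancy, which is what makes the representation convenient for image-to-volume architectures like SSZ.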
KEYWORDS: 3D modeling, 3D image processing, Sensors, Data modeling, 3D image reconstruction, Reconstruction algorithms, Convolutional neural networks, Detection and tracking algorithms
Objects without prominent textures pose challenges for automatic 3D model reconstruction and feature point matching. Such objects are common in many industrial applications, such as metal defect detection, archaeological photogrammetry, and 3D object reconstruction from infrared imagery. Most common feature point descriptors fail to match local patches in featureless regions of an object, and different kinds of textures require different feature descriptors for high-quality image matching. Hence, automatic low-textured 3D object reconstruction using Structure from Motion (SfM) methods is challenging, although such reconstruction is possible with the aid of a human operator. Recently, deep learning-based descriptors have outperformed most common feature point descriptors. This paper focuses on the development of a new conditional generative adversarial auto-encoder (GANcoder) based on deep learning. We use a coder-decoder architecture with four convolutional and four deconvolutional layers as the starting point for our research. Our main contribution is a generative adversarial framework, GANcoder, for training the auto-encoder on textureless data. Traditional training approaches using an L1 norm tend to converge to the mean image on low-textured images. In contrast, we use an adversarial discriminator to provide an additional loss function focused on distinguishing real images from the training dataset from the auto-encoder's reconstructions. To train and evaluate our model and the baselines, we collected a large GANPatches dataset of feature points from nearly textureless objects, comprising 16k pairs of image patches. We performed a qualitative evaluation of our GANcoder and the baselines on two tasks. Firstly, we compare the matching score of our GANcoder and the baselines. Secondly, we evaluate the accuracy of 3D reconstruction of low-textured objects using an SfM pipeline with stereo matching provided by our GANcoder.
The results of the evaluation are encouraging and demonstrate that our model matches and surpasses the state of the art in feature matching on low-textured objects.
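The collapse argument behind the adversarial loss can be made concrete: when reconstructions are scored only by an L1 norm against a set of nearly textureless patches, the loss-optimal output is a featureless per-pixel central image, discarding exactly the fine detail that a discriminator would penalize the generator for losing. A toy numeric sketch (hypothetical 1-D "patches", not the GANPatches data):

```python
def l1_dataset_loss(pred, patches):
    """Mean L1 loss of one predicted patch against every patch in a set."""
    return sum(sum(abs(p - q) for p, q in zip(pred, patch))
               for patch in patches) / len(patches)

# Three "patches" sharing a flat background, each with detail in a different spot.
patches = [
    [0.5, 0.9, 0.5, 0.5],
    [0.5, 0.5, 0.9, 0.5],
    [0.5, 0.5, 0.5, 0.9],
]

# The per-pixel central image minimizes the dataset L1 loss but is fully flat.
central = [sorted(col)[len(col) // 2] for col in zip(*patches)]
print(central)  # [0.5, 0.5, 0.5, 0.5] -- all detail washed out

# It beats any individual detailed patch under the L1 criterion.
assert l1_dataset_loss(central, patches) < l1_dataset_loss(patches[0], patches)
```

A discriminator trained to tell real patches from reconstructions would reject the flat central image, pushing the auto-encoder toward outputs that preserve patch-level detail; this is the motivation for the adversarial term in GANcoder.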