With the emergence of advanced 2D and 3D sensors, such as high-resolution visible cameras and less expensive lidar sensors, there is a growing need to fuse information extracted from multiple sensor modalities for accurate object detection, recognition, and tracking. To train a system with data captured by multiple sensors, the regions of interest in the data must be accurately aligned. A necessary step in this process is a fine, pixel-level registration between the modalities. We propose a robust multimodal data registration strategy for automatically registering the visible and lidar data captured by sensors embedded in aerial vehicles. The coarse registration of the data is performed using the metadata, such as timestamps, GPS, and IMU information, provided by the data acquisition systems. The challenge is that these modalities contain very different kinds of information and cannot be aligned using classical methods. Our proposed fine registration mechanism employs deep-learning methodologies to extract features from the data in each modality. For our experiments, we use a 3D geopositioned aerial lidar dataset along with the coarsely registered visible data, and extract SIFT-like features from both data streams. These SIFT-like features are generated by appropriately trained deep-learning algorithms.
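As a rough illustration of how a fine registration step could consume such features, the following minimal sketch assumes that keypoint coordinates and descriptors have already been extracted from both modalities. It matches descriptors with a nearest-neighbor ratio test and then estimates a 2D affine transform with a simple RANSAC loop. The function names (match_descriptors, fit_affine, ransac_affine), the affine model, and all thresholds are hypothetical choices made for this example; they stand in for the learned feature pipeline and are not taken from the method described here.

```python
import numpy as np


def match_descriptors(desc_a, desc_b, ratio=0.8):
    """Match rows of desc_a to rows of desc_b with a nearest-neighbor ratio test.

    Returns an (M, 2) integer array of index pairs (i in desc_a, j in desc_b).
    Assumes desc_b has at least two rows.
    """
    # Pairwise Euclidean distances between all descriptors.
    dist = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    order = np.argsort(dist, axis=1)
    best, second = order[:, 0], order[:, 1]
    rows = np.arange(len(desc_a))
    # Keep a match only if it is clearly better than the runner-up.
    keep = dist[rows, best] < ratio * dist[rows, second]
    return np.stack([rows[keep], best[keep]], axis=1)


def fit_affine(src, dst):
    """Least-squares 2x3 affine matrix M such that dst ~ src @ M[:, :2].T + M[:, 2]."""
    homogeneous = np.hstack([src, np.ones((len(src), 1))])  # (N, 3)
    solution, *_ = np.linalg.lstsq(homogeneous, dst, rcond=None)  # (3, 2)
    return solution.T


def ransac_affine(src, dst, n_iter=200, tol=2.0, seed=0):
    """Estimate an affine transform that is robust to outlier correspondences.

    tol is the inlier threshold in pixels (an assumed value for this sketch).
    Returns the refined 2x3 matrix and a boolean inlier mask.
    """
    if len(src) < 3:
        raise ValueError("at least 3 correspondences are required")
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(n_iter):
        # Three correspondences are the minimal sample for an affine model.
        idx = rng.choice(len(src), size=3, replace=False)
        model = fit_affine(src[idx], dst[idx])
        pred = src @ model[:, :2].T + model[:, 2]
        inliers = np.linalg.norm(pred - dst, axis=1) < tol
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Refit on all inliers of the best minimal-sample model.
    return fit_affine(src[best_inliers], dst[best_inliers]), best_inliers
```

In use, the index pairs returned by match_descriptors would select corresponding keypoint coordinates from the visible image and from a 2D projection of the lidar data, and ransac_affine would refine the metadata-based coarse alignment. A homography or a full camera-pose model may be more appropriate for real aerial geometry; the affine model is used here only to keep the sketch short.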