Object detectors on autonomous systems must often contend with dimly lit environments and harsh weather, conditions under which RGB images alone typically do not provide enough information. Autonomous systems therefore carry an array of specialized sensors to observe their surroundings. These sensors can operate asynchronously, have different effective ranges, and produce drastically different volumes of data. An autonomous platform must fuse these disparate streams in order to leverage all available information and build the most comprehensive possible model of its environment. Beyond the need for multiple sensors, deep learning-based object detectors typically require large amounts of labeled data to achieve good performance. Unfortunately, collecting multimodal, labeled data is exceedingly labor-intensive, which necessitates a streamlined approach to data collection. Video game graphics engines have emerged as a relatively cheap and effective way to produce images and video for new datasets, helping to close the data gap for computer vision tasks such as object detection and segmentation. Game engines also enable domain randomization, which randomizes parameters of the engine and the generation scheme in order to improve generalization to real-world data. In this paper, we outline the creation of a multimodal dataset using domain randomization. Our dataset focuses on the two most popular sensors in autonomous vehicles, LiDAR and RGB cameras. We perform baseline testing of an object detector using a data-fusion deep learning architecture on both our synthetic dataset and the KITTI dataset for comparison.
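To make the idea of domain randomization concrete, the following is a minimal sketch of how scene parameters might be sampled before rendering each synthetic frame. The parameter names and ranges here are illustrative assumptions, not the actual configuration used in this work or by any particular game engine:

```python
import random

def randomize_scene(rng: random.Random) -> dict:
    """Sample one randomized scene configuration.

    Each synthetic frame is rendered under a freshly sampled set of
    conditions, so a detector trained on the data cannot overfit to
    any single lighting, weather, or sensor configuration.
    All parameter names and ranges below are hypothetical examples.
    """
    return {
        "time_of_day_h": rng.uniform(0.0, 24.0),       # lighting follows sun position
        "weather": rng.choice(["clear", "rain", "fog", "snow"]),
        "fog_density": rng.uniform(0.0, 1.0),           # only meaningful for foggy scenes
        "camera_exposure": rng.uniform(0.5, 2.0),       # simulates auto-exposure variation
        "lidar_dropout_prob": rng.uniform(0.0, 0.1),    # fraction of LiDAR returns dropped
    }

# Generate configurations for a batch of synthetic frames.
rng = random.Random(0)
scenes = [randomize_scene(rng) for _ in range(1000)]
```

In practice, each sampled configuration would be passed to the engine's rendering and sensor-simulation pipeline; the key point is that the randomization covers both scene appearance and sensor behavior.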