Sensing and modelling the surrounding environment is crucial to many problems faced by intelligent machines such as self-driving cars, autonomous robots, and augmented reality displays. The performance, reliability, and safety of autonomous agents depend heavily on how the environment is modelled. Two-dimensional models cannot capture the three-dimensional nature of real-world scenes; three-dimensional models are necessary to meet the standards the autonomy stack requires for intelligent agents to work alongside humans. Data-driven deep learning methodologies for three-dimensional scene modelling have evolved greatly in the past few years, owing to the availability of large amounts of data from a variety of sensors in the form of well-designed datasets. 3D object detection and localization are two key requirements for tasks such as obstacle avoidance, agent-to-agent interaction, and path planning. Most object detection methodologies operate on data from a single sensor, such as a camera or LiDAR. Cameras provide feature-rich scene data, while LiDAR provides 3D geometric information. More advanced object detection and localization can be achieved by leveraging information from both sensors. To effectively quantify the uncertainty of each sensor channel, an appropriate fusion strategy is needed to fuse the independently encoded point clouds from LiDAR with the RGB images from standard vision cameras. In this work, we introduce a fusion strategy and develop a multimodal pipeline that uses existing state-of-the-art deep-learning-based data encoders to produce robust 3D object detection and localization in real time. The performance of the proposed fusion model is evaluated on the popular KITTI 3D benchmark dataset.
Library of Congress Subject Headings
Multisensor data fusion; Machine learning; Computer vision; Pattern recognition systems; Three dimensional imaging
Computer Engineering (MS)
Department, Program, or Center
Computer Engineering (KGCOE)
Bhanushali, Darshan Ramesh, "Multi-Sensor Fusion for 3D Object Detection" (2020). Thesis. Rochester Institute of Technology. Accessed from
RIT – Main Campus