Object detection is a critical task in autonomous driving. Autonomous cars are usually equipped with multiple sensors such as cameras and LiDAR.
2D object detection networks determine an object's location from camera images, and many high-performing systems have been proposed. Although convolutional neural networks are the state-of-the-art technique for 2D object detection, they do not perform well directly on 3D point clouds because the sensor data are sparse, so new methods are needed. 3D object detection networks operate on the 3D point cloud provided by a range sensor, and some solutions combine features from images and the point cloud.

This thesis focuses on LiDAR-based networks, in particular VoxelNet. VoxelNet is an end-to-end network that combines feature extraction and bounding box prediction and works directly on 3D point cloud data. It divides the 3D space into voxels and transforms each voxel into a matrix representation that encodes the interactions of the points within it. A sequence of convolutional neural networks then extracts multiple features and generates 3D bounding boxes.

For the implementation, I trained the VoxelNet network on the KITTI car benchmark. The predicted labels are evaluated both in the bird's-eye view and in 3D detection, with average precision as the primary performance metric.
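The voxel-grouping step described above can be sketched as follows. This is a minimal illustration, not VoxelNet's actual implementation: the voxel size, point cap, and function name are assumptions, and the real pipeline additionally augments each point with offsets to the voxel centroid before the learned voxel feature encoding.

```python
import numpy as np

def voxelize(points, voxel_size=(0.2, 0.2, 0.4), max_points=35):
    """Group (x, y, z) points into voxels keyed by their integer grid index.

    Illustrative sketch: voxel_size and max_points are assumed values,
    loosely modeled on VoxelNet's cap on points per voxel.
    """
    # Integer grid index of each point along each axis.
    indices = np.floor(points / np.asarray(voxel_size)).astype(np.int64)
    voxels = {}
    for idx, point in zip(map(tuple, indices), points):
        bucket = voxels.setdefault(idx, [])
        if len(bucket) < max_points:  # cap the number of points per voxel
            bucket.append(point)
    return voxels

# Usage: two nearby points share a voxel, a distant point gets its own.
pts = np.array([[0.05, 0.05, 0.10],
                [0.10, 0.15, 0.20],
                [5.00, 5.00, 1.00]])
grid = voxelize(pts)
```

Each non-empty voxel would then be turned into a fixed-size matrix of its (augmented) points, which is what the subsequent convolutional layers consume.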
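As a rough illustration of the evaluation metric, the sketch below computes 11-point interpolated average precision from a precision-recall curve, the protocol used by the original KITTI benchmark; the function name and the example inputs are assumptions for illustration only.

```python
import numpy as np

def average_precision(precisions, recalls):
    """11-point interpolated AP (original KITTI / Pascal VOC protocol).

    Averages the best achievable precision at the 11 recall levels
    0.0, 0.1, ..., 1.0; precisions and recalls are parallel arrays
    sampled along the detector's ranked predictions.
    """
    ap = 0.0
    for t in np.linspace(0.0, 1.0, 11):
        mask = recalls >= t
        # Interpolated precision: best precision at recall >= t.
        p = precisions[mask].max() if mask.any() else 0.0
        ap += p / 11.0
    return ap

# Usage with a made-up curve: a perfect detector scores AP = 1.0.
ap_perfect = average_precision(np.array([1.0, 1.0, 1.0]),
                               np.array([0.1, 0.5, 1.0]))
```

The KITTI benchmark reports this metric separately per difficulty level (easy, moderate, hard) and per evaluation mode, which is why the bird's-eye-view and 3D results are listed as distinct AP numbers.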