There is an increasing demand in the automotive industry for efficient self-driving cars. One of the key technologies that driverless systems rely on is Computer Vision, the scientific field concerned with how computers can gain high-level understanding from digital images or videos. From an engineering perspective, it seeks to automate tasks that the human visual system can do.
Deep Learning has enabled the field of Computer Vision to advance rapidly in the last few years. In this thesis I discuss one specific task in Computer Vision called Semantic Segmentation. Even though researchers have proposed numerous ways to solve this problem, I will implement a particular architecture, PSPNet, which uses a Fully Convolutional Neural Network for the task.
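The distinguishing component of PSPNet is its pyramid pooling module, which pools the backbone's feature map at several grid resolutions, upsamples each pooled map back to the input size, and concatenates everything. Below is a minimal NumPy sketch of that idea; the function names and bin sizes (1, 2, 3, 6, as in the original paper) are illustrative, and a real PSPNet additionally applies learned 1×1 convolutions after each pooling level, which this sketch omits.

```python
import numpy as np

def adaptive_avg_pool(feat, bins):
    # feat: (C, H, W); average-pool into a bins x bins grid
    C, H, W = feat.shape
    out = np.zeros((C, bins, bins))
    for i in range(bins):
        for j in range(bins):
            hs, he = i * H // bins, (i + 1) * H // bins
            ws, we = j * W // bins, (j + 1) * W // bins
            out[:, i, j] = feat[:, hs:he, ws:we].mean(axis=(1, 2))
    return out

def upsample_nearest(feat, H, W):
    # nearest-neighbour upsampling back to (C, H, W)
    C, h, w = feat.shape
    rows = np.arange(H) * h // H
    cols = np.arange(W) * w // W
    return feat[:, rows][:, :, cols]

def pyramid_pooling(feat, bin_sizes=(1, 2, 3, 6)):
    # pool at several scales, upsample, and concatenate with the input
    C, H, W = feat.shape
    pooled = [upsample_nearest(adaptive_avg_pool(feat, b), H, W)
              for b in bin_sizes]
    return np.concatenate([feat] + pooled, axis=0)

feat = np.random.rand(4, 12, 12)          # a toy 4-channel feature map
out = pyramid_pooling(feat)
print(out.shape)                          # (20, 12, 12): 4 original + 4*4 pooled channels
```

The concatenated map mixes local detail with context aggregated at several scales, which is what lets PSPNet resolve ambiguous regions better than a plain fully convolutional network.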
I will use the largest publicly available self-driving dataset, BDD100K, to train the PSPNet.
In addition, I aim to provide intuitive insights into the operations and terms commonly used in Convolutional Networks for image understanding, including convolution, max pooling, receptive field, up-sampling, and skip connections.
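Two of those operations can be sketched directly. The following NumPy toy example (my own illustration, not code from the thesis) applies a valid-mode 2D convolution with a vertical-edge kernel, then a 2×2 max pooling; note that deep-learning "convolution" is technically cross-correlation, since the kernel is not flipped.

```python
import numpy as np

def conv2d(img, kernel):
    # valid-mode 2D "convolution" (cross-correlation, as in deep learning)
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (img[i:i + kh, j:j + kw] * kernel).sum()
    return out

def max_pool(img, size=2):
    # non-overlapping max pooling: keeps the strongest response per window
    out = np.zeros((img.shape[0] // size, img.shape[1] // size))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = img[i * size:(i + 1) * size,
                            j * size:(j + 1) * size].max()
    return out

img = np.arange(36, dtype=float).reshape(6, 6)   # toy 6x6 "image"
edge = np.array([[1., 0., -1.]] * 3)             # simple vertical-edge kernel
feat = conv2d(img, edge)                          # (4, 4) feature map
pooled = max_pool(feat)                           # (2, 2) after 2x2 max pooling
print(feat.shape, pooled.shape)                   # (4, 4) (2, 2)
```

Each 3×3 convolution lets an output unit see a 3×3 patch of the input, and each pooling step doubles that reach, which is how the receptive field of deeper layers grows to cover large image regions.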