Autonomous vehicles are increasingly present in agricultural environments, where safety is a critical concern. To prevent collisions, depth estimation is one way to determine the positions and distances of obstacles around the vehicle. In this thesis I present a real-time depth estimation architecture for monocular agricultural videos.
The work of Zhou et al., "An unsupervised learning framework for depth and ego-motion estimation from monocular videos," provides the baseline architecture. In contrast to the original network of Zhou et al., my modified architectures are less complex; I evaluate several variants to determine what level of complexity is actually necessary. The approach is fully unsupervised: only monocular images and the camera matrices are required for training. I preprocess the agricultural dataset, categorize the videos, and generate depth ground-truth data through camera-lidar sensor fusion. I then evaluate the modified algorithms offline on an agricultural test set. The less complex variants predict depth maps as accurately as the original network, and their visual results are on par with those of the original architecture. Moreover, the number of training parameters, the training time, and the inference time are all reduced.
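The camera-lidar fusion step used for ground-truth generation can be sketched as a projection of lidar points into the image plane. The following is a minimal illustration only, assuming a pinhole camera model; the function name, the intrinsic matrix `K`, and the lidar-to-camera extrinsic transform `T` are hypothetical placeholders, not taken from the thesis:

```python
import numpy as np

def lidar_to_depth_map(points, K, T, height, width):
    """Project lidar points (N, 3) into the image to build a sparse depth map.

    points: lidar points in the lidar frame, shape (N, 3)
    K:      3x3 camera intrinsic matrix (assumed pinhole model)
    T:      4x4 lidar-to-camera extrinsic transform (assumed known)
    """
    # Transform points into the camera frame using homogeneous coordinates.
    pts_h = np.hstack([points, np.ones((points.shape[0], 1))])
    cam = (T @ pts_h.T).T[:, :3]

    # Keep only points in front of the camera.
    cam = cam[cam[:, 2] > 0]

    # Perspective projection with the intrinsic matrix.
    uv = (K @ cam.T).T
    u = np.round(uv[:, 0] / uv[:, 2]).astype(int)
    v = np.round(uv[:, 1] / uv[:, 2]).astype(int)
    z = cam[:, 2]

    # Discard projections that fall outside the image bounds.
    mask = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    u, v, z = u[mask], v[mask], z[mask]

    # Sparse depth map: keep the nearest point when several hit one pixel.
    depth = np.zeros((height, width))
    for ui, vi, zi in zip(u, v, z):
        if depth[vi, ui] == 0 or zi < depth[vi, ui]:
            depth[vi, ui] = zi
    return depth
```

The resulting map is sparse (zero where no lidar return projects), which is why such fused data is typically used only as evaluation ground truth rather than as a training signal in the unsupervised setting.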