Reinforcement learning is a branch of machine learning: a general framework for solving sequential decision problems in potentially unknown environments by finding the optimal way to make decisions. It is concerned with how software agents should take actions in an environment so as to maximize some notion of cumulative reward. Owing to its generality, the problem is studied in many other disciplines as well, such as economics, psychology, and neuroscience.
Historically, applications of reinforcement learning have been confined to environments with a small number of well-defined states and actions due to memory and computational constraints. Recent advances in hardware technology and deep learning have made it possible to apply these methods to highly complex environments as well. Deep reinforcement learning has been used to beat the best players in the world in the game of Go, to learn from visual input, and to solve three-dimensional locomotion tasks.
Research in the field has mainly focused on improving training and optimization methods. Although one of the most significant applications of deep learning is image recognition, little published research analyzes the image-processing components of agent models that operate on visual input. Deep architectures and regularization methods drawn from the computer vision literature tend to perform poorly in the reinforcement learning setting, and structural modifications as a way of improving performance have been mostly sidelined.
In this paper, I study how altering the model structure of a deep neural network agent that observes its environment visually affects its performance in the reinforcement learning setting. After introducing the background literature, I show how the advantage actor-critic reinforcement learning method performs in multiple visually observable environments using one of the most commonly used agent models. I then analyze how changes made to this model alter the agent's performance.
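To make the advantage actor-critic method concrete, the sketch below shows the quantities at the heart of a single A2C update, using plain Python floats in place of neural network outputs. The function names, the discount factor, and all numeric values are illustrative assumptions for this sketch, not details taken from the experiments in this paper.

```python
import math

def discounted_return(rewards, gamma=0.99):
    """Sum of discounted future rewards from the start of a rollout."""
    return sum(r * gamma**t for t, r in enumerate(rewards))

def a2c_losses(log_prob, value_estimate, rewards, gamma=0.99):
    """Return (policy_loss, value_loss) for one state-action pair.

    The advantage A = R - V(s) measures how much better the observed
    return was than the critic's baseline estimate: the actor is pushed
    toward actions with positive advantage, while the critic is
    regressed toward the observed return.
    """
    ret = discounted_return(rewards, gamma)
    advantage = ret - value_estimate
    policy_loss = -log_prob * advantage  # actor: ascend log pi(a|s) * A
    value_loss = advantage ** 2          # critic: squared error to return
    return policy_loss, value_loss

# Example with made-up numbers: an action taken with probability 0.5,
# a critic estimate of 1.0, and a reward of 1.0 arriving two steps later.
p_loss, v_loss = a2c_losses(log_prob=math.log(0.5),
                            value_estimate=1.0,
                            rewards=[0.0, 0.0, 1.0])
```

In a full implementation both the policy's log-probabilities and the value estimates come from (often partially shared) neural networks, and the two losses are combined, typically with an entropy bonus, into a single objective minimized by gradient descent.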