This MSc thesis introduces reinforcement learning, a subfield of machine learning. I explain in detail the theoretical foundations of the field, including the main features of the closely related concept of Markov decision processes, the algorithms of the field, and some of its important achievements. I demonstrate how reinforcement learning can be applied to find a suitable behavior policy in a stochastic environment, using two tasks that seemingly have little in common.
In the first chapter I present the theory of Markov decision processes and reinforcement learning, the scientific field built on it. I introduce the algorithms on which successful applications of reinforcement learning have been based, and I present the wide range of areas in which reinforcement learning can be applied.
The second chapter introduces two modeling problems whose stochastic nature suggests that reinforcement learning algorithms could be applied with benefit. The first problem is to adapt the production of wind farms to their previously submitted forecasts. The second is the search for a portfolio management strategy that takes transaction costs into account. I present the available information, the constraints of these areas, and the possible solutions. In this chapter I introduce a partial Markov decision process as a model for each problem. With the help of these models I point out the similarities between the tasks.
The third chapter covers design and implementation. I also present alternatives for the missing parts of the Markov decision processes. I introduce algorithms for solving the previously presented tasks and modify them to take into account the task-specific attributes and the experience reported in the cited works.
The fourth chapter evaluates the previously described alternatives and the solution itself. The measures used for the evaluation are presented in detail. The most successful methods are evaluated under several different parameter settings. In this chapter I also summarize the lessons learned and the opportunities identified for improving the algorithms.
The aim of this thesis is to present the field of reinforcement learning together with the application of RL methods to specific examples from an industrial environment, pointing out how the similarities among the models can be exploited and how the algorithms scale with task-specific information.