Toward the end of the 20th century, the growth in available computational power made it possible to put earlier theoretical foundations to the test in practice, and significant gains were made by exploiting the flexibility of deep neural networks in reinforcement learning.
One of the most important aspects of reinforcement learning is effective exploration of the environment.
The most common way to encourage exploration is to perturb the action space with noise, which is a relatively simple and intuitive solution (though not always the best one to choose!).
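As a minimal sketch of action space noise (not code from the thesis; the function name, noise scale, and action bounds are illustrative assumptions), Gaussian noise is simply added to the action the policy proposes, and the result is clipped back into the valid action range:

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_action(policy_action, sigma=0.1, low=-1.0, high=1.0):
    """Action space noise: perturb the chosen continuous action
    with zero-mean Gaussian noise, then clip to the valid range."""
    noise = rng.normal(0.0, sigma, size=np.shape(policy_action))
    return np.clip(policy_action + noise, low, high)

# the policy proposed [0.5, -0.2]; the executed action is a noisy version of it
a = noisy_action(np.array([0.5, -0.2]))
```

The perturbation here is applied after the network has already produced its output, which is precisely the property that parameter space noise changes.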
A more sophisticated practical solution has also emerged for this problem: Plappert et al. showed that in some cases Gaussian noise injected into the parameter space can outperform existing methods in the speed and quality of exploration. In a deep reinforcement learning system, the parameters are the weights and other learned settings of the neural network, so the noise enters the decision process before the final output is produced.
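The idea can be sketched with a toy linear policy (again an illustrative assumption, not the thesis implementation): instead of perturbing the action, a noisy copy of the weights is drawn, and that copy then acts deterministically, so the same observation yields a consistently different behavior:

```python
import numpy as np

rng = np.random.default_rng(0)

class LinearPolicy:
    """Toy deterministic policy: action = W @ observation."""
    def __init__(self, obs_dim, act_dim):
        self.W = rng.normal(0.0, 0.1, size=(act_dim, obs_dim))

    def act(self, obs):
        return self.W @ obs

def perturbed_copy(policy, sigma=0.05):
    """Parameter space noise: add Gaussian noise to the weights once
    (e.g. at the start of an episode); the copy then acts without
    any further per-step randomness."""
    noisy = LinearPolicy.__new__(LinearPolicy)
    noisy.W = policy.W + rng.normal(0.0, sigma, size=policy.W.shape)
    return noisy

base = LinearPolicy(obs_dim=4, act_dim=2)
explorer = perturbed_copy(base)
obs = np.ones(4)
# same observation, different actions, because the weights themselves differ
delta = explorer.act(obs) - base.act(obs)
```

Because the noise is fixed for a whole episode, the perturbed policy explores in a temporally consistent way rather than jittering independently at every step.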
In my thesis, I apply these innovations to Deep Q-Network (DQN) and Deep Deterministic Policy Gradient (DDPG), and examine the effect of an algorithmic improvement known as the dueling architecture, as well as, in the latter case, the results of combining parameter space noise with traditional action space noise.
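The dueling improvement mentioned above decomposes the Q-value into a state value and action advantages. A minimal sketch of the standard combining step (the mean-subtraction form; the concrete values are illustrative, not results from the thesis):

```python
import numpy as np

def dueling_q(value, advantages):
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a).
    Subtracting the mean advantage makes the V/A decomposition identifiable."""
    return value + advantages - advantages.mean()

# V(s) = 1.0 and advantages that already have zero mean, so Q = V + A
q = dueling_q(1.0, np.array([0.5, -0.5, 0.0]))
# q is [1.5, 0.5, 1.0]
```

In a real dueling DQN, `value` and `advantages` come from two separate heads of the same network; the aggregation formula itself is what the sketch shows.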
The experiments indicate that while the perturbation helps in some cases, it is not universally applicable: it does not always, or at least not in every configuration, support the desired exploration behavior.