We continue to explore reinforcement learning methods. In previous articles we discussed two approaches: approximating the reward function with Q-learning, and policy gradient learning. Each method has its own advantages and disadvantages, and it would be great to make the most of both when building and training models. When looking for ways to minimize the shortcomings of the algorithms we use, we often try to build certain conglomerates ...