We continue studying distributed Q-learning algorithms. Earlier we have already considered two algorithms. In the first one [4], our model learned the probabilities of receiving a reward in a given range of values. In the second algorithm [5], we used a different approach to solving the problem. We trained the model to predict the reward level with a given probability. more...