This article continues our look at reinforcement learning by considering another algorithm, namely Monte Carlo. This algorithm is very similar to, and arguably encompasses, both Q-Learning and SARSA, in that it can be either on-policy or off-policy. What sets it apart, though, is its emphasis on episodes. Episodes are simply a way of batching the updates of the reinforcement learning cycle, which we introduced in an earlier article, so that the Q-Values of the Q-Map are updated less frequently: once at the end of each episode rather than after every step. ...
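To make the idea of episode-batched updates concrete, here is a minimal Python sketch. It assumes an every-visit Monte Carlo variant that averages discounted returns, which is one common formulation and not necessarily the one used in this article. The names `monte_carlo_update`, `q_map`, `visit_counts`, the `gamma` discount factor, and the state and action labels are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def monte_carlo_update(q_map, visit_counts, episode, gamma=0.9):
    """Apply one batched Q-Map update at the end of an episode.

    episode is a list of (state, action, reward) tuples in the order
    they occurred. Illustrative sketch only, not the article's code.
    """
    g = 0.0  # discounted return, accumulated backwards from the last step
    for state, action, reward in reversed(episode):
        g = reward + gamma * g
        key = (state, action)
        visit_counts[key] += 1
        # Incremental mean of all returns seen so far for this pair
        q_map[key] += (g - q_map[key]) / visit_counts[key]
    return q_map


# Hypothetical three-step episode; only the final step pays a reward.
q_map = defaultdict(float)
visit_counts = defaultdict(int)
episode = [("s0", "buy", 0.0), ("s1", "hold", 0.0), ("s2", "sell", 1.0)]
monte_carlo_update(q_map, visit_counts, episode, gamma=0.5)
```

Unlike Q-Learning or SARSA, which adjust a Q-Value after every step, nothing in the Q-Map changes here until the whole episode has been collected. The final reward is then propagated back to the earlier state-action pairs in a single pass.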