A Markov Decision Process with a large number of states, and its solvers (value iteration, policy iteration, and Q-learning)
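
As a rough illustration of the first solver named above, the following is a minimal sketch of value iteration on a toy problem. The chain MDP, the `step` function, and the parameters `n_states`, `gamma`, and `tol` are all assumptions made for this example; they are not taken from the project itself, whose state space and dynamics may differ.

```python
# Hypothetical toy MDP (an assumption for illustration): a chain of states
# where action 0 moves left and action 1 moves right. Entering the last
# state gives reward 1; the last state is absorbing with no further reward.

def step(state, action, n_states):
    """Deterministic transition: return (next_state, reward)."""
    if state == n_states - 1:
        return state, 0.0
    next_state = max(0, state - 1) if action == 0 else state + 1
    reward = 1.0 if next_state == n_states - 1 else 0.0
    return next_state, reward


def value_iteration(n_states, gamma=0.9, tol=1e-8):
    """Apply the Bellman optimality backup until the largest change is below tol."""
    values = [0.0] * n_states
    while True:
        new_values = []
        for s in range(n_states):
            backups = []
            for a in (0, 1):
                s2, r = step(s, a, n_states)
                backups.append(r + gamma * values[s2])
            new_values.append(max(backups))
        delta = max(abs(n - o) for n, o in zip(new_values, values))
        values = new_values
        if delta < tol:
            return values


def greedy_policy(values, n_states, gamma=0.9):
    """Pick, in each state, the action with the highest one-step lookahead value."""
    policy = []
    for s in range(n_states):
        returns = []
        for a in (0, 1):
            s2, r = step(s, a, n_states)
            returns.append(r + gamma * values[s2])
        policy.append(returns.index(max(returns)))
    return policy


values = value_iteration(5)
policy = greedy_policy(values, 5)
```

Each sweep here visits every state, so the cost of one sweep grows linearly with the number of states (times the number of actions). That scaling is one reason sample-based methods such as Q-learning are often considered when the state space is large.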