Play Tic-Tac-Toe against a reinforcement learning agent that has just learned the game through temporal difference (TD) learning.
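
As a rough illustration of what temporal difference learning means here, the sketch below shows a tabular TD(0) value update of the kind often used for Tic-Tac-Toe. It is not taken from this project. The names `values`, `td_update`, and `backup_episode`, the step size of 0.1, the default value of 0.5, and the string encoding of the board are all assumptions made for the example.

```python
from collections import defaultdict

ALPHA = 0.1  # step size (assumed value, not from the project)

# Value table: maps a board state (here a 9-character string, an assumed
# encoding) to the estimated chance of winning from that state.
# States that have never been seen start at 0.5.
values = defaultdict(lambda: 0.5)


def td_update(values, state, next_state, alpha=ALPHA):
    """TD(0) update: move V(state) a fraction alpha toward V(next_state)."""
    values[state] += alpha * (values[next_state] - values[state])
    return values[state]


def backup_episode(values, states, final_value, alpha=ALPHA):
    """Set the terminal state's value to the game outcome, then apply the
    TD(0) update to the earlier states, walking backward through the game."""
    values[states[-1]] = final_value
    for state, next_state in zip(reversed(states[:-1]), reversed(states[1:])):
        td_update(values, state, next_state, alpha)


# One hypothetical game seen from the agent's side, ending in a win (1.0).
# "-" marks an empty square; each string is the board after the agent moved.
episode = ["----X----", "O---X---X", "O-O-X-X-X", "OXO-X-XOX"]
backup_episode(values, episode, final_value=1.0)
```

Repeating this over many self-play games gradually pushes the values of positions that tend to lead to wins toward 1 and those that lead to losses toward 0, so the agent can choose moves by picking the successor state with the highest value.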