Project author: zmaqutu

Project description:
This is an implementation of two fundamental reinforcement learning algorithms, Value Iteration and Q-Learning, in Python.
Language: Python
Project URL: git://github.com/zmaqutu/machine-learning-reinforcement-learning.git


Machine Learning Reinforcement Learning






Table of Contents

  • Project Setup
  • Libraries Used
  • Future Scope

Description

This is an implementation of two fundamental reinforcement learning algorithms: Value Iteration and Q-Learning. The NumPy library is used to
generate random choices with given probabilities, simulating explorative and exploitative behaviour.
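As an illustration of the random choices described above, here is a minimal sketch of an epsilon-greedy action picker built on NumPy. The function name `choose_action`, the `epsilon` parameter, and the sample values are assumptions made for this example and are not taken from the repository.

```python
import numpy as np

def choose_action(q_values, epsilon, rng):
    """Pick an action index: explore with probability epsilon, otherwise exploit."""
    # Weighted coin flip: True (explore) with probability epsilon.
    explore = rng.choice([True, False], p=[epsilon, 1.0 - epsilon])
    if explore:
        return int(rng.integers(len(q_values)))  # uniform random action
    return int(np.argmax(q_values))              # greedy (highest-valued) action

rng = np.random.default_rng(seed=0)
q_values = np.array([0.1, 0.5, 0.2, 0.0])  # hypothetical action values
actions = [choose_action(q_values, epsilon=0.2, rng=rng) for _ in range(1000)]
```

With `epsilon=0.2` most of the sampled actions are the greedy one (index 1 here), and the remainder are spread across all four actions.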

Value Iteration Demo

Here our agent is fully aware of its surroundings, acting in a fully observable Markov Decision Process (MDP).
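To make the idea concrete, the following is a minimal sketch of value iteration on a small grid world. It is not the code in ValueIteration.py: the deterministic moves, the reward of 1 for stepping onto the goal, the absence of mines, and the names `value_iteration`, `goal`, and `tol` are all simplifying assumptions for this example.

```python
import numpy as np

def value_iteration(width, height, goal, gamma=0.9, tol=1e-6):
    """Return converged state values for a deterministic grid world."""
    moves = [(0, -1), (0, 1), (-1, 0), (1, 0)]  # up, down, left, right
    values = np.zeros((height, width))
    while True:
        new_values = np.zeros_like(values)
        for y in range(height):
            for x in range(width):
                if (x, y) == goal:
                    continue  # terminal state keeps value 0
                best = -np.inf
                for dx, dy in moves:
                    nx, ny = x + dx, y + dy
                    if not (0 <= nx < width and 0 <= ny < height):
                        nx, ny = x, y  # moving into a wall leaves the agent in place
                    reward = 1.0 if (nx, ny) == goal else 0.0
                    # Bellman backup: immediate reward plus discounted next-state value.
                    best = max(best, reward + gamma * values[ny, nx])
                new_values[y, x] = best
        # Stop once no state value changes by more than tol.
        if np.max(np.abs(new_values - values)) < tol:
            return new_values
        values = new_values

values = value_iteration(width=3, height=3, goal=(2, 2))
```

Because the agent knows the rewards and transitions in advance, it can compute these values without ever moving; the best action in each cell is then the move toward the neighbour with the highest value.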

Q-Learning Demo

Here our agent knows very little about its surroundings. In this algorithm the agent explores the grid world through a mix of random actions (explorative behaviour) and greedy actions (exploitative behaviour) until a path is found.
The agent does not know the environment's rewards or transitions in advance and has to learn them by acting in it.
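A minimal sketch of tabular Q-learning on a small grid world follows. It is not the code in QLearning.py: the reward of 1 for reaching the goal, the absence of mines, the fixed `epsilon`, the `max_steps` cap, and the function name `q_learning` are assumptions made for this example. The parameters `gamma`, `epochs`, and `learning_rate` mirror the command-line options listed under Project setup.

```python
import numpy as np

def q_learning(width, height, start, goal, epochs=500, gamma=0.9,
               learning_rate=0.5, epsilon=0.2, max_steps=200, seed=0):
    """Learn a table of action values by trial and error in a grid world."""
    moves = [(0, -1), (0, 1), (-1, 0), (1, 0)]  # up, down, left, right
    rng = np.random.default_rng(seed)
    q = np.zeros((height, width, len(moves)))
    for _ in range(epochs):
        x, y = start
        for _ in range(max_steps):
            if (x, y) == goal:
                break  # episode ends at the goal
            if rng.random() < epsilon:
                a = int(rng.integers(len(moves)))  # explore: random action
            else:
                # Exploit: greedy action, breaking ties at random.
                best = np.flatnonzero(q[y, x] == q[y, x].max())
                a = int(rng.choice(best))
            dx, dy = moves[a]
            nx, ny = x + dx, y + dy
            if not (0 <= nx < width and 0 <= ny < height):
                nx, ny = x, y  # moving into a wall leaves the agent in place
            reward = 1.0 if (nx, ny) == goal else 0.0
            # Q-learning update: move the estimate toward reward + discounted best next value.
            target = reward + gamma * np.max(q[ny, nx])
            q[y, x, a] += learning_rate * (target - q[y, x, a])
            x, y = nx, ny
    return q

q = q_learning(width=3, height=3, start=(0, 0), goal=(2, 2))
```

After training, `np.argmax(q[y, x])` gives the learned greedy move for the cell at column `x`, row `y`.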

Project setup

To run this project, clone this repository into a folder on your local machine.
We first need to build our virtual environment and install the
libraries our program needs to run. To do this, open a terminal in the root directory and run the following command:

  1. make install // installs program dependencies

Next we need to activate our virtual environment. To do this, run the following command:

  1. source venv/bin/activate // Activates our virtual environment

Now we can run either of our two algorithms. The following commands run each program with sample arguments:

  1. make runSampleValueIteration // runs ValueIteration.py with sample arguments

or

  1. make runSampleQLearning // runs QLearning.py with sample arguments

Alternatively, you can run each program with your own arguments, following this pattern:

  1. python3 QLearning.py [-start startx starty] [-end endx endy] [-k numberofMines] [-gamma gamma]
     [-epochs epochs] [-learningRate learningRate]
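For readers curious how an argument pattern like this can be handled, here is a hypothetical sketch using Python's standard `argparse` module. The option names come from the usage pattern above; the types, the `build_parser` helper, and the choice of `argparse` itself are assumptions, and the repository may parse its arguments differently.

```python
import argparse

def build_parser():
    """Build a parser that accepts the options from the usage pattern."""
    parser = argparse.ArgumentParser(prog="QLearning.py")
    # Two integers each for the start and end coordinates.
    parser.add_argument("-start", nargs=2, type=int, metavar=("startx", "starty"))
    parser.add_argument("-end", nargs=2, type=int, metavar=("endx", "endy"))
    parser.add_argument("-k", type=int, metavar="numberofMines")
    parser.add_argument("-gamma", type=float)
    parser.add_argument("-epochs", type=int)
    parser.add_argument("-learningRate", type=float, dest="learning_rate")
    return parser

# Parse an example command line; options left out default to None.
args = build_parser().parse_args(
    ["-start", "0", "0", "-end", "3", "3", "-k", "2", "-gamma", "0.9"]
)
```

Every option is optional in this sketch, matching the square brackets in the pattern.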

To exit the virtual environment run:

  1. deactivate // exits the virtual environment

Libraries Used

  • NumPy

Future Scope

Made with ❤️ using PyCharm and Vim