Project author: vincentberaud

Project description:
Deep Recurrent Q-Learning vs Deep Q-Learning on a simple Partially Observable Markov Decision Process with Minecraft
Language: Jupyter Notebook
Project address: git://github.com/vincentberaud/Minecraft-Reinforcement-Learning.git


Minecraft-Reinforcement-Learning

Here we compare Deep Recurrent Q-Learning (DRQN) and Deep Q-Learning (DQN) on two simple missions in a Partially Observable Markov Decision Process (POMDP) based on the Minecraft environment.
We use gym-minecraft, which exposes Project Malmö through an OpenAI Gym-like API.
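
gym-minecraft environments follow the classic OpenAI Gym interaction loop. Below is a minimal sketch; the mission id "MinecraftBasic-v0" and the env.init() call reflect gym-minecraft's usual usage, but treat both as assumptions if your version differs:

    # Minimal gym-minecraft loop (illustrative; mission id is an assumption)
    import gym
    import gym_minecraft  # importing registers the Minecraft envs with gym

    env = gym.make('MinecraftBasic-v0')
    env.init(start_minecraft=True)  # gym-minecraft extension: launches Malmö

    obs = env.reset()
    done = False
    while not done:
        action = env.action_space.sample()          # random policy, API demo only
        obs, reward, done, info = env.step(action)
    env.close()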

Our work is in the notebook DRQN_vs_DQN_minecraft.ipynb.

Our paper can be found here.

Work realised in collaboration with:

Prerequisites

  • Python 3.6
  • Jupyter
  • TensorFlow

Installation

  • You need to install Malmö
  • You can then install gym-minecraft
  • You can find in the folder “envs”:
    • The slightly modified version of the gym-minecraft main code we used, named minecraft.py. Put it in
      your_pip_folder/site-packages/gym_minecraft-0.0.2-py3.6.egg/gym-minecraft/envs/
    • The missions we used. Put them in
      your_pip_folder/site-packages/gym_minecraft-0.0.2-py3.6.egg/gym-minecraft/assets/

Models

You can choose between 3 models:

  • Simple DQN: Convolutional Neural Network fed with the current frame only
    CNN architecture
  • DQN: Convolutional Neural Network fed with the last 4 stacked frames
    StackedCNN architecture
  • DRQN: Convolutional Neural Network + LSTM layer
    DRQN architecture
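
For illustration, here is a minimal tf.keras sketch of the DRQN variant (CNN + LSTM). The layer sizes, input resolution and trace length are placeholders, not the values used in the notebook:

    # DRQN sketch: a CNN applied per frame, followed by an LSTM over the trace
    import tensorflow as tf
    from tensorflow.keras import layers, models

    num_actions = 5    # hypothetical size of the action set
    trace_length = 8   # number of consecutive frames fed to the LSTM

    drqn = models.Sequential([
        layers.Input(shape=(trace_length, 84, 84, 3)),
        # The same convolutional stack processes every frame of the trace
        layers.TimeDistributed(layers.Conv2D(32, 8, strides=4, activation='relu')),
        layers.TimeDistributed(layers.Conv2D(64, 4, strides=2, activation='relu')),
        layers.TimeDistributed(layers.Flatten()),
        # The recurrent layer integrates information across frames, which is
        # what lets the agent act under partial observability
        layers.LSTM(256),
        layers.Dense(num_actions),  # one Q-value per action
    ])
    drqn.summary()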

DQN settings

  • Implementation of Double Q-Learning
  • ε-greedy exploration
  • Experience replay implementation
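
As a sketch of how these pieces fit together (the function and variable names below are ours, not the notebook's):

    # Double Q-Learning target, epsilon-greedy action choice, replay buffer
    import random
    from collections import deque
    import numpy as np

    rng = np.random.default_rng()
    replay = deque(maxlen=50_000)   # experience replay buffer

    def epsilon_greedy(q_values, epsilon):
        """With probability epsilon explore randomly, otherwise act greedily."""
        if rng.random() < epsilon:
            return int(rng.integers(len(q_values)))
        return int(np.argmax(q_values))

    def double_q_target(reward, done, online_q_next, target_q_next, gamma=0.99):
        """Double DQN: the online network selects the next action and the
        target network evaluates it, which reduces Q-value overestimation."""
        best_action = int(np.argmax(online_q_next))
        return reward + (1.0 - float(done)) * gamma * target_q_next[best_action]

    # Transitions are stored as (state, action, reward, next_state, done)
    # tuples, and training samples uncorrelated minibatches:
    # batch = random.sample(replay, 32)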

Note

Unlike DeepMind’s DQN implementations for Atari games, Minecraft has the constraint that the game is not paused between two actions ordered by the agent. The agent and the network therefore have to be fast enough to act within the time window fixed by the environment.

Credits

We would like to thank Arthur Juliani for all his work and Medium articles, and Tambet Matiisen for his nice implementation of gym-minecraft.
