Project author: oneTimePad

Project description:
A framework for making RL easy!
Language: Python
Repository: git://github.com/oneTimePad/advantage.git
Created: 2018-06-24T17:19:41Z
Project page: https://github.com/oneTimePad/advantage


advantage



Named after the RL “advantage” function, advantage is a TensorFlow-based Reinforcement Learning framework. It allows various RL algorithms, both discrete-action (e.g. Atari games) and continuous-action (e.g. robotics) models, to be deployed with a small amount of code. advantage is compatible with OpenAI Gym: users can develop simulators with OpenAI Gym and then, using only configuration files, train their models in those simulators. Trained models can then be easily deployed using TensorFlow protobufs. advantage’s goal is to implement the common paradigms of Reinforcement Learning so that code is reused across model implementations; this makes it easy to add new models to the framework.
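As a rough illustration of the deployment step (this is plain TensorFlow 1.x, not advantage’s own API; the file name and the assumption that the export is a frozen GraphDef are mine), a protobuf model could be loaded like this:

```python
import tensorflow as tf

# Hypothetical file name; the actual artifact and its location depend on how
# advantage exports the trained model.
FROZEN_GRAPH = "exported_model.pb"

# Read the serialized GraphDef protobuf from disk.
with tf.gfile.GFile(FROZEN_GRAPH, "rb") as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

# Import it into a fresh graph; the input/output tensor names to feed and
# fetch depend on the exported model.
with tf.Graph().as_default() as graph:
    tf.import_graph_def(graph_def, name="")
```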

advantage will be added to PyPI shortly.

Currently supports:

  • Deep Q-Network (DQN)

In the Works:

  • Moving from protobufs to gin-config
  • PPO, A3C, EPG, N-step Q, Soft-AC, and distributed training

Planned additions:

  • Value-based:
    • C51, Implicit Quantile Agents
  • Multi-agent

Dependencies

These are the tested dependencies; higher versions will probably work as well.

  1. tensorflow==1.10.0
  2. gym
  3. Python 3.5.2

Installing

  1. git clone https://github.com/oneTimePad/advantage.git
  2. export PYTHONPATH=$PYTHONPATH:{path_to_advantage_package}
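If you prefer not to modify PYTHONPATH in the shell, the clone location can instead be appended to sys.path before importing the package (the path below is a placeholder for your own clone):

```python
import sys

# Placeholder: point this at the directory that contains the advantage package.
sys.path.append("/path/to/advantage_parent_dir")

import advantage as adv
```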

Build protobufs

  1. {path_to_advantage}/scripts/build_protos.sh
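Note: the build script presumably invokes the protobuf compiler, so protoc may need to be installed and available on your PATH before running this step.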

Training

```python
import advantage as adv

agent = adv.make("{path_to}/samples/dqn.config")
agent.train()
```

Inference

For inference, the infer context manager opens an inference session:

```python
with agent.infer() as infer:
    env = infer.env
    for _ in infer.run_trajectory(run_through=False):
        env.render()
```

Use .reuse() to open a reusable inference session that is not closed on exit:

```python
infer_session = agent.infer()
with infer_session.reuse() as infer:
    env = infer.env
    for _ in infer.run_trajectory(run_through=False):
        env.render()
```
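As a rough usage sketch (using only the calls shown above), a reusable session lets several rendered rollouts share the same underlying inference session:

```python
infer_session = agent.infer()

# Run a few rendered trajectories against the same reusable session; the
# number of rollouts here is arbitrary.
for _ in range(3):
    with infer_session.reuse() as infer:
        env = infer.env
        for _ in infer.run_trajectory(run_through=False):
            env.render()
```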

Samples

CartPole-v0 notebook

If there are any problems with the learning algorithms, please open an issue.