Project author: schatty

Project description:
PyTorch implementation of Distributed Distributional Deterministic Policy Gradients (https://arxiv.org/abs/1804.08617)
Language: Python
Repository: git://github.com/schatty/d4pg-pytorch.git
Created: 2019-07-09T06:14:42Z
Project community: https://github.com/schatty/d4pg-pytorch


OPRL

A modular library for off-policy reinforcement learning with a focus on SafeRL and distributed computing. Benchmarking results are available on the project homepage.

Code style: black

Disclaimer

The project is under active renovation. For the old code, with the D4PG algorithm working with multiprocessing queues and mujoco_py, please refer to the branch d4pg_legacy.

Roadmap 🏗

  • Switching to mujoco 3.1.1
  • Replacing multiprocessing queues with RabbitMQ for distributed RL
  • Baselines with DDPG, TQC for dm_control for 1M steps
  • Tests
  • Support for SafetyGymnasium
  • Style and readability improvements
  • Baselines with Distributed algorithms for dm_control
  • D4PG logic on top of TQC

Installation

  1. pip install -r requirements.txt
  2. cd src && pip install -e .

To work with SafetyGymnasium, install it manually:

  1. git clone https://github.com/PKU-Alignment/safety-gymnasium
  2. cd safety-gymnasium && pip install -e .
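
After the installation steps above, a quick sanity check can confirm that the packages are importable. The following is a minimal sketch, not part of the project: the helper `missing_packages` is hypothetical, and the import names `oprl` (inferred from the `src/oprl` path) and `safety_gymnasium` are assumptions.

```python
import importlib.util


def missing_packages(names):
    """Return the subset of top-level package names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed import names: "oprl" (from src/oprl) and "safety_gymnasium".
    missing = missing_packages(["oprl", "safety_gymnasium"])
    print("Missing packages:", ", ".join(missing) if missing else "none")
```

The check only looks the packages up and does not import them, so it will not catch errors raised at import time (for example, a missing MuJoCo installation).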

Usage

To run DDPG in a single process

  1. python src/oprl/configs/ddpg.py --env walker-walk

To run distributed DDPG, first run RabbitMQ:

  1. docker run -it --rm --name rabbitmq -p 5672:5672 -p 15672:15672 rabbitmq:3.12-management
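
Before starting training, it may help to verify that the broker port is reachable. The sketch below is not part of the project; the helper `port_is_open` is hypothetical, and it assumes RabbitMQ runs on `localhost` with the AMQP port 5672 published as in the docker command above.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False


if __name__ == "__main__":
    # 5672 is the AMQP port published by the docker command above.
    print("RabbitMQ reachable:", port_is_open("localhost", 5672))
```

An open port only shows that something is listening. It does not guarantee that the broker has finished starting up.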

Then run training:

  1. python src/oprl/configs/d3pg.py --env walker-walk

Tests

  1. cd src && pip install -e .
  2. cd .. && pip install -r tests/functional/requirements.txt
  3. python -m pytest tests

Results

Results for single process DDPG and TQC:
[Figure: ddpg_tqc_eval]

Acknowledgements