Project author: davidmrau

Project description:
PyTorch Re-Implementation of "The Sparsely-Gated Mixture-of-Experts Layer" by Noam Shazeer et al. https://arxiv.org/abs/1701.06538
Primary language: Python
Project URL: git://github.com/davidmrau/mixture-of-experts.git
Created: 2019-07-19T14:21:22Z
Project community: https://github.com/davidmrau/mixture-of-experts

License: GNU General Public License v3.0


The Sparsely Gated Mixture of Experts Layer for PyTorch

source: https://techburst.io/outrageously-large-neural-network-gated-mixture-of-experts-billions-of-parameter-same-d3e901f2fe05

This repository contains a PyTorch re-implementation of the sparsely-gated MoE layer described in the paper Outrageously Large Neural Networks.

    from moe import MoE
    import torch

    # instantiate the MoE layer
    model = MoE(input_size=1000, output_size=20, num_experts=10, hidden_size=66, k=4, noisy_gating=True)
    X = torch.rand(32, 1000)

    # training mode
    model.train()
    # forward pass
    y_hat, aux_loss = model(X)

    # evaluation mode
    model.eval()
    y_hat, aux_loss = model(X)
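
The `k` and `noisy_gating` arguments correspond to the noisy top-k gating described in the paper: Gaussian noise scaled by a softplus term is added to each expert's logit during training, only the `k` largest logits are kept, and a softmax over those gives the gate weights. The following is a minimal sketch of that idea for a single input, written in plain Python so it runs without PyTorch. It is not the repository's code; the function names and the `train` flag are assumptions made for illustration.

```python
import math
import random


def softplus(z):
    # Numerically stable softplus: log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def noisy_top_k_gates(clean_logits, noise_logits, k, train=True, rng=random):
    """Return one gate weight per expert; at most k of them are non-zero."""
    logits = list(clean_logits)
    if train:
        # Add standard-normal noise scaled by softplus(noise_logit) per expert.
        logits = [c + rng.gauss(0.0, 1.0) * softplus(n)
                  for c, n in zip(clean_logits, noise_logits)]
    # Indices of the k largest (possibly noisy) logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the kept logits only; all other experts get weight 0.
    m = max(logits[i] for i in top)
    exps = {i: math.exp(logits[i] - m) for i in top}
    total = sum(exps.values())
    return [exps[i] / total if i in exps else 0.0 for i in range(len(logits))]


# Without noise, only the two experts with the largest logits (0 and 3) are used.
gates = noisy_top_k_gates([2.0, -1.0, 0.5, 3.0], [0.0] * 4, k=2, train=False)
print(gates)
```

Because the unselected experts receive a gate of exactly zero, their outputs never need to be computed, which is where the sparsity saving comes from.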

Requirements

To install the requirements run:

pip install -r requirements.txt

Example

The file example.py contains a minimal working example illustrating how to train and evaluate the MoE layer with dummy inputs and targets. To run the example:

python example.py
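
During training, the auxiliary loss returned alongside the predictions is intended to be added to the task loss. In the paper, one component of it penalizes unbalanced expert usage through the squared coefficient of variation of per-expert importance, where importance is the sum of an expert's gate values over the batch. The sketch below illustrates that term in plain Python. The helper names, the use of the sample variance, and the default `coef` value are assumptions for illustration and are not taken from the repository.

```python
def cv_squared(values, eps=1e-10):
    """Squared coefficient of variation: variance / mean**2."""
    n = len(values)
    if n < 2:
        return 0.0
    mean = sum(values) / n
    # Sample variance (divides by n - 1); an assumption for this sketch.
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return var / (mean ** 2 + eps)


def importance_loss(gates_per_example, coef=1e-2):
    # Importance of an expert = sum of its gate values over the batch.
    importance = [sum(column) for column in zip(*gates_per_example)]
    return coef * cv_squared(importance)


# Each row holds the gate weights of one example over three experts.
balanced = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.5, 0.0, 0.5]]
skewed = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
print(importance_loss(balanced), importance_loss(skewed))
```

The balanced batch yields a loss of zero, while the batch that routes everything to one expert is penalized, so minimizing this term pushes the gate toward spreading load across experts.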

CIFAR 10 example

The file cifar10_example.py contains a minimal working example on the CIFAR 10 dataset. It reaches an accuracy of 39% with arbitrary hyper-parameters and without being trained to full convergence. To run the example:

python cifar10_example.py

Used by

FastMoE: A Fast Mixture-of-Expert Training System. This implementation was used as a reference PyTorch implementation for single-GPU training.

Acknowledgements

The code is based on the TensorFlow implementation that can be found here.

Citing

    @misc{rau2019moe,
        title={Sparsely-gated Mixture-of-Experts PyTorch implementation},
        author={Rau, David},
        journal={https://github.com/davidmrau/mixture-of-experts},
        year={2019}
    }