项目作者: aniket-agarwal1999

项目描述 :
Implementation of Mixture of Experts paper
高级语言: Python
项目地址: git://github.com/aniket-agarwal1999/Mixture_of_Experts.git
创建时间: 2020-02-28T23:06:38Z
项目社区:https://github.com/aniket-agarwal1999/Mixture_of_Experts

开源协议:

下载


Mixture of Experts

Introduction

This is a basic implementation of the paper and basically is a toy implementation of the Mixture of Experts algorithm.

So the model basically consist of various expert models which specialize at a particular task rather than a single model being good at that task.
And finally weights are assigned to the various experts using a gating network(kind of like attention) where more weight, as a result, is given to the expert good at the particular task in hand.

Running the code

The code has been tested for Python 3.7 and PyTorch v1.3.

For training the model

  • Clone the repository and go to the repo.
    ```
    python main.py —training True ### For training

python main.py —testing True ### For testing
```

  • Apart from this, the various hyperparameter flags can also be seen from the main.py file and can be tweaked accordingly.

Code structure

  • main.py: Specification of various hyperparameters used during training, along with checkpoint location specifications.

  • train.py: Script for training(along with validating) the model and contains the whole training procedure.

  • test.py: Script for testing the already trained model.

  • model.py: Contains the architecture of model and the backbone used.

  • utils.py: Contains the various helper functions along with function for getting dataset.

Further things to be done

  • I am still not able to completely get the EM algorithm specified in the paper for optimizing the weights, the reason for which has also been specified in the utils.py file.