# Implementation of Mixture of Experts paper
This is a toy implementation of the Mixture of Experts algorithm from the paper. Instead of a single model having to be good at every task, the model consists of several expert networks, each of which specializes in a particular task. A gating network (similar in spirit to attention) then assigns weights to the experts, so that the expert best suited to the task at hand receives the most weight.
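Below is a minimal sketch of this idea in PyTorch. It is illustrative only; the actual architecture in `model.py` (layer sizes, number of experts, backbone) may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixtureOfExperts(nn.Module):
    """Toy mixture of experts: a gating network produces softmax weights
    over several expert MLPs, and the output is the weighted sum of the
    expert outputs. (Illustrative sketch, not the repo's exact model.)"""

    def __init__(self, in_dim, out_dim, num_experts=4, hidden_dim=32):
        super().__init__()
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU(),
                          nn.Linear(hidden_dim, out_dim))
            for _ in range(num_experts)
        ])
        self.gate = nn.Linear(in_dim, num_experts)  # gating network

    def forward(self, x):
        # Per-sample weights over experts, analogous to attention scores.
        weights = F.softmax(self.gate(x), dim=-1)                    # (batch, num_experts)
        outputs = torch.stack([e(x) for e in self.experts], dim=1)   # (batch, num_experts, out_dim)
        return (weights.unsqueeze(-1) * outputs).sum(dim=1)          # (batch, out_dim)

# Quick shape check:
# moe = MixtureOfExperts(in_dim=10, out_dim=2)
# moe(torch.randn(8, 10)).shape  # -> torch.Size([8, 2])
```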
The code has been tested with Python 3.7 and PyTorch v1.3.
For training the model:
```
python main.py
```
For testing an already trained model:
```
python main.py --testing True
```
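How `main.py` dispatches on that flag is not shown in this README; the following is a plausible sketch, assuming an argparse-based CLI and `train()`/`test()` entry points in `train.py`/`test.py` (all names here are assumptions, not confirmed by the repo).

```python
# Hypothetical sketch of the dispatch logic in main.py; the actual
# argument names and entry points in this repo may differ.
import argparse

from train import train  # assumed entry point in train.py
from test import test    # assumed entry point in test.py

def str2bool(v):
    # argparse's type=bool treats any non-empty string (even "False") as True,
    # so a string-to-bool helper is needed for a flag like "--testing True".
    return str(v).lower() in ("true", "1", "yes")

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--testing", type=str2bool, default=False,
                        help="evaluate a trained checkpoint instead of training")
    args = parser.parse_args()
    test() if args.testing else train()
```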
All the hyperparameters are defined in the `main.py` file and can be tweaked accordingly.

- `main.py`: Specification of the various hyperparameters used during training, along with checkpoint locations.
- `train.py`: Script for training (and validating) the model; contains the whole training procedure (a generic sketch of such a loop appears after this list).
- `test.py`: Script for testing an already trained model.
- `model.py`: Contains the model architecture and the backbone used.
- `utils.py`: Contains various helper functions, along with the function for loading the dataset.
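As a reference for what `train.py` covers, here is a generic PyTorch training-and-validation loop with best-checkpoint saving; it is a sketch under assumed names (`model`, the data loaders, the checkpoint path), not the repo's exact code.

```python
# Generic sketch of a training procedure like the one in train.py;
# model, loaders, and hyperparameters here are placeholders.
import torch
import torch.nn as nn

def train(model, train_loader, val_loader, epochs=10, lr=1e-3, ckpt="checkpoint.pt"):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    best_val = float("inf")
    for epoch in range(epochs):
        model.train()
        for x, y in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()
        # Validation pass after each epoch.
        model.eval()
        val_loss = 0.0
        with torch.no_grad():
            for x, y in val_loader:
                val_loss += criterion(model(x), y).item()
        val_loss /= len(val_loader)
        if val_loss < best_val:  # keep only the best checkpoint
            best_val = val_loss
            torch.save(model.state_dict(), ckpt)
```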
To use a different dataset, modify the dataset-loading function in the `utils.py` file.
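For concreteness, a dataset helper of the kind `utils.py` provides might look like the following; the function name `get_dataset` and the use of MNIST are assumptions for illustration only.

```python
# Illustrative sketch of a dataset helper such as the one in utils.py;
# the actual dataset and function name in the repo may differ.
import torch
from torchvision import datasets, transforms

def get_dataset(batch_size=64, data_dir="./data"):
    transform = transforms.Compose([transforms.ToTensor()])
    train_set = datasets.MNIST(data_dir, train=True, download=True, transform=transform)
    test_set = datasets.MNIST(data_dir, train=False, download=True, transform=transform)
    train_loader = torch.utils.data.DataLoader(train_set, batch_size=batch_size, shuffle=True)
    test_loader = torch.utils.data.DataLoader(test_set, batch_size=batch_size, shuffle=False)
    return train_loader, test_loader
```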