Python package tempelate scaffold

Steps of usage

This folder is a template scaffold for those who are not familiar with software engineering. This structure is supposed to be used to publish packages in the context for either of Data Science or Machine learning to automate model deployment.

Clone the repo or fork it to your account, then navigate to the project folder cd ML_app_scaffold
Activate the virtualenv dev by running this command in the terminal source dev/bin/activate
Change sub_pckage_1 to your package name
install all dependencies pip install -rrequirements.txt
Add your information to the setup.py file
Add train and test data to the datasets folder inside the package folder
Update the MANIFEET.in file to your package name
Navigate to the sub_package_1 folder then run tox -r to build the package

You should see:
commands succeeded congratulations :)

You can add tests in the test folder in the dedicated package. However, you need to add the dependency in the tox.ini and requirements.txt files. for more details check out tox documentation
In case you want to create a CI/CD pipeline, the demo folder includes a basic configuration file. You need to add enviromental variables to your project configuration at the project setting at circleci.
In case you need to publish your model using Gemfury.io, you can use the script file publish_model.sh and of course add env variables to the project setting at the circleci.

Few things to consider

I structured the scaffold based on the OOP which seperate concerns of code.

processing folder conatains any scripts for data wrangeling, cleaning, or feature engineering.
trained_model folder contains any scripts dedicated to build model, tuning or any related scripts.
pipeline.py file contains all the procedures that should be done using the sklearn.pipeline
predict.py file dedicated for getting out the predictions
train_pipeline.py file dedicated to train the model, starting from downloading the dataset, split, apply pipeline…etc.

You are free to just delete them and put all your scripts in just one file - personal preferences.

Please note that the train_pipeline is commented out until you download the dataset, fill the config file with the correct variables. Otherwise, the tox command would fail because there are no variables to train.

The tests are not set yet on the scaffold - next step, pull requests are very welcome.

Output:

The resulted model is saved as a .pkl file versioned with the same version of the package.

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

License

MIT