This document serves as a guide for putting the ML model into production. Note that this process can be started whenever a prototype model is ready, even if you intend to make improvements.
The model will be retrained daily and whenever there is a push to the `main` branch of the `dssquad-ml` repository. This automation is handled by GitHub Actions.
🙌 The four basic steps to put the model into production:
1. Convert your Jupyter Notebook into one or more Python scripts for importing data, training, and inserting predictions into the database. To keep it simple, this can be a single `train.py` file.
2. Create a `requirements.txt` file containing the libraries and their versions needed to run the model.
3. Create a GitHub Actions configuration file at `.github/workflows/<filename>.yml` that instructs GitHub to run the training/prediction script whenever there is a push to the `main` branch.
4. Merge these assets into the `main` branch with a pull request.
Important: The model and supporting assets should first be merged into the `develop` branch (via pull request) for testing and feedback. Once approved, submit a pull request to merge the `develop` branch into the `main` branch.
For an example of what the production branch should look like, visit https://github.com/Data-Science-Squad/dssquad-ml/tree/dm_example_github_actions
1. Converting the Jupyter Notebook into Python script(s)
Once you have a working model, convert your Jupyter Notebook into one or more Python scripts for collecting data, training, inference, logging metrics, and inserting predictions into the database. A simple solution is to create a single Python script called `train.py` that performs all of these steps. This file should contain only the essentials for running your model.
```python
# train.py pseudo-code
import <libraries>

read_data()
train()
predict()
log_metrics()
insert_predictions_into_db()
```
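The pseudo-code above can be fleshed out into a runnable skeleton. This is a minimal sketch under stated assumptions, not the team's actual script: to stay dependency-free it uses a hypothetical toy dataset in `read_data()`, a hand-computed least-squares line in place of a scikit-learn model, a printed mean absolute error in place of a real metrics logger, and an in-memory SQLite database (Python's built-in `sqlite3`) in place of the production MySQL database. The table name `predictions` is also hypothetical.

```python
# train.py: minimal sketch of the five steps (assumptions noted in each function).
import sqlite3


def read_data():
    """Return (features, targets). Hypothetical toy data; replace with the real import."""
    xs = [1.0, 2.0, 3.0, 4.0, 5.0]
    ys = [2.1, 3.9, 6.2, 7.8, 10.1]
    return xs, ys


def train(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares (stand-in for scikit-learn)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model, xs):
    """Apply the fitted line to each feature value."""
    slope, intercept = model
    return [slope * x + intercept for x in xs]


def log_metrics(ys, preds):
    """Compute and print mean absolute error (stand-in for a real metrics logger)."""
    mae = sum(abs(y - p) for y, p in zip(ys, preds)) / len(ys)
    print(f"MAE: {mae:.3f}")
    return mae


def insert_predictions_into_db(conn, xs, preds):
    """Write predictions to a hypothetical 'predictions' table."""
    conn.execute("CREATE TABLE IF NOT EXISTS predictions (x REAL, prediction REAL)")
    conn.executemany(
        "INSERT INTO predictions (x, prediction) VALUES (?, ?)", zip(xs, preds)
    )
    conn.commit()


def main(conn):
    """Run every step in order and return the predictions."""
    xs, ys = read_data()
    model = train(xs, ys)
    preds = predict(model, xs)
    log_metrics(ys, preds)
    insert_predictions_into_db(conn, xs, preds)
    return preds


if __name__ == "__main__":
    main(sqlite3.connect(":memory:"))
```

Keeping each step in its own function means the automated run only needs to call `main()`, while the individual steps can still be imported and tested from a notebook.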
2. Create a `requirements.txt` file with your libraries
This file lists all the libraries, with their versions, needed to run the model.
```text
# requirements.txt example
neptune-client==0.4.130
neptune-contrib==0.27.0
numpy==1.19.0
scikit-learn==0.23.1
mysql-connector-python==8.0.23
mysqlclient==2.0.3
```
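Since the file pins exact versions, it can help to check that the environment you trained in actually matches those pins before opening a pull request. The following is a minimal sketch using the standard library's `importlib.metadata` (Python 3.8+). `parse_requirements` and `check_pins` are hypothetical helper names, and the parser only handles exact `name==version` pins like those in the example above.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_requirements(text):
    """Parse exact 'name==version' pins, skipping blank lines and comments."""
    pins = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop inline and full-line comments
        if not line:
            continue
        name, sep, pinned = line.partition("==")
        if not sep:
            raise ValueError(f"not an exact pin: {line!r}")
        pins[name.strip()] = pinned.strip()
    return pins


def check_pins(pins):
    """Return (package, pinned, installed) for every package that does not match.

    'installed' is None when the package is not installed at all.
    """
    mismatches = []
    for name, pinned in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches.append((name, pinned, installed))
    return mismatches
```

For example, `check_pins(parse_requirements(open("requirements.txt").read()))` returns an empty list when every pinned package is installed at exactly the pinned version.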
3. Create a GitHub Actions configuration file
This file automates retraining: it instructs GitHub to run the training/prediction script whenever there is a push to the `main` branch. An example file is located here.
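The training script needs database credentials to insert predictions, and those should not be committed to the repository. GitHub Actions can expose repository secrets to a workflow step as environment variables, so one option is for `train.py` to read its connection settings from the environment. This is a minimal sketch: the variable names `DB_HOST`, `DB_USER`, `DB_PASSWORD` and `DB_NAME` and the helper `db_config_from_env` are hypothetical, and the workflow file would need to map the corresponding secrets onto them.

```python
import os

# Hypothetical mapping of connection keyword -> environment variable name.
# The workflow file would set these variables from repository secrets.
REQUIRED_VARS = {
    "host": "DB_HOST",
    "user": "DB_USER",
    "password": "DB_PASSWORD",
    "database": "DB_NAME",
}


def db_config_from_env(environ=None):
    """Build connection keyword arguments from environment variables.

    Raises RuntimeError listing every missing variable, so a misconfigured
    workflow fails fast with a readable message in the run log.
    """
    environ = os.environ if environ is None else environ
    missing = [var for var in REQUIRED_VARS.values() if not environ.get(var)]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    return {key: environ[var] for key, var in REQUIRED_VARS.items()}
```

The returned keys match the `host`, `user`, `password` and `database` keyword arguments of `mysql.connector.connect` from mysql-connector-python, so a call such as `mysql.connector.connect(**db_config_from_env())` would open the connection.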