Requirements to productionize the ML model

Danny Morris 2021-03-04 2 min read

This document serves as a guide for putting the ML model into production. Note that this process can be started whenever a prototype model is ready, even if you intend to make improvements.

The model will be retrained daily whenever there is a push to the main branch of the dssquad-ml repository. This automation is made possible through the use of GitHub Actions.

🙌 The four basic steps to put the model into production:

Convert your Jupyter Notebook into one or more Python scripts for importing data, training, and inserting predictions into the database. To keep it simple, this could be a single train.py file.
Create a requirements.txt file containing the libraries and their versions needed to run the model.
Create a GitHub Actions configuration file located in .github/workflows/<filename>.yml that instructs GitHub to run the training/prediction script whenever there is a push to the main branch.
Merge these assets into the main branch with a Pull Request.

Important: The model and supporting assets should be merged into the develop branch (via pull request) first for testing and receiving feedback. Once approved, submit a pull request to merge the develop branch into the main branch.

For an example of what the production GitHub branch should look like, visit https://github.com/Data-Science-Squad/dssquad-ml/tree/dm_example_github_actions

1. Convert the Jupyter Notebook into Python script(s)

Once you have a working model, convert your Jupyter Notebook into one or more Python scripts for collecting data, training, inference, logging metrics, and inserting predictions into the database. The simplest approach is a single Python script called train.py that performs all of these steps. Keep only the code needed to run the model; leave exploratory cells and plots in the notebook.

# train.py pseudo-code
import <libraries>
read_data()
train()
predict()
log_metrics()
insert_predictions_into_db()
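
The pseudo-code above could be fleshed out along the following lines. This is only a sketch: every function body is a placeholder (toy data, a mean-value "model", a logged message instead of a real database insert), and the `main()` wrapper is an assumption rather than a project requirement. Replace each body with your own data access, model, and metric logging.

```python
"""train.py: minimal skeleton; every function body is a placeholder."""
import logging
from statistics import mean

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("train")


def read_data():
    # Placeholder: replace with a query against the project database.
    features = [[1.0], [2.0], [3.0], [4.0]]
    targets = [2.0, 4.0, 6.0, 8.0]
    return features, targets


def train(features, targets):
    # Placeholder model: always predicts the mean of the training targets.
    baseline = mean(targets)
    return lambda rows: [baseline for _ in rows]


def predict(model, features):
    return model(features)


def log_metrics(targets, predictions):
    # Mean absolute error; swap in the metrics and tracker you actually use.
    mae = mean(abs(t - p) for t, p in zip(targets, predictions))
    log.info("MAE: %.3f", mae)
    return mae


def insert_predictions_into_db(predictions):
    # Placeholder: replace with an INSERT against the predictions table.
    log.info("Would insert %d predictions", len(predictions))
    return len(predictions)


def main():
    features, targets = read_data()
    model = train(features, targets)
    predictions = predict(model, features)
    mae = log_metrics(targets, predictions)
    inserted = insert_predictions_into_db(predictions)
    return mae, inserted


if __name__ == "__main__":
    main()
```

Keeping the steps in a `main()` function behind the `__name__` guard means the script runs end to end with `python train.py`, while the individual functions can still be imported and tested on their own.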

2. Create a `requirements.txt` file with your libraries

This file lists every library needed to run train.py, each pinned to an exact version so that the automated run installs the same versions you developed against.

# requirements.txt example

neptune-client==0.4.130
neptune-contrib==0.27.0
numpy==1.19.0
scikit-learn==0.23.1
mysql-connector-python==8.0.23   
mysqlclient==2.0.3
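
One quick way to confirm that every entry carries an exact version is a small check like the one below. It is a hypothetical helper, not part of the repository: the function name `unpinned` and the sample text are assumptions made for illustration, and it only looks for the `==` operator rather than parsing the full requirements syntax.

```python
def unpinned(requirements_text):
    """Return requirement lines that lack an exact '==' version pin."""
    missing = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if line and "==" not in line:
            missing.append(line)
    return missing


# Illustrative input: the last entry is deliberately left unpinned.
example = """
# requirements.txt example
numpy==1.19.0
scikit-learn==0.23.1
mysqlclient
"""

print(unpinned(example))
```

To check the real file, pass in its contents, for example `unpinned(open("requirements.txt").read())`.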

3. Create a GitHub Actions configuration file

This file will enable automation of the model. An example file is located here.
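
If the linked example is not at hand, a minimal workflow might look like the sketch below. Several details are assumptions and not taken from the repository: the file name `train.yml`, the action versions, the Python version (chosen to suit the pinned numpy and scikit-learn releases), the optional daily schedule, and the secret name `DB_PASSWORD`.

```yaml
# .github/workflows/train.yml (hypothetical file name)
name: train-model

on:
  push:
    branches: [main]        # run on every push to the main branch
  schedule:
    - cron: "0 6 * * *"     # assumption: optional daily run at 06:00 UTC

jobs:
  train:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.8"   # assumption: match your development version
      - name: Install dependencies
        run: pip install -r requirements.txt
      - name: Train model and insert predictions
        run: python train.py
        env:
          # Hypothetical secret name; define it in the repository settings.
          DB_PASSWORD: ${{ secrets.DB_PASSWORD }}
```

Passing database credentials through repository secrets keeps them out of the code; train.py can then read them from the environment.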
