This document serves as a guide for putting the ML model into production. Note that this process can be started whenever a prototype model is ready, even if you intend to make improvements.
The model will be retrained daily whenever there is a push to the main branch of the dssquad-ml repository. This automation is made possible through the use of GitHub Actions.
🙌 The four basic steps to put the model into production:
- Convert your Jupyter Notebook into one or more Python scripts for importing data, training, and inserting predictions into the database. To keep it simple, this could be a single train.py file.
- Create a requirements.txt file containing the libraries and their versions needed to run the model.
- Create a GitHub Actions configuration file located in .github/workflows/<filename>.yml that instructs GitHub to run the training/prediction script whenever there is a push to the main branch.
- Merge these assets into the main branch with a Pull Request.
Important: The model and supporting assets should be merged into the develop branch (via pull request) first for testing and receiving feedback. Once approved, submit a pull request to merge the develop branch into the main branch.
For an example of what the production GitHub branch should look like, visit https://github.com/Data-Science-Squad/dssquad-ml/tree/dm_example_github_actions
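Before opening a pull request, it can help to confirm that your branch actually contains the assets from the steps above. The sketch below assumes the single-script layout (train.py and requirements.txt at the repository root, plus at least one workflow file in .github/workflows/). `missing_assets` is a hypothetical helper, not something that exists in the dssquad-ml repository.

```python
from pathlib import Path


def missing_assets(repo_root="."):
    """List the production assets from the checklist that are missing under repo_root.

    Assumes the single-script layout: train.py and requirements.txt at the
    repository root, plus at least one workflow file in .github/workflows/.
    """
    root = Path(repo_root)
    missing = []
    for name in ("train.py", "requirements.txt"):
        if not (root / name).is_file():
            missing.append(name)
    workflows = root / ".github" / "workflows"
    # Accept either YAML extension for the workflow file.
    has_workflow = workflows.is_dir() and (
        any(workflows.glob("*.yml")) or any(workflows.glob("*.yaml"))
    )
    if not has_workflow:
        missing.append(".github/workflows/<filename>.yml")
    return missing
```

Run `print(missing_assets())` from the repository root; an empty list means all three assets are in place. If your scripts live in a subdirectory, adjust the paths accordingly.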
1. Converting the Jupyter Notebook into Python script(s)
Once you have a working model, convert your Jupyter Notebook into one or more Python scripts for collecting data, training, inference, logging metrics, and inserting predictions into the database. A simple solution is to create a single Python script called train.py that performs all of these steps. This file should contain only the essentials for running your model.
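A rough first pass at the conversion is to pull the code cells out of the notebook. The `jupyter nbconvert --to script` command does this more completely; the sketch below shows the same idea with only the standard library, assuming the notebook is in the nbformat 4 JSON layout (a top-level `cells` list whose entries have `cell_type` and `source`). `notebook_to_script` is a hypothetical helper name.

```python
import json


def notebook_to_script(ipynb_text):
    """Concatenate the code cells of a Jupyter notebook (nbformat 4 JSON) into one script."""
    nb = json.loads(ipynb_text)
    chunks = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        if isinstance(source, list):  # nbformat allows a list of lines or a single string
            source = "".join(source)
        # Drop IPython magics (%...) and shell escapes (!...), which are not valid Python.
        lines = [ln for ln in source.splitlines() if not ln.lstrip().startswith(("%", "!"))]
        chunks.append("\n".join(lines))
    return "\n\n".join(chunks) + "\n"
```

For example, `open("train.py", "w").write(notebook_to_script(open("model.ipynb").read()))` produces a starting point (the notebook filename here is hypothetical). The output still needs trimming: exploratory cells, plots, and scratch work should be removed so that only the essentials remain.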
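```python
# train.py skeleton: add the imports your steps need and fill in each stub
def read_data(): ...
def train(data): ...
def predict(model, data): ...
def log_metrics(model, data): ...
def insert_predictions_into_db(predictions): ...

if __name__ == "__main__":
    data = read_data()
    model = train(data)
    predictions = predict(model, data)
    log_metrics(model, data)
    insert_predictions_into_db(predictions)
```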
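Filling in the steps of the pseudo-code above, a minimal end-to-end train.py might look like the sketch below. Everything in it is a hypothetical stand-in: the data are hard-coded, the model is a one-feature least-squares line rather than whatever your notebook trains, metrics are printed rather than sent to an experiment tracker, and an in-memory SQLite database (Python's built-in `sqlite3`) replaces the MySQL database suggested by the example requirements.txt.

```python
"""Minimal end-to-end sketch of train.py (hypothetical data, model, and database)."""
import sqlite3
from statistics import mean


def read_data():
    # Hypothetical training data; in production this would query the source tables.
    xs = [1.0, 2.0, 3.0, 4.0, 5.0]
    ys = [2.1, 3.9, 6.2, 8.1, 9.8]
    return xs, ys


def train(xs, ys):
    # Closed-form ordinary least squares for y = slope * x + intercept.
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(model, xs):
    slope, intercept = model
    return [slope * x + intercept for x in xs]


def log_metrics(ys, preds):
    # Mean absolute error; a real pipeline might log this to an experiment tracker instead.
    mae = mean(abs(y - p) for y, p in zip(ys, preds))
    print(f"MAE: {mae:.3f}")
    return mae


def insert_predictions_into_db(conn, xs, preds):
    # SQLite stands in for the production database; the table name is hypothetical.
    conn.execute("CREATE TABLE IF NOT EXISTS predictions (x REAL, y_pred REAL)")
    conn.executemany("INSERT INTO predictions VALUES (?, ?)", list(zip(xs, preds)))
    conn.commit()


def main(conn):
    xs, ys = read_data()
    model = train(xs, ys)
    preds = predict(model, xs)
    mae = log_metrics(ys, preds)
    insert_predictions_into_db(conn, xs, preds)
    return model, mae


if __name__ == "__main__":
    main(sqlite3.connect(":memory:"))
```

Passing the database connection into `main` keeps the script testable: a local run or test can use SQLite, while the scheduled job can pass a real connection (for example, one opened with mysql-connector-python using credentials supplied by the environment).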
2. Create a requirements.txt file with your libraries
This file simply contains all necessary libraries and their versions needed to run train.py.
```
# requirements.txt example
neptune-client==0.4.130
neptune-contrib==0.27.0
numpy==1.19.0
scikit-learn==0.23.1
mysql-connector-python==8.0.23
mysqlclient==2.0.3
```
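Pinning versions only helps if the environment you trained in actually matches the pins. A quick local check is sketched below using the standard library's `importlib.metadata` (Python 3.8+). `parse_pins` and `find_mismatches` are hypothetical helpers, and the parser deliberately understands only simple `name==version` lines like the example above, not the full pip requirements syntax.

```python
from importlib import metadata


def parse_pins(text):
    """Return {package: version} for requirement lines of the form 'name==version'."""
    pins = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        name, sep, version = line.partition("==")
        if not sep:
            raise ValueError(f"unpinned requirement: {line!r}")
        pins[name.strip()] = version.strip()
    return pins


def find_mismatches(pins, lookup=metadata.version):
    """Map each package whose installed version differs from its pin to (pinned, installed)."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches
```

For example, `find_mismatches(parse_pins(open("requirements.txt").read()))` returns an empty dict when every pinned package is installed at exactly the pinned version.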
3. Create a GitHub Actions configuration file
This file will enable automation of the model. An example file is located here.