This document serves as a guide for all team members as we start to build out production assets and the production automation pipeline.
High-level overview
The following diagram presents a high-level overview of the production automation lifecycle.
Each day, the datakit codebase will collect new incidents from the source API and update the database with production data assets. Then, the dssquad-ml codebase will retrain the model on fresh data and insert forecasts into the database. Finally, the dssquad-app codebase will build and deploy a new version of the Streamlit app using all up-to-date production data assets.
All of this will happen via automation.
Basic production requirements for each team
Data Engineering - Requirements
Machine Learning - Requirements
Web application - Requirements
Automating workflows with GitHub Actions
GitHub Actions will make it easy to automate and orchestrate the execution of our code to build and deploy our software. For our purposes, we will use GitHub Actions to trigger the execution of production code across multiple repositories sequentially to automate our daily builds.
GitHub Actions runs workflows to execute code. A workflow is user-defined and contained in a .yaml configuration file in the repository's .github/workflows directory. Workflows contain a series of instructions, such as installing Python and third-party libraries, that enable the execution of code in an isolated virtual environment.
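As a rough illustration, a workflow file could look like the sketch below. The file name, Python version, requirements.txt, and main.py entry point are assumptions made for the example, not details of our repositories.

```yaml
# .github/workflows/daily-build.yaml (hypothetical file name)
name: daily-build

on:
  workflow_dispatch:  # allows the workflow to be started manually

jobs:
  build:
    runs-on: ubuntu-latest  # a fresh GitHub-hosted virtual machine for each run
    steps:
      - name: Check out the repository
        uses: actions/checkout@v4

      - name: Install Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"  # assumed version

      - name: Install third-party libraries
        run: pip install -r requirements.txt  # assumes a requirements.txt exists

      - name: Run production code
        run: python main.py  # hypothetical entry point
```

Steps run in order, and by default a failing step stops the job, so later steps only run when earlier ones succeed.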
Events are web-based actions that trigger a workflow to run. Common event types include:
- schedule, i.e. the workflow runs on a schedule defined by a cron expression
- push, i.e. the workflow is triggered whenever a commit is pushed
- workflow_dispatch, i.e. the workflow is triggered manually or through the GitHub API, e.g. the completion of a workflow in one repository triggers the execution of a workflow in another
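These event types are declared under the on key of a workflow file. The fragment below is a minimal sketch; the cron expression and branch name are placeholders, not agreed project settings.

```yaml
on:
  schedule:
    - cron: "0 6 * * *"  # placeholder: every day at 06:00 UTC
  push:
    branches: [main]     # assumed branch name
  workflow_dispatch:     # manual runs and runs requested through the GitHub API
```

Cron schedules in GitHub Actions are evaluated in UTC, and scheduled workflows run against the repository's default branch.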
Using GitHub Actions for our project
For our project, we will use the following workflows and events to automate our daily builds:
- The dssquad-datakit workflow is triggered by a schedule event at [insert time] each day.
- Upon completion, the dssquad-datakit workflow sends a workflow_dispatch event to the dssquad-ml workflow for execution.
- Upon completion, the dssquad-ml workflow sends a workflow_dispatch event to the dssquad-app repository for execution.
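One possible way to wire up the first hand-off is sketched below, using the gh command-line tool that is preinstalled on GitHub-hosted runners. The cron time, the OWNER placeholder, the build.yaml file name, the main.py entry point, and the DISPATCH_TOKEN secret are all hypothetical and would need to be replaced with real values.

```yaml
# Hypothetical workflow file in the dssquad-datakit repository
name: dssquad-datakit

on:
  schedule:
    - cron: "0 6 * * *"  # placeholder; the actual run time is still to be decided

jobs:
  update-data:
    runs-on: ubuntu-latest
    steps:
      - name: Check out the repository
        uses: actions/checkout@v4

      - name: Update production data assets
        run: python main.py  # hypothetical entry point

      # Runs only if the previous steps succeeded
      - name: Trigger the dssquad-ml workflow
        env:
          # Hypothetical secret holding a token that may run workflows in dssquad-ml
          GH_TOKEN: ${{ secrets.DISPATCH_TOKEN }}
        run: gh workflow run build.yaml --repo OWNER/dssquad-ml --ref main
```

Two caveats apply to this sketch. The receiving workflow must list workflow_dispatch under its own on key, and the automatically provided GITHUB_TOKEN is scoped to the repository it runs in, so a separate token stored as a secret is needed to dispatch across repositories.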