Having developed a model or set of models, practitioners must migrate them to a live “production” environment where the models can generate ongoing inferences based on new data and trigger any downstream actions or alerts.
For most enterprises, this means wrapping the model within an enterprise application that people use to make decisions. For example, a manufacturing organization may embed AI inferences about top equipment within an AI-based reliability application that provides valuable clues to maintenance crews. Alternatively, the model could be embedded as a microservice within existing applications and business processes, or the algorithm’s outputs could be distributed to existing operational systems (for example, to tune setpoints for controllers).
Deploying a machine learning model to a production environment at scale requires close collaboration and communication between business and technical stakeholders.
Teams should consider the data volumes required, the frequency of inferences needed, the number of end consumers of an application, and the impact on existing operational systems and business processes.
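As a rough illustration of how these factors combine, the sketch below estimates how many service replicas a deployment might need. It uses Little's law (requests in flight equal the arrival rate multiplied by the average latency), which is an assumption brought in here for illustration and not a method prescribed by this text. The function name `required_replicas` and all of the numbers are hypothetical.

```python
import math


def required_replicas(users, requests_per_user_per_min, avg_latency_s, concurrency_per_replica):
    """Estimate the replica count needed to serve a steady request load.

    Hypothetical sizing helper: applies Little's law to get the average
    number of requests in flight, then divides by per-replica concurrency.
    """
    arrival_rate_per_s = users * requests_per_user_per_min / 60.0
    in_flight = arrival_rate_per_s * avg_latency_s
    # Always provision at least one replica, even for negligible load.
    return max(1, math.ceil(in_flight / concurrency_per_replica))


if __name__ == "__main__":
    # Illustrative inputs only: 3,000 users, 2 requests per user per minute,
    # 0.25 s average inference latency, 5 concurrent requests per replica.
    print(required_replicas(3000, 2, 0.25, 5))
```

An estimate like this covers steady-state load only. Peak traffic, batch scoring jobs, and headroom for failures would call for a larger figure.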
The production environment should meet the needs of the business problem at hand. If many end users need to access insights simultaneously, be sure that the web-hosted environment can handle high traffic. If new predictions are needed to make rapid decisions, test inference latency against the service level agreements (SLAs) to be sure that the algorithms execute quickly enough to meet the business requirement.
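A latency check of this kind can be sketched as follows. This is a minimal example under stated assumptions, not a complete load test: `predict` is a hypothetical stand-in for a real inference call, and the 50 ms threshold and 95th-percentile target are illustrative values, not figures from this text.

```python
import statistics
import time


def predict(features):
    # Hypothetical stand-in for a real model's inference call.
    return sum(features) / len(features)


def measure_latencies_ms(fn, payload, n_calls=200):
    """Call fn(payload) n_calls times and return per-call latencies in milliseconds."""
    latencies = []
    for _ in range(n_calls):
        start = time.perf_counter()
        fn(payload)
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def meets_sla(latencies_ms, sla_ms, percentile=95):
    """Return (ok, observed), where observed is the requested percentile latency."""
    # With n=100, statistics.quantiles returns 99 cut points;
    # index (percentile - 1) holds the requested percentile.
    cut_points = statistics.quantiles(latencies_ms, n=100)
    observed = cut_points[percentile - 1]
    return observed <= sla_ms, observed


if __name__ == "__main__":
    latencies = measure_latencies_ms(predict, [0.1] * 50)
    ok, p95 = meets_sla(latencies, sla_ms=50.0)  # 50 ms is an assumed SLA
    print(f"p95 latency: {p95:.3f} ms, within SLA: {ok}")
```

Comparing a high percentile with the SLA, not the mean, reflects the slowest requests that users actually experience. A production test would also exercise the deployed service over the network under concurrent load.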