So let's look at machine learning from an operations perspective. This means taking a whole-system view, from defining the problem to scaling the solution. We saw that there are two phases in machine learning: a training phase and an inference phase. We learned that an ML problem can be thought of as being all about data. Then we cautioned you that the machine learning part of an ML problem can be quite small. Finally, we looked at a use case from the perspective of data, software, and machine learning.

Now let's reexamine machine learning like an operations person. From this perspective, you consider the whole system through the lenses of time, resources, and quality. How do you reduce the time between analyzing the problem, creating the models, and deploying the solution, while maintaining the quality of the output? In software engineering, this approach is called DevOps. We can borrow that term in machine learning and call it MLOps. Think of MLOps as a lifecycle management discipline for machine learning. Its goal is a balanced, process-centered approach to the management of resources, data, code, time, and quality to achieve business objectives and meet regulatory concerns.

Some of the concepts from DevOps translate directly to MLOps. When software developers work on a project, they don't all work on the same code at once. Instead, they check out the code they intend to work on from a code-safe, then merge it back when their task is finished. Before the code is returned to the safe, the developer checks that nothing has changed in the main version, and then unit-tests their updates before merging the code back together. Frequent merges with the main code reduce the chance of divergence. This process is called continuous integration, or CI. In a busy development team, this happens tens of times a day.

Another process favored by developers is continuous delivery. This is a method for building, testing, and releasing software in short cycles. When this process is followed, the main development code is almost always production ready and can be released into the live environment at any time. If it is not followed, the main code is like a race car with its wheels off and its engine out: it can go fast, but only once it's put back together. Continuous delivery can be done either manually or automatically.

Continuous integration of source code, unit testing, integration testing, and continuous delivery of the software to production are important processes in machine learning operations too. But there is another important aspect of MLOps. That's right: data. Unlike conventional software, which can be relied on to do the same thing every time, an ML model can go off. By this, we mean that its predictive power wanes as data profiles change, which they inevitably do. So we can build on continuous integration and continuous delivery and introduce a new term: continuous training. Continuous training is the process of monitoring, measuring, retraining, and serving the models.

MLOps differs from DevOps in important ways too. Continuous integration still involves testing and validating code and components, but now it also involves testing and validating data, data schemas, and models. It is no longer about a single software package or a service, but about a system: the ML training pipeline, which should automatically deploy another service, the model prediction service. Uniquely, MLOps is also concerned with automatically monitoring, retraining, and serving the models.
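To make continuous training concrete, here is a minimal sketch, not a production system. It assumes a scikit-learn style model and a caller-supplied retraining trigger; the threshold and helper names are illustrative assumptions, not part of any standard MLOps tooling.

```python
# A minimal sketch of a continuous-training check: measure the live model
# on fresh labeled data and trigger retraining if its quality has decayed.
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
import numpy as np

ACCURACY_FLOOR = 0.90  # assumed quality threshold for this sketch

def check_and_retrain(model, X_fresh, y_fresh, retrain_fn):
    """Measure the live model on fresh data; retrain if it has decayed."""
    accuracy = accuracy_score(y_fresh, model.predict(X_fresh))
    if accuracy < ACCURACY_FLOOR:
        retrain_fn()  # e.g., kick off the automated training pipeline
    return accuracy

# Usage with toy data: the model was trained when the decision boundary
# was near 0, but in the "fresh" data the boundary has shifted.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 3))
y_train = (X_train[:, 0] > 0).astype(int)
model = LogisticRegression().fit(X_train, y_train)

X_fresh = rng.normal(loc=1.0, size=(100, 3))
y_fresh = (X_fresh[:, 0] > 2).astype(int)  # the data profile has changed
check_and_retrain(model, X_fresh, y_fresh,
                  retrain_fn=lambda: print("retraining triggered"))
```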
Another concept that transfers well from software development to machine learning is technical debt. Software developers are familiar with the time, resources, and quality tradeoffs. They talk about technical debt: the backlog of rework that builds up because they have sometimes compromised on quality in order to develop code quickly. They understand that although there were good reasons to do this, they will have to go back and fix things later. This is an engineering version of a common failing: putting off until tomorrow what is better done today. There is a price to pay for that.

Machine learning has been described as the high-interest credit card of technical debt. This means that it can be relatively fast and cheap to develop and deploy an ML system, but maintaining it over time can be difficult and expensive. The real challenge isn't building an ML model; it's building an integrated ML system and continuously operating it in production. Just like a high-interest credit card, the technical debt with machine learning compounds and can be incredibly expensive and difficult to pay down.

Machine learning systems can be thought of as a special type of software system. Operationally, they have all the challenges of software development, plus a few of their own. One of these is multi-functional teams. An ML project will have developers and data scientists working on data analysis, model development, and experimentation, and multi-functional teams can create their own management challenges. Another is the experimental nature of the process. By definition, machine learning is experimental in nature: you must constantly try new approaches with the data, the models, and the parameter configurations. The challenge is tracking what worked and what didn't, and maintaining reproducibility while maximizing code reusability. Another is more complex testing. Testing an ML system is more involved than testing other software systems, because you are validating data, parameters, and code together in a system, instead of unit testing. Another is more complex deployment. In ML systems, deployment isn't as simple as deploying an offline-trained ML model as a prediction service; ML systems can require you to deploy a multi-step pipeline to automatically retrain and deploy models. Another is model decay. Data profiles constantly change, and if something changes in the data input, the predictive power of the model in production may change with it. Therefore, you need to track summary statistics of your data and monitor the online performance of your model to send notifications or roll back when values deviate from your expectations. There are many other reasons why technical debt builds up in an ML system.

In any ML project, after you define the business use case and establish the success criteria, the process of delivering an ML model to production involves the following steps: data extraction, data analysis, data preparation, model training, model evaluation, model validation, model serving, and model monitoring. These steps can be completed manually or by an automated pipeline. In the extraction, analysis, and data preparation stages, you perform exploratory data analysis, or EDA, to understand the available data for building the ML model. This process leads to understanding the data schema and characteristics that are expected by the model, and to identifying the data preparation and feature engineering that are needed for the model.
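As an illustration of what this analysis step might look like in practice, here is a minimal EDA sketch in pandas; the file name and the dataset behind it are placeholders for your own data, not anything prescribed by the process above.

```python
# A minimal exploratory data analysis (EDA) sketch using pandas.
# "traffic.csv" is a placeholder for your own dataset.
import pandas as pd

df = pd.read_csv("traffic.csv")

print(df.dtypes)        # inferred schema: what type is each column?
print(df.describe())    # summary statistics for the numeric columns
print(df.isna().sum())  # missing values that data cleaning must handle
print(df.nunique())     # cardinality hints at categorical features
```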
In effect, you are preparing the data for the ML task. This preparation involves data cleaning, where you split the data into training, validation, and test sets. You also apply data transformations and feature engineering for the model that solves the target task. The outputs of this step are the data splits in the prepared format.

In general, simple models on large datasets produce better results than complex models on small datasets. Simplicity is created by identifying patterns and connections in the raw data during the analysis and preparation phases. This improves the power of the ML model to make predictions later. These patterns and connections are called features, and identifying them and working with them is called feature selection and feature engineering. Feature engineering is an art form in machine learning. Have you ever looked at a work of modern art and thought, "I could have done that"? In fact, the artist may have worked for many years to make it look that simple. They will have learned to draw and paint from first principles before creating their own artistic abstraction. It is the same with data science: skilled data scientists create simple models. For example, you might have a traffic data sensor that records every time a vehicle passes, and that record is a date and time stamp. A skilled data scientist might identify that the causal factor in traffic patterns is not the date nor the time, but the hour of the day, and abstract that feature from the timestamp. So the model will learn better if fed the hour of the day rather than the timestamp, as shown in the sketch at the end of this section. Done correctly, feature engineering increases the power of ML models.

Now that you have extracted, analyzed, and prepared the data, you can implement different algorithms to train your model. This involves tuning the parameters to get the best results. Remember that validation dataset you held back? Well, now you define a set of metrics to assess the quality of the model and validate these using the holdout data. If the model's predictive performance is better than the defined baseline, it is ready for deployment to a target environment. The validated model is deployed to a target environment to serve predictions. This deployment can be one of the following: microservices with a REST API to serve online predictions, an embedded model on an edge or mobile device, or part of a batch prediction system. Finally, you constantly monitor the predictive performance of the model in production in order to know whether another cycle of the ML process is needed.

The level of automation of these steps defines the maturity of the ML process, which reflects the velocity of training new models given new data or training new models given new implementations. Many ML professionals build and deploy their ML models manually; we call this maturity level zero. Other data scientists perform continuous training of their models by automating the ML pipeline; this is maturity level one. Finally, the most mature system has completely automated and integrated the ML training, validation, and deployment phases; this is maturity level two. You will probably begin from first principles at maturity level zero, and this is a good place to start.

To summarize, machine learning is about much more than training a model and moving it to production. You need to build and deploy an ML pipeline with seamless integration between the development and live environments.
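Here is that traffic-sensor feature engineering example as a minimal sketch, assuming the raw records arrive as timestamps in a pandas DataFrame; the column names and sample values are placeholders for illustration.

```python
# A minimal sketch of the feature engineering described above: abstracting
# the hour of day from a raw timestamp, since the hour, not the full
# timestamp, is what carries the signal in traffic patterns.
import pandas as pd

records = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2023-03-01 08:15:00",  # morning rush hour
        "2023-03-01 08:47:00",
        "2023-03-01 14:02:00",  # mid-afternoon lull
    ])
})

# The engineered feature the model will learn from:
records["hour_of_day"] = records["timestamp"].dt.hour
print(records)
```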
The system you set up needs to allow for the monitoring of the model's predictive power and for the manual or automatic retraining and deployment of new models. Setting up a continuous integration, continuous deployment, and continuous training system will enable you to cope with rapid changes in your data and business environment. It is perfectly possible to implement all of this as manual processes, but you will improve your operational efficiency by gradually introducing automation.
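As a final sketch of that monitoring idea, here is one simple way to compare summary statistics of live input data against the training data, as described earlier. The threshold, the mean-shift measure, and the alerting are illustrative assumptions, not a standard technique prescribed by any particular tool.

```python
# A minimal sketch of data-drift monitoring: compare summary statistics of
# live input data against the training data and flag large deviations.
import numpy as np

DRIFT_THRESHOLD = 3.0  # flag shifts beyond 3 training standard deviations

def check_drift(train_values, live_values):
    """Return True if the live mean has drifted from the training mean."""
    train_mean = train_values.mean()
    train_std = train_values.std() or 1e-9  # guard against zero variance
    shift = abs(live_values.mean() - train_mean) / train_std
    return shift > DRIFT_THRESHOLD

rng = np.random.default_rng(1)
train = rng.normal(loc=0.0, scale=1.0, size=1000)  # training-time profile
live = rng.normal(loc=4.0, scale=1.0, size=200)    # simulated drifted input
if check_drift(train, live):
    print("Drift detected: notify the team or roll back the model.")
```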