After the data is prepared, we're ready to start training our model. If we're using a TensorFlow, XGBoost, or scikit-learn model, we can train directly on AI Platform. As a note, the ML Engine training operator currently only works with a training application packaged as a Python package. For containerized training applications, you can use other options, such as the GKE pod operator. After our training application is packaged, we're good to go.

The training arguments need to be passed in as a list in a very specific form. Each argument we want to pass to AI Platform corresponds to two consecutive elements of the list: the first element gives the name of the argument as a command-line flag, and the second element gives the argument's value. In the example below, we include the job-dir, output_dir, log_dir, train_data_path, and eval_data_path, all arguments that our trainer package expects in order to run training.

Now we're ready to create our training task. We will use the ML Engine training operator with the project ID, job ID, the package location (URI), the Python module to be invoked (trainer.task), the arguments needed by trainer.task, and then the information for the AI Platform training job itself: the region where resources will be hosted, the type of cluster we will train on, the version of TensorFlow we will use, and the version of Python that is expected.

Now that we have a newly trained model, we want to ensure that it meets our goals for performance. We will capture the evaluation metrics of the newly trained model and store them somewhere easily accessible. For example, we can write them out to a table in BigQuery, along with the version name, date of training, and any other information we want to track. We could also write them out to Cloud Storage, perhaps in JSON format.

If we've recorded the metrics in a BigQuery table, what can we do with them? We could use a BigQuery value check operator to ensure that the new model is within a certain tolerance of our previous model's performance, or even within some preset expectations. We could have a branch of the pipeline that alerts us, perhaps via Pub/Sub, if the trained model does not pass the test. This also allows us to analyze our model's performance over time and possibly catch any degradation early, before it becomes a problem.

Finally, note the use of the Variable.get method. This allows us to retrieve global variables from our Airflow instance. There is an analogous Variable.set method for setting the values of variables from within a Python operator.

After we evaluate our model, we are ready to deploy it. For this example, we will deploy the model to serve predictions on AI Platform. We will create a new model with the ML Engine model operator by specifying the project and model name and then using the create operation. Recall that a model on AI Platform is not the trained model itself, but instead a folder of sorts, into which we will deploy our trained models as versions.

In practice, this model may already exist. We could use an operator to list the existing models and a branch operator, a BranchPythonOperator for example, to choose a different task, say a dummy operator, if the model already exists. In that case, the create model task will be skipped because it is not needed.
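To make the steps above more concrete, here are a few sketches using the Airflow 1.10-era contrib operators that Cloud Composer provided; the project ID, bucket paths, table names, and task IDs are placeholders, and the surrounding dag object is assumed to be defined elsewhere in the DAG file. First, the training task: the argument list pairs each flag with its value, and the operator carries the job and cluster settings described earlier.

```python
from airflow.contrib.operators.mlengine_operator import MLEngineTrainingOperator

# Each argument is passed as two consecutive list elements:
# first the flag name, then its value.
training_args = [
    '--job-dir', 'gs://my-bucket/jobs/{{ ds_nodash }}',
    '--output_dir', 'gs://my-bucket/output',
    '--log_dir', 'gs://my-bucket/logs',
    '--train_data_path', 'gs://my-bucket/data/train.csv',
    '--eval_data_path', 'gs://my-bucket/data/eval.csv',
]

train_model = MLEngineTrainingOperator(
    task_id='train_model',
    project_id='my-project',                                      # placeholder project
    job_id='train_{{ ds_nodash }}',                               # unique job name per run
    package_uris=['gs://my-bucket/packages/trainer-0.1.tar.gz'],  # packaged trainer
    training_python_module='trainer.task',                        # module AI Platform invokes
    training_args=training_args,
    region='us-central1',                                         # where resources are hosted
    scale_tier='BASIC_GPU',                                       # type of training cluster
    runtime_version='1.15',                                       # TensorFlow version
    python_version='3.7',                                         # expected Python version
    dag=dag,                                                      # DAG defined elsewhere
)
```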
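For recording the evaluation metrics, one option is a PythonOperator that appends a row to a BigQuery table. This is only a sketch: the table, its schema, and the upstream evaluate_model task whose XCom is pulled here are hypothetical.

```python
from airflow.operators.python_operator import PythonOperator
from google.cloud import bigquery

def record_eval_metrics(**context):
    # Append this run's evaluation metrics to a BigQuery table so we can
    # compare model performance across training runs.
    client = bigquery.Client(project='my-project')
    rows = [{
        'version_name': 'v_' + context['ds_nodash'],
        'training_date': context['ds'],
        # Hypothetical upstream task that computed the metric.
        'rmse': context['ti'].xcom_pull(task_ids='evaluate_model'),
    }]
    errors = client.insert_rows_json('my-project.model_metrics.eval_results', rows)
    if errors:
        raise RuntimeError('Failed to write metrics: {}'.format(errors))

record_metrics = PythonOperator(
    task_id='record_eval_metrics',
    python_callable=record_eval_metrics,
    provide_context=True,
    dag=dag,
)
```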
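With the metrics in BigQuery, a value check task can compare the latest run against an expected value within a tolerance; the SQL, pass value, and tolerance below are illustrative only.

```python
from airflow.contrib.operators.bigquery_check_operator import BigQueryValueCheckOperator

# Fails the task if the newest model's metric is not within the given
# tolerance of the expected value.
check_eval_metrics = BigQueryValueCheckOperator(
    task_id='check_eval_metrics',
    sql="""
        SELECT rmse
        FROM `my-project.model_metrics.eval_results`
        ORDER BY training_date DESC
        LIMIT 1
    """,
    pass_value=0.10,     # expected metric value
    tolerance=0.05,      # allow +/- 5% of pass_value
    use_legacy_sql=False,
    dag=dag,
)
```

A failure here fails the task, and an alerting branch, for example one that publishes to a Pub/Sub topic, could be wired downstream to notify the team.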
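The Variable.get and Variable.set calls might look like the following; the variable name and the callable are examples only.

```python
from airflow.models import Variable
from airflow.operators.python_operator import PythonOperator

# Read a global variable stored in the Airflow instance.
version_name = Variable.get('model_version', default_var='v1')

def update_model_version(**context):
    # Set the variable from within a Python task, for example after the
    # newly trained model has passed its evaluation checks.
    Variable.set('model_version', 'v_' + context['ds_nodash'])

set_model_version = PythonOperator(
    task_id='set_model_version',
    python_callable=update_model_version,
    provide_context=True,
    dag=dag,
)
```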
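Finally, for the deployment side, a branch task could check whether the model already exists before the create model task runs. Checking existence through the ML Engine hook is just one possible approach, and the model name is again a placeholder.

```python
from airflow.contrib.hooks.gcp_mlengine_hook import MLEngineHook
from airflow.contrib.operators.mlengine_operator import MLEngineModelOperator
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.python_operator import BranchPythonOperator

def choose_model_task(**context):
    # Return the task_id to follow: skip creation if the model
    # "folder" already exists on AI Platform.
    hook = MLEngineHook()
    if hook.get_model(project_id='my-project', model_name='my_model'):
        return 'model_already_exists'
    return 'create_model'

check_model_exists = BranchPythonOperator(
    task_id='check_model_exists',
    python_callable=choose_model_task,
    provide_context=True,
    dag=dag,
)

create_model = MLEngineModelOperator(
    task_id='create_model',
    project_id='my-project',
    model={'name': 'my_model'},
    operation='create',
    dag=dag,
)

model_already_exists = DummyOperator(task_id='model_already_exists', dag=dag)

check_model_exists >> [create_model, model_already_exists]
```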
After our model is created in AI Platform, or the create task was skipped because we verified that the model already exists, we're ready to create a new version. Here we use the version name variable we mentioned earlier to set the version name in AI Platform. Note that we could update this version name variable as part of the model evaluation process, after the model passes evaluation. We specify arguments for the version, such as the model's location (the deployment URI), and specify the create operation. Now our new model is deployed to AI Platform without our intervention at any step of the process.
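As a sketch of this last step, the version-creation task might look like the following, again with placeholder names and paths, and with the version name read from the Airflow variable set during evaluation.

```python
from airflow.contrib.operators.mlengine_operator import MLEngineVersionOperator
from airflow.models import Variable

version_name = Variable.get('model_version', default_var='v1')

create_version = MLEngineVersionOperator(
    task_id='create_version',
    project_id='my-project',
    model_name='my_model',
    version={
        'name': version_name,
        'deploymentUri': 'gs://my-bucket/models/export',  # exported SavedModel location
        'runtimeVersion': '1.15',
        'pythonVersion': '3.7',
        'framework': 'TENSORFLOW',
    },
    operation='create',
    # Runs after either branch (model created, or creation skipped).
    trigger_rule='one_success',
    dag=dag,
)
```

Because the branch upstream skips one of the two model tasks, a downstream task like this one typically needs a trigger rule such as one_success so that it still runs after the skipped branch.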