Let's start off by making the case for why you may want to use AutoML. Where does AutoML sit in the suite of Google Cloud products that can be used for machine learning? On one hand, with products such as Vertex AI and BigQuery ML, you can build highly customized machine learning models. However, to use these products, you'll need someone with machine learning expertise and coding experience, and you'll be responsible for training the model yourself, which can take a lot of time. On the other hand, Google Cloud lets you call pre-trained models using services like the Vision API and the Speech-to-Text API. No model training is required for these services; you simply feed the API your data, and it returns predictions. The downside of pre-trained models is that they only yield good predictions when your data is relatively commonplace, such as social media images or customer reviews.

AutoML sits somewhere between these two. A model is trained specifically on your data, but you don't need to write any code to train it. If you were to train a machine learning model from scratch, you would need both machine learning and coding expertise. Anecdotally, building machine learning models follows the Pareto principle: you can launch a functional model relatively quickly, but most of the time and effort goes into debugging and making the model performant.

AutoML follows a standard procedure that is divided into three phases: train, deploy, and serve. The training phase has several steps. First, you prepare a dataset that will be used in the supervised training process. Next, you analyze the dataset to make sure it has the qualities that will make it effective, and you may need to correct it. After the dataset is prepared and validated, you use it to train the model. Finally, the model is run against test data to evaluate whether it will be effective at predicting and classifying new cases. If the model doesn't perform well at this point, you may have to go back, modify the dataset, and try again.

The second phase is to deploy the model and manage it, which includes getting rid of old or unused models. The third phase is hosting the model on a service where it can be used to predict and classify. In traditional machine learning, the deploy and serve phases are complicated and involve moving the model from a model-building system like TensorFlow to a model-hosting system like Vertex AI. AutoML, however, handles most of this complexity for you.

AutoML uses a prepared dataset to train a custom model. You can make small prepared datasets for experimentation directly in the web UI, but it is more common to assemble the information in a CSV (comma-separated values) file. The CSV file must be UTF-8 encoded and located in the same Cloud Storage bucket as the source files. You can also create and manage prepared datasets programmatically in Python, Java, or Node.js.

The first column in the CSV file is optional. It assigns the data in each row to one of three groups: train, validation, or test. If you leave out this column, the rows are assigned automatically, with 80 percent going to train and 10 percent each to validation and test. The next column in the CSV file identifies the source files hosted in Cloud Storage; these are paths beginning with gs://.
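Putting these columns together, a small prepared dataset CSV for a text classification model might look like the following sketch. The bucket name, file paths, and labels are hypothetical; the group keywords in the first column are written in uppercase, and the label columns are described next.

    TRAIN,gs://my-project-bucket/emails/email_001.txt,feedback
    TRAIN,gs://my-project-bucket/emails/email_002.txt,information_request
    VALIDATION,gs://my-project-bucket/emails/email_003.txt,feedback
    TEST,gs://my-project-bucket/emails/email_004.txt,feedback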
The source file format depends on the model you are training; source files can also be compressed ZIP files. Subsequent columns specify labels. Labels are alphanumeric and can contain underscores, but not special characters. The CSV file should not contain duplicate lines, and it may not contain blank lines or Unicode characters. Currently, the CSV file and all the source files must be in a Cloud Storage bucket in the project where AutoML runs.

Prepared datasets do not expire, so you may accumulate many of them in a project. You can list and delete the ones you don't need.

AutoML performs basic checks and a preliminary analysis of the prepared dataset to determine whether there is enough information and whether it is properly organized. If the prepared dataset is not ready, you will need to add more rows or more labels to the CSV file. When it is ready, you can start training. Training can take from 10 minutes to several hours, depending on the model, and you can check the status while it is running. Import and training tasks can be canceled.

The data in the train group is used to train the custom model. The source files have already been associated with the correct labels in the prepared dataset. AutoML uses a supervised learning method to train the custom model, and part of the process uses the validation group data to verify how well the model classifies and predicts. Supervised learning works on correctable errors: AutoML constructs an algorithm that guesses the labels for source data. When a guess is right, it strengthens the algorithm; when a guess is wrong, the error is used to correct the algorithm. This is how learning occurs. One full pass through all the data in the train group is called an epoch. Total error is tracked and minimized over multiple epochs to create the best model possible from the training data provided. The result is a trained custom model.

The custom model works well with the training data, but is it good at categorizing new instances of data it has not seen before? Data from the test group is used to evaluate the custom model; because the model has never seen this data, it removes bias from the evaluation. The model's predictions and classifications are compared with the labels in the prepared dataset. The evaluation report provides indicators that are specific to the kind of model and that help you understand how effective the model is at predicting and classifying.

There is nothing you need to do to activate a model. However, if it has been some time since you used a model, the system may need to warm up for a few minutes before the model becomes active. Once the model exists, if you have the project credentials and the model name, you can access and use it. Each time you train with a prepared dataset, a new custom model is created. You can list and delete unneeded models.

Custom models are temporary: they cannot be exported or saved externally, and they are eventually deleted. Models that are not used for prediction are automatically deleted after a period, and even models that are used are eventually deleted, so you will need to train a new custom model periodically to continue predicting and classifying. How long models remain before they are deleted depends on the model type.

The primary classification interface is at the URI shown. You can make a classification request from the web UI or from the command line by using curl to send a JSON-structured request. There are also client libraries for Python, Java, and Node.js.
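As a minimal sketch, a prediction request using the Python client library might look like the following. It assumes a deployed AutoML text classification model; the project ID, model ID, and sample content are placeholders.

    # pip install google-cloud-automl
    from google.cloud import automl

    project_id = "my-project"    # placeholder
    model_id = "TCN0123456789"   # placeholder model ID

    prediction_client = automl.PredictionServiceClient()

    # Build the fully qualified model name:
    # projects/<project>/locations/us-central1/models/<model>
    model_full_id = automl.AutoMlClient.model_path(
        project_id, "us-central1", model_id)

    # The payload is the data you want classified.
    payload = automl.ExamplePayload(
        text_snippet=automl.TextSnippet(
            content="The shirt I ordered arrived quickly and fits well.",
            mime_type="text/plain",
        )
    )

    response = prediction_client.predict(name=model_full_id, payload=payload)

    # Each result pairs a display name (the matched label) with a
    # classification score between 0.0 and 1.0.
    for result in response.payload:
        print(result.display_name, result.classification.score)

The same request can be made with curl by sending the JSON payload to the model's predict URI.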
After you have set up authentication to use the REST API, you can send a request with the model name and the payload, which is the data you want classified. The service returns JSON containing one or more fields called displayName; these are the labels that matched. Each label is accompanied by the keyword classification, followed by a score. The score is a confidence value, where 1.0 represents absolute confidence and lower fractional numbers represent lower confidence in the correctness of the classification. Quotas apply to both model creation and service requests.

AutoML lowers the effort required to create a model compared to traditional machine learning. With traditional ML, models were hard to create, so there was a tendency to try to make the dataset and the model all-inclusive. With AutoML, you can create smaller, more specialized custom models and use them programmatically. You don't have to squeeze everything into one model: you can break a classification apart into multiple steps, and you can use the results of one classification to choose what kind of classification to perform next.

Here's an example, sketched in code at the end of this section. A company that sells clothing has a service office that receives emails from customers. The first job might be to distinguish emails containing feedback about products from emails requesting information about the company; Model 1 could be used to classify feedback email. The second job might be to distinguish whether the email is describing pants, shirts, shoes, or hats; this might be the job of Model 2. Model 3 might be used only on emails about shirts, to see whether the style of the shirt is mentioned, and Model 4 might be used only on emails about shoes, to see whether the shoe style is mentioned. You can see from this example that a collection of models, each with a focused scope and purpose, might accomplish magic in your application. You can also programmatically combine your custom models with a standard model, such as the Cloud Natural Language API.

This concludes the discussion of AutoML. The recommended application strategy is to first use the pre-built artificial intelligence services. Next, you can use AutoML to produce custom models, which can be used with the pre-built services or on their own. Remember that you can divide a problem into specialized parts and use multiple custom models together. Finally, if you discover you need more advanced features, you can use the machine learning and artificial intelligence services to create new models.
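To make the clothing-company example concrete, here is a minimal sketch of that routing logic. The classify helper wraps the prediction call shown earlier; the model IDs and label names are hypothetical.

    from google.cloud import automl

    prediction_client = automl.PredictionServiceClient()

    def classify(model_id: str, text: str) -> str:
        """Hypothetical helper: return the highest-scoring label from a custom model."""
        name = automl.AutoMlClient.model_path("my-project", "us-central1", model_id)
        payload = automl.ExamplePayload(
            text_snippet=automl.TextSnippet(content=text, mime_type="text/plain"))
        response = prediction_client.predict(name=name, payload=payload)
        best = max(response.payload, key=lambda r: r.classification.score)
        return best.display_name

    # Hypothetical IDs for the four models in the example.
    MODEL_1 = "TCN_feedback_vs_info"   # feedback vs. information request
    MODEL_2 = "TCN_product_type"       # pants, shirts, shoes, or hats
    MODEL_3 = "TCN_shirt_style"        # is a shirt style mentioned?
    MODEL_4 = "TCN_shoe_style"         # is a shoe style mentioned?

    def route_email(email_text: str):
        # Use the result of one classification to choose the next model.
        if classify(MODEL_1, email_text) != "feedback":
            return ("information_request",)
        product = classify(MODEL_2, email_text)
        if product == "shirts":
            return ("feedback", product, classify(MODEL_3, email_text))
        if product == "shoes":
            return ("feedback", product, classify(MODEL_4, email_text))
        return ("feedback", product)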