Finally, let's cover AutoML Tables for tabular data. Tabular data is what you might find in a spreadsheet, for example. While AutoML Vision and AutoML Natural Language are for unstructured data, AutoML Tables is for structured data. The development of AutoML Tables was a collaboration between Google Cloud and the Google Brain team. While the technical details of the project haven't been released to the public, the team essentially took the architecture search capability used for image classification and translation problems and found a way to apply it to tabular data.

Let's describe a dataset where AutoML Tables performs really well: Mercari's Price Suggestion Challenge. Mercari is Japan's biggest community-powered shopping app and marketplace. Mercari created a price suggestion challenge for predicting the price of a product offered on their marketplace, so that they could give price suggestions to their sellers. Participants were given some 1.5 million rows of rich data with plenty of noise. The challenge lasted for three months and culminated in a $100,000 prize; over 2,000 data scientists competed for it.

This plot shows the performance of AutoML Tables on the Mercari challenge for several different training times. You can see that after 24 hours of training, AutoML Tables pretty much puts you on the leaderboard. Even after only one hour of training, you get to the plateau of leaders, which is extremely impressive performance on a million-plus-row dataset with significant complexity. Compared to the $100,000 prize for this challenge, one hour of training costs just $19. Since the search process for AutoML Tables is random, you might get slightly different results if you try to reproduce this performance.

The easiest way to import your data into AutoML Tables is through BigQuery. You can also import data using CSV files stored locally or on Cloud Storage. One of the advantages of importing data through BigQuery is its support for arrays and structs.
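To make the two import sources concrete, here is a minimal sketch that builds the input configuration for an AutoML Tables import request. The field names mirror my understanding of the v1beta1 REST API's InputConfig message, and the project, dataset, and bucket names are made-up placeholders — treat all of them as illustrative assumptions rather than a definitive implementation.

```python
def build_input_config(source_uri):
    """Build an import input config for AutoML Tables.

    Accepts either a BigQuery table URI (bq://project.dataset.table)
    or a Cloud Storage CSV URI (gs://bucket/path.csv). Field names are
    assumed to follow the v1beta1 REST InputConfig message.
    """
    if source_uri.startswith("bq://"):
        # BigQuery import: the source that also supports ARRAY and STRUCT columns.
        return {"bigquerySource": {"inputUri": source_uri}}
    if source_uri.startswith("gs://"):
        # Cloud Storage import: one or more CSV files.
        return {"gcsSource": {"inputUris": [source_uri]}}
    raise ValueError("Expected a bq:// or gs:// URI, got %r" % source_uri)

# Hypothetical sources for the two paths:
bq_config = build_input_config("bq://my-project.my_dataset.sales")
csv_config = build_input_config("gs://my-bucket/sales.csv")
```

Either dictionary would then be passed as the input config of the import-data call; check the AutoML Tables reference for the exact request envelope.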
Regardless of the import source, your data must have between 1,000 and 100 million rows, between 2 and 1,000 columns, and be 100 GB or less in size. Once your data is imported, the next step is to select the features you want to use and to specify the column you're trying to predict.

In the next step of building an AutoML Tables model, you go through a data validation phase. The purpose of this step is to ensure you're not passing bad data to your model. This includes checking for columns that have too many null values, outlier values that are skewing the distribution of a column, and columns that are not correlated with the target you're trying to predict.

As you saw in the slide on AutoML Tables' performance on the Mercari challenge, you can train a model for a variable amount of time. You can set a training budget in node hours to cap costs. By default, AutoML Tables will stop training if the model isn't seeing significant performance gains anymore.

Once your model is trained, you should look at the training metrics. Be wary of models that look too good to be true; in this case, you likely have a data issue you'll need to resolve. For classification, the report includes metrics such as area under the precision-recall curve, accuracy, and the F1 score. A confusion matrix is also output, along with feature importances. These two sets of metrics are particularly useful in diagnosing low-performing models. For regression models, the root mean squared error, mean absolute percentage error, and feature importances are returned, among other metrics. Check the AutoML Tables documentation for a full list of metrics that get generated after model training. It's arguably more important to look at the performance metrics generated on the test set to get a feel for how well your model will generalize. The same metrics generated for the training data are available for the test data.
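To make the regression metrics concrete, here is a small, self-contained sketch of how root mean squared error and mean absolute percentage error are computed from a model's predictions. AutoML Tables reports these for you, so this is purely illustrative; the toy prices are made up.

```python
import math

def rmse(actuals, predictions):
    # Root mean squared error: penalizes large errors quadratically,
    # and is reported in the same units as the target (e.g. dollars).
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actuals, predictions)) / len(actuals)
    )

def mape(actuals, predictions):
    # Mean absolute percentage error: scale-free, which makes it easy to
    # compare across targets, but it is undefined when an actual value is 0.
    return 100.0 * sum(
        abs((a - p) / a) for a, p in zip(actuals, predictions)
    ) / len(actuals)

# Toy price predictions, in dollars:
actual = [10.0, 20.0, 40.0]
pred = [12.0, 18.0, 40.0]
print(rmse(actual, pred))  # ≈ 1.63
print(mape(actual, pred))  # ≈ 10.0
```

A model that looks "too good to be true" here — say, an RMSE near zero on noisy price data — is usually a sign of target leakage in one of the feature columns.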
For classification models, it may be useful to set the score threshold to a value other than the default of 0.5. Increase the score threshold to make your classifier output a positive label with more confidence.

Once you're happy with your model's performance, you can go ahead and deploy it. You have the option of making batch or online predictions. For online predictions, you can make calls using a curl command or through the Java, Node.js, or Python client libraries. The same options are available for batch predictions. You can make batch predictions on either BigQuery tables or CSV files; however, the BigQuery source tables must be no larger than 100 gigabytes. For CSV files, each source file can be no larger than 10 gigabytes, and if you include multiple files, their combined size cannot exceed 100 gigabytes.

So, to close out this module, let's return to the question of when you should use BigQuery ML or AutoML versus building a custom model. The short answer is: it depends on how much time you have to build the model and what resources you have available. This table may provide some guidance. Given the low barrier to building a model in either BigQuery ML or AutoML, give either a try first. If the resulting model is not sufficient, only then should you throw more resources at the problem you are seeking to solve.
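To close with a concrete illustration of the score-threshold idea mentioned earlier in this section: the sketch below applies two different thresholds to a toy set of predicted scores and shows how raising the threshold trades recall for precision. The scores and labels are invented for illustration.

```python
def classify(scores, threshold):
    # Emit a positive label only when the model's score clears the threshold.
    return [1 if s >= threshold else 0 for s in scores]

def precision_recall(labels, preds):
    tp = sum(1 for l, p in zip(labels, preds) if l == 1 and p == 1)
    fp = sum(1 for l, p in zip(labels, preds) if l == 0 and p == 1)
    fn = sum(1 for l, p in zip(labels, preds) if l == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

scores = [0.95, 0.80, 0.60, 0.55, 0.30]  # model's predicted scores
labels = [1,    1,    0,    1,    0]     # ground truth

# Default threshold of 0.5: more positives, but lower precision.
print(precision_recall(labels, classify(scores, 0.5)))  # (0.75, 1.0)
# Raised threshold of 0.9: fewer, more confident positives.
print(precision_recall(labels, classify(scores, 0.9)))  # (1.0, 0.333...)
```

In other words, raising the threshold makes the classifier output a positive label only when it is very confident, which is exactly the knob the AutoML Tables UI exposes.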