Here are some of the advanced feature engineering preprocessing functions in BigQuery ML. ML.FEATURE_CROSS(STRUCT(features)) does a feature cross of all the combinations of the features in the struct. The TRANSFORM clause allows you to specify all preprocessing during model creation; that preprocessing is then automatically applied during the prediction and evaluation phases of machine learning. And ML.BUCKETIZE(f, split_points), where split_points is an array, bins a numeric feature f into buckets.
Feature crosses are about memorization. Memorization is the opposite of generalization, which is what machine learning aims to do. So, should you do this? In a real-world ML system, there is a place for both. Memorization works when you have so much data that, for any single grid cell within your input space, the distribution of data is statistically significant. When that is the case, you can memorize: you are essentially just learning the mean for every grid cell. Deep learning also needs a lot of data; whether you want to feature cross or use many layers, you need a lot of data. If you're familiar with traditional machine learning, you may not have heard much about feature crosses, because they memorize and only work on large datasets. But you will find feature crosses extremely useful on real-world datasets. Larger data allows you to make your boxes smaller, so you can memorize more finely. Feature crosses are a powerful preprocessing technique on large datasets.
Our ML lab model would be greatly improved if, instead of treating the hour of day and day of week as independent inputs, we essentially concatenated them to create a feature cross. Here's an example: for any particular row of your input dataset, how many nodes in X3 are lit up? Just one. Do you see why? Every label, every observation in the table, is taken at a specific time, and that time corresponds to a specific hour on a specific day of the week.
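To make memorization concrete, here is a minimal Python sketch of the grid-cell idea, assuming two numeric inputs (a hypothetical latitude and longitude) with made-up split points and fares. The helpers bucketize, feature_cross, and memorize are hypothetical stand-ins that loosely mimic what ML.BUCKETIZE and ML.FEATURE_CROSS do in BigQuery ML; they are not those functions, and the bin labels and boundary handling are choices of this sketch. Bucketizing each input and crossing the buckets yields one category per grid cell, and memorizing amounts to storing the mean label of each cell.

```python
from bisect import bisect_right
from collections import defaultdict


def bucketize(value, split_points):
    """Return a bucket label for value; split_points must be sorted ascending.

    Values below split_points[0] land in bin_1; values >= split_points[-1]
    land in the last bin. (Boundary handling is this sketch's own choice.)
    """
    return f"bin_{bisect_right(split_points, value) + 1}"


def feature_cross(*categories):
    """Concatenate categorical values into one crossed category,
    e.g. ('bin_2', 'bin_3') -> 'bin_2_bin_3'."""
    return "_".join(str(c) for c in categories)


def memorize(rows, lat_splits, lon_splits):
    """Learn the mean label of each (latitude bucket x longitude bucket) grid cell."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for lat, lon, label in rows:
        cell = feature_cross(bucketize(lat, lat_splits), bucketize(lon, lon_splits))
        sums[cell] += label
        counts[cell] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}


if __name__ == "__main__":
    # Hypothetical (latitude, longitude, fare) rows, invented for illustration.
    rows = [
        (40.71, -74.00, 10.0),
        (40.72, -74.01, 12.0),
        (40.78, -73.96, 20.0),
        (40.79, -73.95, 22.0),
    ]
    lat_splits = [40.70, 40.75, 40.80]
    lon_splits = [-74.05, -73.98, -73.90]
    print(memorize(rows, lat_splits, lon_splits))
```

Running it prints {'bin_2_bin_2': 11.0, 'bin_3_bin_3': 21.0}: one memorized mean per occupied cell. With more data, you could add split points to shrink the cells, which is the "smaller boxes" trade-off described above.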
So, take 3 PM in the hour of day input and Wednesday in the day of week input. Feature cross these, and what do you have? You have one input node: the input node that corresponds to 3 PM on Wednesday will be one, and all other input nodes for X3 will be zero. With 24 hours times 7 days, there are 168 nodes, so the input consists of 167 zeros and a single one. That is the definition of sparsity: a feature with mostly missing values, in our case zeros. When you do a feature cross, the input is very, very sparse. TensorFlow gives us easy tools to deal with this.
Some observations about sparsity: sparse models contain fewer features and are therefore easier to train on limited data. Fewer features also means less chance of overfitting, and it also makes the model easier to explain to users, because only the most meaningful features remain. This is what a sparse matrix looks like: very, very wide, with lots and lots of features. You want to use linear models to minimize the number of free parameters, and if the columns are independent, linear models may suffice.
In this lab, you'll see examples of both spatial and temporal functions used in preprocessing. Note that the geography, or spatial, functions operate on or generate BigQuery GEOGRAPHY values. The name of every geography function starts with ST_. In this example, ST_Distance returns the shortest distance, in meters, between two non-empty geographies, here the taxi pickup and drop-off points built from their longitude and latitude.
If you go to the BigQuery console, you would see the SQL statements from the previous slide executed here. On the left is the schema showing the new features euclidean, day_hr, and day_hr.dayofweek_hourofday. On the right side is the JSON file showing the same new features: euclidean, day_hr, and day_hr.dayofweek_hourofday. Note that BigQuery ML, by default, assumes that numbers are numeric features and strings are categorical features.
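The distance feature can be approximated outside BigQuery with the haversine formula. The sketch below assumes a perfect sphere with a mean Earth radius of 6,371,008.8 m; BigQuery's ST_DISTANCE also measures on a sphere by default, but this is an approximation, not a reimplementation, so small differences from BigQuery's results are possible. The function name sphere_distance_m and the sample coordinates are hypothetical; arguments follow the longitude-then-latitude order that BigQuery's ST_GEOGPOINT uses.

```python
import math

# Assumption: a perfect sphere with the mean Earth radius, in meters.
EARTH_RADIUS_M = 6_371_008.8


def sphere_distance_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two (longitude, latitude) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    # Haversine formula: numerically stable for short distances such as taxi trips.
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


if __name__ == "__main__":
    # Hypothetical pickup and drop-off coordinates (longitude, latitude) in Manhattan.
    pickup = (-73.9857, 40.7484)
    dropoff = (-73.9680, 40.7851)
    print(round(sphere_distance_m(*pickup, *dropoff)))
```

A spherical model keeps the formula short; if you need ellipsoidal accuracy, BigQuery's own function is the better reference.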
We need to convert both the day of week and hour of day features to strings, because the model (a neural network) will automatically treat any integer as a numerical value rather than a categorical value. Thus, if not cast to a string, the day of week feature will be interpreted as numerical values, for example, 1, 2, 3, 4, 5, 6, 7. The hour of day will also be interpreted as numeric values, for example, 0 for the hour that begins at midnight (00:00) through 23 for the last hour of the day, which begins at 23:00 and ends at 24:00. Treated numerically, there is no way to distinguish the individual combinations in the feature cross of hour of day and day of week. Casting day of week and hour of day as strings ensures that each element will be treated like a label and will have its own associated coefficient.
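The effect of casting can be sketched in Python. Assuming day of week is coded 1 to 7 (Sunday = 1, as BigQuery's EXTRACT(DAYOFWEEK ...) returns) and hour of day 0 to 23, the code turns both into strings, crosses them into 168 labels, and one-hot encodes a single observation. The vocabulary layout and the one_hot_cross helper are hypothetical illustrations, not how BigQuery ML stores the cross internally.

```python
from itertools import product

# Assumed vocabularies: both features cast to strings so each value is a
# category label rather than a magnitude.
DAYS = [str(d) for d in range(1, 8)]     # "1".."7", Sunday = "1"
HOURS = [str(h) for h in range(24)]      # "0".."23"

# Every (day, hour) combination becomes one crossed category: 7 * 24 = 168.
CROSSED_VOCAB = [f"{d}_{h}" for d, h in product(DAYS, HOURS)]
INDEX = {name: i for i, name in enumerate(CROSSED_VOCAB)}


def one_hot_cross(day_of_week, hour_of_day):
    """One-hot encode the day-of-week x hour-of-day cross for one observation."""
    key = f"{day_of_week}_{hour_of_day}"  # f-string formatting casts both values to strings
    vec = [0] * len(CROSSED_VOCAB)
    vec[INDEX[key]] = 1
    return vec


if __name__ == "__main__":
    # Wednesday is 4 when Sunday = 1; 3 PM is hour 15.
    x3 = one_hot_cross(4, 15)
    print(len(x3), x3.count(0), x3.count(1))  # 168 167 1
```

The printed 168 167 1 matches the sparsity described earlier: exactly one lit node per observation. Because each of the 168 labels has its own position, a linear model can learn a separate coefficient for each day-and-hour combination, which a single numeric day or hour input cannot provide.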