In this lesson, we will describe how to create a feature store, create an entity type, add features, and run the ingestion process. Recall the three key challenges with ML features that come up often: features are hard to share and reuse, reliably serving features in production with low latency is a challenge, and inadvertent skew in feature values between training and serving is common. Let's explore how creating a feature store can help address these challenges.

Before creating a feature store, you'll need to preprocess your data. Ensure that your features are clean and tidy, which means that there are no missing values, data types are correct, and any one-hot encoding of categorical values has already been done. There are some requirements for your source data. Vertex AI Feature Store can ingest data from tables in BigQuery or files in Cloud Storage, but files in Cloud Storage must be in the Avro or CSV format. You must have a column for entity IDs, and the values must be of type STRING; this column contains the entity IDs that the feature values are for. Your source data value types must match the value types of the destination feature in the feature store; for example, Boolean values must be ingested into a feature that is of type BOOL. All columns must have a header that is of type STRING, and there are no restrictions on the names of the headers. For BigQuery tables, the column header is the column name. For Avro, the column header is defined by the Avro schema that is associated with the binary data. For CSV files, the column header is the first row. If you provide a column for feature generation timestamps, use one of the following timestamp formats: for BigQuery tables, timestamps must be in a column of type TIMESTAMP; for Avro, timestamps must be of type long with the logical type timestamp-micros; for CSV files, timestamps must be in the RFC 3339 format.
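The source requirements above can be checked programmatically before you ingest anything. Here's a minimal sketch of such a pre-ingestion check for a CSV source, written with only the Python standard library; the column names `budget_id`, `tv`, `radio`, and `timestamp` are hypothetical placeholders, not names the lesson mandates:

```python
import csv
import io
from datetime import datetime

def validate_csv_source(text, entity_id_col, timestamp_col=None):
    """Check a CSV source against the requirements described above:
    a header row, a non-empty entity ID column, no missing values,
    and (optionally) RFC 3339 timestamps."""
    rows = list(csv.DictReader(io.StringIO(text)))
    problems = []
    for i, row in enumerate(rows, start=2):  # row 1 is the header
        if not row.get(entity_id_col, "").strip():
            problems.append(f"row {i}: missing entity ID")
        if any(v is None or v == "" for v in row.values()):
            problems.append(f"row {i}: missing value")
        if timestamp_col:
            try:
                # fromisoformat accepts RFC 3339 offsets like +00:00;
                # normalize a trailing 'Z' first for older Pythons
                datetime.fromisoformat(row[timestamp_col].replace("Z", "+00:00"))
            except ValueError:
                problems.append(f"row {i}: bad timestamp")
    return problems

sample = (
    "budget_id,tv,radio,timestamp\n"        # column names are hypothetical
    "b001,120,45,2021-05-01T00:00:00Z\n"
    "b002,,45,05/01/2021\n"                 # missing value, non-RFC 3339 timestamp
)
print(validate_csv_source(sample, "budget_id", "timestamp"))
# → ['row 3: missing value', 'row 3: bad timestamp']
```

A clean run returns an empty list, which tells you the file is safe to point an ingestion job at.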
CSV files cannot include array data types; use Avro or BigQuery instead. For array types, you cannot include a null value in the array, although you can include an empty array.

After you preprocess your data, you're ready to begin. You can create a feature store in the Vertex AI console or from Vertex AI Workbench using the API. On the dashboard, click Features and then select the region. Click "Create Feature Store". In the name field, name the feature store, keep the default region, and enter the number of nodes. You can optionally use a customer-managed encryption key. Then click "Create". Here is the feature store we created. A couple of notes here: one, you cannot delete a feature store from the console at this time; you must do it from the API. Similarly, if you need to add another feature store, you must add it using the API. When you select the name of the feature store, you will see the properties of the new Hello World feature store.

Now that the feature store is created, you need to create an entity type. Click "Create Entity Type". When the window changes, click "Create Entity Type" again. Recall that entity types group and contain related features; for example, a movies entity type might contain features like title and genre. Note that this feature store is Hello World. We're naming our entity type budget_ID because our dataset has media budgets for radio, television, and online newspapers. Here, the entity type description is optional, as is feature monitoring. Click "Create" to create the entity type. After the entity type is created, the entity is presented in the Feature Store window, and the Hello World feature store is also presented. Select the entity type budget_ID, which takes you to the Hello World entity properties. Basic information presented includes the name, region, feature store name, entity created and updated dates, and any description. Here, we added a description to this entity type, simply called budget. Step 3 is to add features.
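The console steps above can also be scripted, which matters because operations like deleting a feature store are API-only. Here's a rough sketch of the equivalent calls, written as a function that takes the SDK module as a parameter so it can be exercised without real credentials; the project, region, and resource IDs are placeholders, and the call shapes assume the `google-cloud-aiplatform` Python SDK:

```python
# A sketch of the API calls behind the console steps. `sdk` is expected
# to look like the `google.cloud.aiplatform` module: pass the real
# module in production, or a stub when trying out the flow offline.
def create_hello_world_featurestore(sdk, project, location):
    sdk.init(project=project, location=location)
    # One online serving node, matching what we entered in the console form.
    fs = sdk.Featurestore.create(
        featurestore_id="hello_world",
        online_store_fixed_node_count=1,
    )
    # The entity type that will group the budget features.
    fs.create_entity_type(entity_type_id="budget_id", description="budget")
    # Deleting a feature store is API-only, e.g. fs.delete(force=True).
    return fs

# Production use (requires google-cloud-aiplatform and credentials):
#   from google.cloud import aiplatform
#   create_hello_world_featurestore(aiplatform, "my-project", "us-central1")
```

Treat this as a sketch of the flow, not a definitive implementation; check the SDK reference for the full parameter lists before running it against a real project.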
Recall that a feature is a measurable attribute of an entity type. After you add features to your entity type, you can then associate your features with values stored in BigQuery or Cloud Storage. The Add Feature window displays input fields for a feature name, value type, description, override monitoring values, feature monitoring, and interval. Of the six input fields, only three are required: feature name, value type, and interval.

Before adding features, let's look at team XYZ's dataset. Recall the source requirements for Feature Store: you must have a column for entity IDs, and the values must be of type STRING. Team XYZ made sure to add an entity type field on the budget table. This column contains the entity IDs that the feature values are for, and they've also ensured that its data type is a STRING. Note that the five added features map to the five fields in the dataset. Selecting the budget_ID feature shows the feature properties; note that the data type is STRING. To confirm that the feature set contains four integers and one STRING, select "Entity Type and Feature Value Distribution".

To add feature monitoring to any entity type, click "Edit Info" while you are in the entity window, click "Enabled", and then click "Update". After updating, check the entity properties again. Note that feature monitoring is enabled with a time interval of one day, and that monitoring can be enabled at any time. Feature owners, such as data scientists, might monitor feature values to detect data drift over time. In Feature Store, you can monitor and set alerts on feature stores and features. For example, the XYZ operations team is monitoring a feature store to track its CPU utilization. Note that the metrics provide both the date and time at any point along the timeline. The team also set up an ingestion job. Ingestion jobs import feature data from BigQuery or Cloud Storage so it can be used in a feature store.
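To make the drift idea above concrete: feature monitoring periodically snapshots feature-value distributions and alerts when they shift. Here's a toy illustration of that idea (not the actual monitoring service), flagging drift when a feature's recent mean moves away from its baseline by more than a threshold number of baseline standard deviations; the TV budget values are made up for the example:

```python
# Toy drift check: compare a recent window of feature values against a
# training-time baseline. This only illustrates the concept that the
# managed feature monitoring automates.
from statistics import mean, stdev

def drifted(baseline, recent, threshold=2.0):
    """True if the recent mean is more than `threshold` baseline
    standard deviations away from the baseline mean."""
    shift = abs(mean(recent) - mean(baseline))
    return shift > threshold * stdev(baseline)

tv_baseline = [120, 130, 125, 118, 127]   # hypothetical training-time TV budgets
tv_recent = [240, 260, 255, 250, 245]     # hypothetical serving-time values
print(drifted(tv_baseline, tv_recent))    # → True
```

When a check like this fires, a feature owner would typically investigate the upstream pipeline or retrain the model on fresher data.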
Before you import data, you need to define the corresponding entity type and features. Feature Store offers batch ingestion so that you can do a bulk ingestion of values into a feature store. For example, your computed source data might live in locations such as BigQuery or Cloud Storage. You can then ingest data from those sources into a feature store so that feature values can be served in a uniform format from the central feature store. Selecting the ingestion job takes you to the ingestion properties, which identify when the job was created, how long it took to process, the region, the number of workers, and a link to the data source. The properties also identify the entity type and the name of the feature store. Note the number of ingested entities, 1,200, which barely meets the minimum of 1,000 rows required for a dataset to be uploaded into Vertex AI. As a reminder, data for ingestion should have the following columns: entity ID, the ID of the ingested entity; timestamp, the timestamp at which the feature values were generated or computed; and feature columns that match the destination feature names.
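That column contract is easy to verify before launching a batch ingestion job. Here's a small sketch that checks a source header for the required entity ID and timestamp columns and reports any destination features the source doesn't cover; the column and feature names (`budget_id`, `tv`, `radio`, `newspaper`, `sales`) are hypothetical:

```python
# Pre-flight check for a batch ingestion job: the source must carry an
# entity ID column, a feature timestamp column, and one column per
# destination feature it is supposed to populate.
def check_ingestion_columns(header, entity_id_col, timestamp_col, feature_ids):
    """Raise if the required ID/timestamp columns are absent; otherwise
    return the destination feature IDs missing from the source header."""
    missing_required = [c for c in (entity_id_col, timestamp_col) if c not in header]
    if missing_required:
        raise ValueError(f"missing required columns: {missing_required}")
    return [f for f in feature_ids if f not in header]

header = ["budget_id", "timestamp", "tv", "radio", "newspaper"]
features = ["tv", "radio", "newspaper", "sales"]
print(check_ingestion_columns(header, "budget_id", "timestamp", features))
# → ['sales']
```

A non-empty result means the ingestion job would have no source column to fill that feature from, so you'd fix the source (or drop the feature from the job) before running it.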