Now, we don't pull in visitor ID but visitor- visitor ID,
not visit ID, but visitor ID has been deprecated. Let's see.
Those are the two that we're going to drop there.
Now, let's take a look at our recipe.
So, in the lower right-hand corner,
even without you knowing,
when you started to drop these columns,
it's creating this logic with the script that's
going to be part of this pipeline that we're going to build.
If you hover over the plus here,
you can add a new step but I just want to view the recipe.
If I click on Recipe.
You remember the product SKU that had the red quality bar?
It automatically saved.
Hey, make sure every time you convert the product SKU to string,
delete these two different columns.
You're already building a recipe together, and you don't even know it.
So, let's start adding in a little bit more sophistication here.
So, add a new step,
and there is a transformation.
This is a box by the way.
The transformation, you can start typing in anything if you imagine.
We're going to be doing concatenations, case statements,
pretty much anything that you think you can do in SQL even pretty advanced functions.
We'll going to be covering a- spoiler alert.
We're going to be covering un-nesting and arrays in
most of our advanced course, in the specialization.
But for now, we will settle for deduplication of all of the different rows.
We're going to add this in, and the great part is if you can see any duplicative rows,
we'll try to hunt some down.
It will highlight them for those that are actually affected.
So, we're going to remove completely duplicative rows from the dataset.
Also, the goal here is to filter for
an apparel manager who's only interested in apparel
and those items that actually have revenue associated with them.
So, let's look for a transaction revenue.
For here, one of the great things is
instead of doing new stuff and trying to find out the formula yourself,
there is a suggested mode where you can actually click on all the missing values.
If you click on that, it prompts you with a couple of different suggestions.
Let's say I like to use all the time,
and it says, you have a couple different options.
You clicked on this missing value and what you can do is a couple of different things.
Commonly, folks wanted to delete the rows which
is what we're going to do or you can just keep
those rows or you can create a new column and flag those if you want.
We actually want to filter this out,
so we're going to delete those rows where there is
no transaction, total transaction revenue.
So that it reduce our dataset from about 13,000 or 12,000 down to 3,000 so far.
So, watch down here.
We dropped two columns down to 30,
and now we want to keep
filtering and winnowing down this dataset to only the fields that we want.
So, let's take a look at "Type".
We only want to filter for type of page, ignoring the events.
We're not double counting any items there.
Let's take a look. I'm looking for the apparel product.
I've got the productSKU, ProductName.
Here is the ProductCategory.
Apparel is number one for me here.
I'm just going to go and click on that,
and filter only the rows that contains apparel.
As you're building this recipe,
the great thing about this sample that is loaded in,
you can see that we're down to 676 rows right here,
is it's filtering these live for you as well.
So you can see, a lot of the apparel products,
the product Variant is now no longer,
not set as dominant.
So, each of these different histograms updates as well.
So, it's a great way to explore your data,
even if you don't want to ultimately build a dataset like this.
You can still explore what are the most popular variants for products.
So, small, large, medium, XL,
2XL, six medium or 10 to 12.
So, a lot of different product variants for.
You can imagine pie for clothing.
Okay. So let's see.
We've got 676 rows.
That is good for us.
The last couple of things we need to do is create two new columns.
So, I'm really annoyed by the visitor ID and my lack of
a good field for the combination of full visitor ID in these sessions.
I'm going to create a new step. We're going
to merge these two together, concatenate them.
If you type in concat or merge,
it'll take you to the same place.
We're going to merge those two columns together.
Full visitor ID, start typing and it will autocomplete for you.
The other column we want is visitID.
You can see automatically as you're starting to type this in,
and it'll say, "hey,
I'm going to concatenate these two together,
do you want to have a delimiter for- I think it's single quotes.
Do you want to have a hyphen in between them or star or something like that?
You can specify those.
I'm going to keep it as just completely jammed together.
We'll call this unique_session_ID.
Great. So, we got a good field for that.
That'll be good for reporting later on when we're doing group bys and we don't
need to do the concatenation in SQL anymore.
So we've got that unique identifier.
Now, one of my,
you'll begin to start memorizing all of the different e-commerce action types here,
and you'll notice- here's a uninteresting quiz question for you.
Why do we only have e-commerce action types of five or six?
If you remember, just because you memorized it like me,
five or six corresponds to the check-outs.
Completed, purchase, I think it's number six.
Somewhere along the checkout process is number five.
But if you don't have those memorized, that's fine.
We're actually going to be creating a case statement here.
But to answer that quiz question, the only reason why you have those
two is because we proactively filter down
for only those who have a transaction value present.
That's why if it has been part of checkout,
you're not going to have any transaction associated with it.
So here, we're going to create a new step.
Now, if you start typing in case or if then or conditions that'll get you here,
and much like we did the case statement in one of
the challenge questions in the previous lab,
we are going to be building a case statement effectively through the UI here.
So, we don't want a single condition for an if/then/ else,
we actually want to specify a case statement
in the column that we want to specify that on,
its e-commerceAction_type and we have quite a few cases.
So, we have one case here.
I'm going to, it's nine in total including zero.
So, zero through eight.
Again, as a handy reference,
I'm going to flip over to the BigQuery schema,
and I'm going to find action type.
These are the nine that you'd be mapping inside of there.
So, let us start the mapping.
So, I'm going to add 1,2,3,4,5,6,7,8,9,
and it's exactly what you think of anything to do.
Case one is zero.
Case one, it's one. Case one, it's two.
You can do these.
I like putting in all the numbers first, all the different cases.
But essentially, we're going to map those to those string values.
So, we have that lookup table in there.
You won't have to memorize those.
The default value, if it's none of those,
we're going to have that just return a null value,
and when it give this the creative column name of eCommerceAction_label.