So what a mosaic plot is going to do is tell us

what percentage of this population was female,

and what percentage was male.

And so let's say that of the total population on the Titanic,

700 were male, and 300 were female.

So we start on this gender axis,

and we get to pick how we want to split this.

So if I say my first split is going to be on gender,

I take this axis,

and I split it up into zero to one.

So, some percentage of this axis represents female,

since we had 300 females out of a thousand.

This is 0.3.

And then the rest of the area,

the rest of the length of this line,

is my other category male.
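
To make that first split concrete, here is a minimal Python sketch, assuming the made-up counts above (300 female, 700 male). The function name `split_axis` is just an illustrative name, not part of any library; it turns category counts into intervals on the zero-to-one axis.

```python
# Made-up counts from the lecture: 300 female, 700 male.
counts = {"female": 300, "male": 700}

def split_axis(counts):
    """Return (category, start, end) intervals that partition the 0-to-1 axis."""
    total = sum(counts.values())
    intervals = []
    start = 0.0
    for category, n in counts.items():
        end = start + n / total  # each category gets its share of the axis
        intervals.append((category, start, end))
        start = end
    return intervals

intervals = split_axis(counts)
for category, start, end in intervals:
    print(f"{category}: {start:.1f} to {end:.1f}")
```

The same function works with more than two categories, such as regions of the world; the axis simply gets more breaks.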

And I could have had multiple categories here.

Instead of male and female,

we could have done this with a different data set for countries.

We could have had like North American countries, South American, Asian,

whatever, and we would split this up into more breaks.

And now, this box in orangish-yellow is the percent of females.

This box is the percent of males.

And I split this with my vertical line.

My next split is on how many people survived.

And so what I can do is I can count for females,

how many survivors we had.

So I add up my survivors.

I add up my non-survivors.

So let's say, for females,

we had 300 females,

and let's say we had 200 survive,

and 100 not survive.

So, that means, again, I go from zero to one on my axis,

and I've got a total of 300.

So 100 out of 300 were not survivors according to our made up numbers here.

So, that's where this line comes in.

This goes from zero to one for my split.

So I had more people survive,

and that's what this category is.

One means survived, zero means did not survive.

And for male, I do the same thing except now this is the zero category.

A larger chunk of males didn't survive than survived.

So I go back to my data set,

I add up how many males survived,

and how many males didn't survive.
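
As a rough sketch of these two splits together, the Python below computes one rectangle per gender and survival combination. The female counts are the made-up ones from above; the male counts (140 survived, 560 did not) are hypothetical, chosen only so that most males did not survive. The name `two_way_tiles` is also just illustrative.

```python
# counts[gender][survived] -> number of people (0 = did not survive, 1 = survived).
# Female numbers are the lecture's made-up ones; male numbers are hypothetical.
counts = {
    "female": {0: 100, 1: 200},
    "male": {0: 560, 1: 140},
}

def two_way_tiles(counts):
    """Return {(gender, survived): (x0, x1, y0, y1)} tiles in the unit square."""
    grand_total = sum(sum(row.values()) for row in counts.values())
    tiles = {}
    x0 = 0.0
    for gender, row in counts.items():
        group_total = sum(row.values())
        # First split (vertical line): width is this gender's share of everyone.
        x1 = x0 + group_total / grand_total
        y0 = 0.0
        for survived, n in row.items():
            # Second split (horizontal line): height is the share within this gender.
            y1 = y0 + n / group_total
            tiles[(gender, survived)] = (x0, x1, y0, y1)
            y0 = y1
        x0 = x1
    return tiles

tiles = two_way_tiles(counts)
for key, (x0, x1, y0, y1) in tiles.items():
    print(key, f"x: {x0:.2f}-{x1:.2f}", f"y: {y0:.2f}-{y1:.2f}")
```

With these numbers, the horizontal line sits at one third in the female column and at 0.8 in the male column, which is why the two columns break at different heights.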

I did vertical.

Then I did horizontal. I can also now add other chunks like class.

So, for example, I can split this by what class the passengers were in.

So this is how many people were in third class,

second class, first class.

Then I can look at survivors versus non-survivors.

And we can see that the share of people that survived gets smaller with each lower class.

So, first class had the most survivors.

But I can also then split this again by gender.

So now, I've got first class,

second class, third class.

I've got male, female,

and survive, and not survive.

So, for female, my zero and one are here.

So, the blue is not survive,

the yellow is survive.

So, in first class, almost all the females survived.

In second class, still the majority survived.

And in third class, still some survived, but fewer.

With male, I've got zero to one.

So I can see in first class,

many did not survive.

In second class, again,

many did not survive.

And in third class, the majority did not survive.

And so, I can again do these splits by just rotating horizontal-vertical,

horizontal-vertical, counting these up,

and putting these different percentages in my boxes.
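
The alternating splits can be sketched as a small recursive function. This is a minimal Python sketch under stated assumptions: the counts by class, gender, and survival are hypothetical numbers, not the real Titanic data, and `mosaic_tiles` is an illustrative name rather than a library function. Even depths cut along x (vertical lines), odd depths cut along y (horizontal lines).

```python
# Hypothetical counts keyed by (class, gender, survived); not real Titanic data.
counts = {
    ("1st", "female", 1): 90, ("1st", "female", 0): 10,
    ("1st", "male", 1): 40, ("1st", "male", 0): 60,
    ("2nd", "female", 1): 70, ("2nd", "female", 0): 30,
    ("2nd", "male", 1): 20, ("2nd", "male", 0): 80,
    ("3rd", "female", 1): 80, ("3rd", "female", 0): 120,
    ("3rd", "male", 1): 60, ("3rd", "male", 0): 340,
}

def mosaic_tiles(counts, rect=(0.0, 1.0, 0.0, 1.0), depth=0):
    """Recursively split rect; returns {key: (x0, x1, y0, y1)}."""
    x0, x1, y0, y1 = rect
    n_vars = len(next(iter(counts)))
    if depth == n_vars:
        # Every variable has been used, so exactly one cell is left.
        (key,) = counts
        return {key: rect}
    total = sum(counts.values())
    # Group the remaining cells by the category of the variable at this depth.
    groups = {}
    for key, n in counts.items():
        groups.setdefault(key[depth], {})[key] = n
    tiles = {}
    position = 0.0  # fraction of the current side already used
    for sub_counts in groups.values():
        # A group with no people gets a share of zero (a collapsed tile).
        share = sum(sub_counts.values()) / total if total else 0.0
        if depth % 2 == 0:  # vertical cut: divide the x extent
            sub = (x0 + position * (x1 - x0),
                   x0 + (position + share) * (x1 - x0), y0, y1)
        else:  # horizontal cut: divide the y extent
            sub = (x0, x1, y0 + position * (y1 - y0),
                   y0 + (position + share) * (y1 - y0))
        tiles.update(mosaic_tiles(sub_counts, sub, depth + 1))
        position += share
    return tiles

tiles = mosaic_tiles(counts)
for key, (tx0, tx1, ty0, ty1) in tiles.items():
    print(key, f"x: {tx0:.2f}-{tx1:.2f}", f"y: {ty0:.2f}-{ty1:.2f}")
```

The order of the variables in each key is the order of the splits, so reordering the key changes which comparison the plot makes easiest.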

Now, the question you should be asking is, well,

remember our lecture about which visual variables are most salient to people,

and Bill Cleveland would have said, "Well,

you're comparing area between these different boxes."

Area is not a very good visual variable.

So, do people really use these mosaic plots?

Well, it's tempting to dismiss these mosaic plots because they

represent counts of categories as rectangular areas.

They're providing a distorted perceptual encoding.

But the important thing to realize is that we're really encoding the length.

So, remember I'm splitting across the length here,

and I'm always splitting along the length.

So at each stage, the comparison of interest is the length of

the side of a different box, not really the area.
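
A small check of that idea in Python, assuming the made-up female counts from earlier (200 survived, 100 did not, out of 1000 people) and hypothetical helper names: two tiles in the same column share a width, so the ratio of their areas is the same as the ratio of their heights.

```python
# Tiles are (x0, x1, y0, y1) in the unit square, using the lecture's
# made-up female column: width 0.3, split at one third.
female_died = (0.0, 0.3, 0.0, 1 / 3)
female_survived = (0.0, 0.3, 1 / 3, 1.0)

def width(tile):
    return tile[1] - tile[0]

def height(tile):
    return tile[3] - tile[2]

def area(tile):
    return width(tile) * height(tile)

# Same width, so comparing areas reduces to comparing heights (about 2 to 1).
area_ratio = area(female_survived) / area(female_died)
height_ratio = height(female_survived) / height(female_died)
print(round(area_ratio, 6), round(height_ratio, 6))
```

This only holds for tiles that share a side. Across columns of different widths, equal heights mean equal proportions, not equal counts.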

The problem, as you can see,

is as we have more and more splits,

this gets harder and harder to read,

harder and harder to label,

and sometimes we can get some long and skinny boxes.

And you can imagine that sometimes it may be that a count is so small,

I can't even see it, or what would happen if a count was zero?

How do I represent that in this type of mosaic plot?

So, while this is a really interesting plot to look at,

we're actually going to see there's a similar variation of this.

There is the treemap as well.

Later on, we'll talk about hierarchical data.

So this is yet another tool we can put in our belt for looking

at how we can count categorical variables,

how we can represent those,

and allow people to explore,

and interact with that sort of data as well.