Data repositories in which cases are related to subcases are identified as hierarchical. This course covers the representation schemes of hierarchies and algorithms that enable analysis of hierarchical data, as well as provides opportunities to apply several methods of analysis.

Associate Professor at Arizona State University in the School of Computing, Informatics & Decision Systems Engineering and Director of the Center for Accelerating Operational Efficiency School of Computing, Informatics & Decision Systems Engineering

K. Selcuk Candan

Professor of Computer Science and Engineering Director of ASU’s Center for Assured and Scalable Data Engineering (CASCADE)

In previous modules we talked about hierarchical data structures.

Different places where you might find hierarchical data,

whether it's through your file structure, through genomes, through family trees.

And then we talked about, sort of, no-link diagrams for

visualizing hierarchical data.

As well as methods for taking any, sort of,

generic data set and doing a hierarchical clustering on it.

In this module we want to talk about another way to represent

hierarchical data through the use of what we call a treemap.

So previously we talked about node-link diagrams.

And some of the problems with the node-link diagrams were,

essentially, all of the, sort of, white space that might be left over in an image.

Especially if the tree gets really long one direction we wind up with all of this,

sort of, whitespace left on the screen.

So people started thinking about what might be a space-filling representation.

And so a space-filling representation, one idea was to create an icicle plot and

so an icicle plot is shown here on the right.

And how do we interpret this icicle plot.

Well, each level is a level in our node-link diagram so

this top level has just one bar so that's our root node.

So let's see if we can [COUGH] understand how this looks.

So our root node here.

And then on the next level we have three nodes one, two, three.

So that's how we have three splits here so one, two, three.

And then the next split is one, two, three here so we have one, two, three.

Again, under here we have one, two, three, one, two, three and here we have one, two.

And now,

with an icicle plot you might notice some of these bars are longer than the others.

So in icicle plot we might encode different information to the hierarchy

with respect to length, or width, or those sorts of things.

Now, of course,

challenges might be depending on what I'm encoding the data with.

It could be that one of these becomes really, really small,

it could be that one part of the tree goes deeper than the others.

So I still wind up with [COUGH] elements that aren't there and

being filled in and so forth.

So people have been trying to come up with different ideas for

space-filling representation.

And the most popular became the treemap and this a space filling

representation developed by Johnson and Schneiderman.

Where children are drawn inside of their parent, essentially they alternate

horizontal and vertical slicing at each successive level of the tree.

Similar to, sort of,

how we thought about those mosaic plots in our statistical graphics.

But here we use area to encode other variables of data items,

we might use color, we might be able to add text.

So let's take a look at our picture here.

This is [COUGH] a data wheel via Wikipedia, Wikimedia Commons so

lets think about how we can interpret this.

So first, we have to try to find our splits so remember,

it alternated between horizontal and vertical slices.

So the only place I see a split here with a vertical slice is right here,

it splits our data into two sides.

So we've got our root node and we've got our two sides.

Okay, so let's just start with this side first and

let's call this side A so this is A.

So now, we had a vertical slice now, we want to look for

horizontal slices that go all the way across.

So we've got that one.

So this also has two nodes and this splits between cashier and

miscellaneous managers.

So we've got cashier and we've got miscellaneous managers.

Let's go ahead and look at this side too, as well.

This side had one horizontal split because, remember,

we're alternating between horizontal and vertical.

So we get, again, two nodes.

And one node is this, I got to erase this so I can read this, is the driver sales.

So sales and this other one doesn't have a label yet.

Okay, so let's continue down that tree a little bit so

let me get some space here on my screen.

To draw this a little bit better.

Okay, so interpretation of a treemap.

So, first, we have to decide if the first slice was horizontal and vertical.

So I only see one vertical slice slicing at my whole tree so that's my first cut.

So my tree gets split into two elements,

this is my blue green side and this is my yellow purple side.

Okay, so on the blue green side, since we did vertical first, now we do horizontal.

And so this has two nodes where this is the miscellaneous and the cashiers.

On the yellow purple side our vertical or

horizontal slice goes there so we wind up with the drivers as a node.

But notice this node has two elements, but

the next vertical slice splits it there so we can go here like this.

So we've got cooks and we've got construction.

And we can continue going back and forth between these vertical and

horizontal flips to get the whole hierarchical structure from the treemap.

And now, really though,

we want to be able to take a structure like this and go to this.

And you want to be able to read a structure like this in this sort of

manner, all right.

So we want an algorithm to be able to draw this and the reason we talked

about interpretation is because you want to be able to read the treemap.

So with the treemap algorithm what we're going to do is first, take the root.

And we're going to decide if our first cut is going to be horizontal or

vertical, then we take our square and we split.

So let's say we have a tree that looks.

Like this.

So here's our screen space so let's make our treemap.

So we start with the parent and

let's say we decide that our first cut is going to be horizontal.

Okay, so we have three children so one, two, three, okay.

So now for this level we're going to do vertical cuts so

first we look at this guy so that's this box right here.

So now we're going to do vertical cuts, it has two children so

we're going to split that into two.

Okay, this one had no children so no cuts are necessary.

And this one had two children so, again, we split this into two.

And, now,

where we put this split may depend on different properties of the nodes.

For now we're not going to worry about that,

we're just walking through an example.

Okay, so now we split it and we flip, again.

We go to horizontal so we split this into two nodes so

we've got our node here and our node here.

So this node had two children, this node had zero children so we have our split.

This one didn't have any depth, this one didn't have any further children.

And so we wind up with our treemap that looks like this for this data source.

And as all we're doing is just flipping back and forth between horizontal and

vertical cuts.

Now, we can also add some embellishments to our treemap,

we can do a nested versus a non-nested tree-map.

So, for example, here is a traditional tree-map.

And, remember, the size corresponds to different properties of elements.

And nesting is trying to add some sort of border here to give some embellishments

to the visualization to make things pop out a little bit easier.

And you can see the difference between the non-nested tree-maps on the left and

the nested tree-maps on the right.

And we can use the tree-map idea in a variety of domains.

So people have used this in file directory structures, sports analysis,

stock market data, software diagrams.

And you might think to yourself, well, it makes sense for me for file directories,

but what about sports analysis.

Well, again, with sports you've got your game.

Inside of a game you might have, so let's think about football for example, so