0:05

So, this is the same setup and

Â the question is how many people, this is in a much bigger group.

Â How many people would you have to ask before you found every day of the year?

Â Well, you have to ask at least 365 and

Â if you are lucky and everybody had a different birthday you'd be fine.

Â But generally you'd have to go much more.

Â So that again corresponds to throwing balls into urns.

Â And again randomly.

Â And then the question is how many do you have to throw in before every urn has at

Â least one ball in it.

Â So this one here.

Â Nine finally finds it's way in.

Â 0:53

So generally, it'd be much more than the number of urns.

Â The question is, how long til each urn has at least one ball in it.

Â This comes from, I don't know what the rage is lately, say baseball cards or

Â magic cards or Charlie and the Chocolate Factory in the cards.

Â How many of the different ones you have to collect before you get

Â every possible coupon.

Â That's the same problem, in different types and

Â when you get one it's random how many you have to get before you get them all.

Â 1:29

And there's actually plenty of practical applications where we want to know

Â the answer to this problem as well.

Â I have a 20-sided die, this is, again, just another illustration.

Â You keep rolling the die, and every time you get a new value,

Â you check off the value.

Â So how many times, you're going to have to roll the die,

Â so because of the birthday problem, after not too many rolls, you get a repeat.

Â So not new, and then after a while you start to get into plenty of repeats.

Â But the question is how many times do you have to roll it untill you get all

Â possible values?

Â So in this case now we're still looking for just two more possibilities.

Â Finally we get the 4, and

Â then after 37 rolls we have all possibilities For a 20 sided die.

Â So in general, if given M,

Â how many times you have to throw to get all different M values.

Â That's the coupon collector problem.

Â Now, with classical probability, this problem is actually not too difficult.

Â And so I'll give a classical analysis.

Â Actually, the analytic combinatorics is more complicated than this.

Â But it's still worth doing because the classical analysis just gives the average.

Â And if you want to study variants of a problem, or

Â other properties of the distribution,

Â you're going to need the structure that we get from analytic combinatorics.

Â But anyway, here's the classical analysis.

Â So, first thing is the probability that you need more than j

Â rolls to get the k+1st coupon is k/M to the j.

Â That's the probability that the first j rolls are all from the same k values.

Â 3:21

Again, adding those up in the same way,

Â the expected number of rolls to get the (k+1)st coupon is just solving that,

Â which is 1/(1-k/M) which is the same as M/(M-k).

Â So that's for the (k + 1)st coupon,

Â that's the expected number of rolls to get the (k + 1)st coupon, and so

Â the expected number of rolls to get all the coupons is just summing that

Â because that's an expectation, and we can sum expectations.

Â So sum of that, of M/M- K, K from 0 to M, that's just M times the harmonic numbers.

Â 4:41

Although, really, the motivation for

Â this is much more interesting when we look at variations later on.

Â So start out the same way as we did for the birthday problem.

Â Coupon collector sequence is an N word that no empty set.

Â Or it's a string that uses all the letters in the alphabet.

Â So how many coupon collector sequences are there?

Â 5:05

So, for example, for m equals 26, that's one.

Â Uses all the letters of the alphabet.

Â Well known one.

Â Then, after m equals five.

Â There's an example.

Â So, no empty sets So, again, we set it up the same way, as we did before.

Â 5:26

With using exponential generating functions treated as a label set.

Â So what is a coupon collector sequence?

Â It's a, we have M different letters, so

Â it's a sequence of M different letters And now our sets have to be non-empty,

Â so there just has to be a set size greater than zero.

Â And that immediately gives us a EGF equation that,

Â either it's the C-1 to the m.

Â 5:59

So, extracting coefficients, getting the coefficient of

Â n factorial comes the coefficient of z to the n and we get this complicated

Â equation, for the counting sequence, for the number of coupon collector sequences.

Â Out this is an so helpful but

Â 6:21

if you go ahead and extract the coefficient to Z to the end getting

Â alternating some and so not necessarily so helpful.

Â I mean itâ€™s asymptotic to be It's like m to the n, and

Â then a smaller number to the n so subatomically it's m to the n.

Â But all that means is if you have a long string almost all,

Â if you have a random string of n objects where n's bigger than m.

Â Almost all of them are going to use all the letters if N is

Â significantly bigger than M.

Â So that's not helpful.

Â What we need to do is exactly evaluate this.

Â 7:08

So the probability, the random

Â M-word length N is a coupon collector sequence is that, what I just showed.

Â And this is a little bit complicated to

Â evaluate because of the alternating sum feature of it.

Â 7:50

And while if you have studied Kanuth eventually you can get the solution MHM,

Â but it really is still kind of a magic solution.

Â So this is possible to get to the end, but

Â this is not a recommended way to get to the end.

Â For this problem.

Â 8:09

When we look later on at generalizations and asymptotic

Â methods in the second part of the course, we'll kind of see why this happens.

Â But for the elementary derivation this one doesn't work.

Â Now there is a way to do it with OGFs

Â that pretty much mirrors the classical derivation.

Â So if we define WMk, it'd be the class of M words that have k different letters.

Â With the last letter appearing only once.

Â 8:50

So then you have an OGF for those, and

Â then we can actually make a probability

Â generating function just by evaluating that at Z over N.

Â And so then if you differentiate that

Â probability generated function evaluated C equals 1 you get the mean weight time.

Â For k coupons.

Â So that's pretty simple generating function on manipulations.

Â And so, what we'll do on the next slide is

Â use a combinatorial construction to get an equation for the generating function and

Â then perform this operation to give us a solution.

Â So here's the construction.

Â So if we have m words with k different letters.

Â The last letter appearing only once.

Â 9:57

And then that construction immediately translates

Â to this generating function equation, which I won't read out.

Â And then evaluate at z over M to get the probability generating function.

Â Differentiate and evaluate at 1, then we get a recurrence relation for

Â the thing that we're looking at, the wait time for k coupons, and

Â that leads us to the same sum that we got in the elementary derivation,

Â which is going to lead to, again, the end result.

Â 10:42

And our answer in practice is, if you

Â know how many different baseball cards there are, multiplied by the log of that.

Â Now that's how many you're going to have to look for to get every possible coupon.

Â So for 20 sided die it's about 60 or

Â on application Is maybe testing pages in memory,

Â you do it by having a program randomly access the pages,

Â 11:12

maybe that you want to be sure that the test hits every page.

Â Then you can figure out, you can take, if M is a million,

Â then it's going to take you about a million times natural logarithm,

Â which is about 14.5 million Steps to test every page.

Â That's an example of an application in the real world.

Â It's a well known phenomenon that has lots of applications.

Â And that's the combinatorics of the coupon collector problem.

Â There is a combinatorial concept called a surjection that does

Â really need analytic combinatorics to study.

Â 11:50

So what we call a coupon collector sequence is,

Â it's an M-word with no empty set.

Â So that's called an M-surjection.

Â But there's a more general thing which is just a word that is there is

Â an M-surjection for some M and that appears in lots of applications.

Â So that is either it uses just one letter or

Â uses two letters, but without leaving any out.

Â Where it uses three letters without leaving any out.

Â And this is a combinatory logic that's a well defined and

Â easily handled with the symbolic method.

Â So this is what we did for M-surjections.

Â It's a sequence of non-empty sets, e to the Z minus 1 to the M.

Â And we hit for that.

Â For surjections, it's not a sequence of size M, it's a sequence of any length.

Â 12:56

set of, our empty set is this z-1.

Â Sequence is 1 over -1 so it's 1 over 2 e to the z.

Â And so how many subjection are there linked in.

Â We'll see that this is best handled with complex asymptotics but it's N.

Â Over 2(log 2) to the N+1.

Â That's a classical example in combinatorics.

Â Related to the coupon collector problem.

Â And we'll see this coming up in more detailed studies in part two.

Â So that's the coupon collector problem.

Â And next we'll go onto applications in computer science.

Â