0:03

Next we're going to talk about strings and tries. These are combinatorial objects that have really come under scrutiny because of computational applications, but they're very well suited to the analytic techniques that we've been talking about.

Again, we're in the second half of the class, and this is lecture eight. We're going to look at these combinatorial objects mostly as unlabeled classes, so we'll be talking about OGFs. And again, we'll give some examples in lecture, but there are many more in chapter eight of the book.

1:26

Some of these bit strings have no 000, I think. Well, maybe not. But if you look at shorter ones, like this distance, then there'd be no 000. So the question is: what's the probability that you don't have 000 in an N-bit random bitstring? Questions like these actually have lots of important practical applications, where the 0s and 1s correspond to electrical signals, and certain kinds of devices can't handle long runs of "off" signals, for example.

So let's review what we talked about for the symbolic method for bit strings. It's one of the very simplest examples of analytic combinatorics. How many binary strings are there with N bits? Of course, there are 2^N. And we can do that with the symbolic method by constructing the class of all binary strings B as a sequence of 0 or 1 bits. Then our normal symbolic transfer takes Z_0 + Z_1 to 2z, and a sequence of anything to 1/(1 minus that). So that tells us that the OGF that enumerates the number of binary strings is B(z) = 1/(1 - 2z), and the coefficient of z^N in that is 2^N. That's our basic construction for binary strings using the symbolic method.

Â 3:43

That's the natural construction for this class of combinatorial objects. And then that translates, through the normal translation theorem, to the OGF equation B00(z) = 1 + z + (z + z^2) B00(z). We can solve that, and we get a ratio of two polynomials for the generating function.

And that's a classic example in asymptotic analysis. The coefficient of z^N in that function is, well, in this case it's exactly the Fibonacci numbers. But in general, for any ratio of polynomials, you look at the largest root of the characteristic polynomial associated with the denominator, and that gives an asymptotic expression for the coefficient of z^N in the generating function. So simply by finding the appropriate root of the polynomial, we get the asymptotics. And for any such ratio, it's going to be of the form a constant times that root to the Nth power, unless there happen to be multiple roots. So in this case it's the golden ratio, phi, to the Nth power, and the constant also is explicitly determined. That's an example that we did in chapter five.
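As a quick sanity check on this example, the no-00 counts can be computed two ways: from the recurrence s(N) = s(N-1) + s(N-2) implied by the OGF, and by brute-force enumeration. This is a sketch with names of my own choosing, not code from the lecture:

```java
public class NoDoubleZero {
    // Counts N-bit strings with no "00", via the Fibonacci-type recurrence
    // implied by B00(z) = (1 + z) / (1 - z - z^2).
    static long countByRecurrence(int n) {
        if (n == 0) return 1;
        long prev = 1, cur = 2;                       // s(0) = 1, s(1) = 2
        for (int i = 2; i <= n; i++) {
            long next = prev + cur;
            prev = cur;
            cur = next;
        }
        return cur;
    }

    // Brute force: enumerate all 2^n bitstrings and reject those containing "00".
    static long countByBruteForce(int n) {
        long count = 0;
        for (int bits = 0; bits < (1 << n); bits++) {
            boolean ok = true;
            for (int i = 0; i + 1 < n; i++)
                if (((bits >> i) & 3) == 0) { ok = false; break; }  // two adjacent 0s
            if (ok) count++;
        }
        return count;
    }
}
```

Both methods agree, and the counts 1, 2, 3, 5, 8, ... are exactly the Fibonacci numbers, as claimed.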

So the first thing we want to do today is extend that: how many binary strings have no runs of P consecutive 0s? And that's going to extend in a natural way, very much like the derivation that I just did. So the construction is: a string with no runs of P consecutive 0s is a string of 0s of length less than P, followed by either the empty string or a 1 followed by a string with no runs of P consecutive 0s. That's just a generalization of the construction that I did on the last slide. And again, that immediately translates: a string of 0s of length less than P is 1 + z + ... + z^(P-1), the empty string translates to 1, and the rest to z Bp(z). Solving that equation gives Bp(z), again as a ratio of two polynomials. And again, the coefficient of z^N in that is asymptotic to a constant times the dominant root to the Nth power, where that dominant root is something we can calculate, and the constant is also explicitly available.
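The dominant root can be computed numerically for any P, and the counting sequence really does grow like that root to the Nth power. Here is a sketch (class and method names are mine, not from the lecture), using the recurrence s(N) = s(N-1) + ... + s(N-P) satisfied by the counts, and bisection on the characteristic polynomial x^P - x^(P-1) - ... - 1:

```java
public class RunFreeGrowth {
    // s(n) = number of n-bit strings with no run of p consecutive 0s,
    // via s(n) = s(n-1) + ... + s(n-p), with s(n) = 2^n for n < p.
    // Returns s(n)/s(n-1), which should approach the dominant root.
    static double ratio(int p, int n) {
        double[] s = new double[n + 1];
        for (int i = 0; i < p && i <= n; i++) s[i] = Math.pow(2, i);
        for (int i = p; i <= n; i++)
            for (int k = 1; k <= p; k++) s[i] += s[i - k];
        return s[n] / s[n - 1];
    }

    // Largest real root of x^p - x^(p-1) - ... - 1, by bisection on (1, 2):
    // the polynomial is negative at 1 and equals +1 at 2.
    static double dominantRoot(int p) {
        double lo = 1.0, hi = 2.0;
        for (int iter = 0; iter < 100; iter++) {
            double mid = (lo + hi) / 2, g = Math.pow(mid, p);
            for (int k = 0; k < p; k++) g -= Math.pow(mid, k);
            if (g > 0) hi = mid; else lo = mid;
        }
        return (lo + hi) / 2;
    }
}
```

For P = 2 the dominant root is the golden ratio, recovering the Fibonacci example above.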

Â 7:20

The OGF, Sp(z), is the sum over N of the number of bitstrings of length N with no runs of P consecutive 0s, times z^N. And we have an explicit formula for that OGF: (1 - z^P)/(1 - 2z + z^(P+1)). Now this is similar to the situation we had with permutations: we can convert that into a probability by evaluating at an appropriate value of the argument. If you look at Sp(z/2), then the coefficient of z^N becomes s_N/2^N, where s_N is that count. So what that is, is the probability that a random bitstring of length N has no runs of P 0s. So it's a PGF, just by evaluating the argument at z/2.

So now if we take z = 1, what is that? That's the sum over N of the number of bitstrings of length N with no runs of P 0s, divided by 2^N. The number of bitstrings of length N with no runs, divided by 2^N, is the probability that the first N bits have no runs of P 0s. And that's the same as the probability that the position of the end of the first run of P 0s is bigger than N. Summing those tail probabilities over all N gives exactly the average position of the end of the first run of P 0s. So that's the average wait time.

So this argument gives us these two theorems. The first is that the probability that an N-bit random bitstring has no run of P 0s is the coefficient of z^N in Sp(z/2); that's the second line here. And going from the solution on the previous slide, it's asymptotic to a constant times (beta/2)^N, where beta is the dominant root. The other is that the expected wait time is just our generating function evaluated at one-half. If you evaluate that generating function at one-half, you'll see that the 1 - 2z term vanishes, so all that's left in the denominator is (1/2)^(P+1), and it becomes 2^(P+1) - 2.
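Written out in full, the wait-time calculation is:

```latex
E[\text{wait}] \;=\; \sum_{N \ge 0} \Pr\{\text{first } N \text{ bits have no run of } P \text{ 0s}\}
\;=\; \sum_{N \ge 0} \frac{s_N}{2^N}
\;=\; S_P(1/2)
\;=\; \frac{1 - (1/2)^P}{(1/2)^{P+1}}
\;=\; 2^{P+1} - 2.
```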

So that's using the symbolic method to get an explicit version of the generating function, and then evaluating that generating function to get the analysis of the properties of consecutive 0s in a random bitstring.

Now, just to summarize, for small values of P, which are usually what's of interest: the generating function when P = 1 is (1 - z)/(1 - 2z + z^2). So the probability of no 0 in a random bitstring of size N is (1/2)^N; for N = 10 it's about 0.0010, for N = 100 it's 10^-30, and the average wait time is 2.

For two 0s, we can compute the roots explicitly; those are the values of the roots, and it turns out that the probability of no run of two 0s in 10 bits is about 0.14. In 100 bits it's 10^-9, and the wait time for the first run of two 0s is 6. For three 0s, having no run is becoming more and more likely: for 10 bits, it's about an even chance that there's no run of three 0s; for 100 bits, it's still fairly unlikely. For four 0s, in 10 bits, three quarters of the time you're not going to have four 0s in a row, and you have to wait 30 bits on average to get your first run of four 0s.

12:02

For 100 bits, it's still pretty unlikely that you won't have four 0s in a row. For five 0s, now there's almost a 90% chance that you're not going to have five consecutive 0s in 10 random bits, a 20% chance in 100 random bits, and the wait time is 62. And for six 0s, in 10 bits it's almost certain, there's an even chance in a random string of 100 bits that you won't have six consecutive 0s, and your wait time is over 100: it's 126. So those are facts that we can derive from this analysis.
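The entries of this table are easy to check by exhaustive enumeration for small N. This is a hypothetical sketch (not the lecture's program), computing the exact probability over all 2^N strings, plus the wait-time formula from the theorem:

```java
public class RunTable {
    // Exact probability that a random n-bit string has no run of p consecutive 0s,
    // by enumerating all 2^n strings (feasible only for small n).
    static double probNoRun(int p, int n) {
        int good = 0;
        for (int bits = 0; bits < (1 << n); bits++) {
            int run = 0;
            boolean hasRun = false;
            for (int i = 0; i < n; i++) {
                run = ((bits >> i) & 1) == 0 ? run + 1 : 0;  // extend or reset run of 0s
                if (run >= p) { hasRun = true; break; }
            }
            if (!hasRun) good++;
        }
        return (double) good / (1 << n);
    }

    // Expected wait time for the first run of p 0s, from the theorem: 2^(p+1) - 2.
    static long waitTime(int p) { return (1L << (p + 1)) - 2; }
}
```

For example, probNoRun(3, 10) is 504/1024, about 0.49, matching "about an even chance" above, and waitTime(4) is 30.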

And actually, this type of statistic is one way to test whether a set of strings is random.

Â 12:54

In fact, it's always worthwhile to validate these types of mathematical results. And in situations like this, when it's so easy to write code to validate them, we should go ahead and do it. So this is a Java program that takes as arguments a size and a number of trials. This one actually reads the bits from standard input, so we separate generating the bits from analyzing them. It will read w bits at a time from standard input.

Â 13:34

So like in that example, w was 75, and there were a bunch of lines of 75 bits that were supposed to be random. Then for each p it will check whether there are p consecutive 0s, and it'll just print out the empirical probabilities. And this is just the code to check for p consecutive 0s. So it's a very straightforward program that reads in a bunch of bitstrings and then prints out, over all those bitstrings, the empirical probability of finding one, two, three, four, five, up to p consecutive 0s.
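The lecture's program isn't reproduced here, but its core check might look like the following sketch (class and method names are my assumptions, and it takes the trial strings as an array rather than reading standard input):

```java
public class EmpiricalRuns {
    // Does this bitstring contain p (or more) consecutive '0' characters?
    static boolean hasRun(String bits, int p) {
        int run = 0;
        for (int i = 0; i < bits.length(); i++) {
            run = bits.charAt(i) == '0' ? run + 1 : 0;  // extend or reset the run
            if (run >= p) return true;
        }
        return false;
    }

    // Fraction of the given trial strings containing a run of p consecutive 0s.
    static double empiricalProb(String[] trials, int p) {
        int hits = 0;
        for (String s : trials) if (hasRun(s, p)) hits++;
        return (double) hits / trials.length;
    }
}
```

Comparing empiricalProb against the exact probabilities from the theory is the randomness test described next.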

And so for the example that I used on the first slide, this is the data that it gives. Those are the first lines of 10,000 bitstrings of size 100, and for those bitstrings, these are the empirical probabilities that runs of one, two, three, four, five, and six 0s occurred or not. And those compare very favorably with what we just saw predicted by the theory. So we can say that those bits pass this test for randomness, or we can think of this as validating the theory that we just derived, predicting, for example, that you have a 45% chance of finding six consecutive 0s in 100 random bits. So that's a fine generalization of the simple example that we did, counting bitstrings that don't have two consecutive 0s.

Â 15:23

Now the next thing that we want to do is consider other specified patterns. So say it's not 000 that we're interested in, but 001. In red are the wait times for 000, the pattern that I did before; on average it's about 18 for these. And in blue (you can hardly see it) are the wait times for 001. In the first line, the first occurrence of 001 starts at the ninth bit. In the second, it's at the fourth bit, and so forth. And this time, the expected wait time for 001 in this set of bit strings is only 6. It's much, much less. And now we're wondering: are these bit strings really random? What's going on here? Well, the fact is that the pattern itself definitely is a factor in what the wait time is. And that's what we're going to look at next.

Â 18:18

So now we want the answer to this question: what's the probability that a random bitstring doesn't contain a given pattern, say 0001? It's a little bit counterintuitive that the answer is not the same for every four-bit pattern, but it's definitely not, as we'll see. Or, what's the wait time for the first occurrence of 0001? What about other patterns, like 1110 or 1111, and so forth? The idea that it's the pattern itself that determines the probability of occurrence and the expected wait time has to do with a phenomenon called autocorrelation. That's what we're going to look at next. So what we'll do is use the symbolic method to get explicit equations for the generating function for the number of bit strings not containing a specified pattern.

Â 20:03

These two sets are disjoint: T sub p ends with the pattern, and S sub p does not contain the pattern, so there's no string that's in both of them. The empty string doesn't contain the pattern, so it's in S sub p. The other observation is that if we have a string in S sub p and we add one bit to it, either we get a string that's still in S sub p, or that bit completes the pattern and we get a string in T sub p, with its only occurrence of the pattern at the end.

Â 21:24

The autocorrelation polynomial records, in its exponents, the shifts at which the trailing bits of the pattern match its leading bits. So we start out with z^0 in the polynomial, since the zero shift always matches. Then we slide over one, two, three, four, five positions, and when we slide over five positions, the trailing bits match the leading bits of the pattern: the 1010 at the beginning matches the trailing four bits. We slid five positions, so we put in z^5. Then two more positions, and the two trailing bits match the two leading bits; that's z^7. So among all the possible slides, the only matches of trailing bits with leading bits are at positions zero, five, and seven. That defines the autocorrelation polynomial of this pattern: it's 1 + z^5 + z^7.
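This slide-and-compare computation is easy to code for any pattern. The lecture doesn't spell out the example pattern in this chunk, but 101001010 is a nine-bit pattern whose autocorrelation polynomial is 1 + z^5 + z^7; treating it as the example is my assumption. A sketch:

```java
import java.util.ArrayList;
import java.util.List;

public class Autocorrelation {
    // Exponents of the autocorrelation polynomial: the shifts k where the last
    // (length - k) bits of the pattern equal its first (length - k) bits.
    static List<Integer> shifts(String pattern) {
        List<Integer> result = new ArrayList<>();
        int m = pattern.length();
        for (int k = 0; k < m; k++)
            if (pattern.substring(k).equals(pattern.substring(0, m - k)))
                result.add(k);
        return result;
    }
}
```

For instance, shifts("000") gives [0, 1, 2], the polynomial 1 + z + z^2, while shifts("001") gives just [0], polynomial 1; that difference is what drives the different wait times.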

And that's going to play a role in the development of another relationship between the two combinatorial classes that we just defined. This is the second construction. Remember, S sub p doesn't contain the pattern, and T sub p ends in the pattern but has no other occurrence of the pattern. The idea is: take any string in T sub p and add a tail corresponding to one of the shifts in the autocorrelation. The first tail is null; for the second, if you add the five bits that were shifted off at the end of the pattern, you get another occurrence of the pattern; and again, if you add the seven bits, you get an occurrence of the pattern. So in those three cases, one corresponding to each term in the autocorrelation polynomial, you get a string in S sub p followed by the pattern: if you have a string in T sub p and you add one of these tails, then knocking off the trailing pattern leaves a string in S sub p. So every time, the result is a string in S sub p followed by the pattern. So S sub p times the pattern equals T sub p times one term for each term in the autocorrelation polynomial.

Â 23:55

So we have those two constructions. One: if you add a bit to a string in S sub p, you get either a string in S sub p or a string in T sub p. The other: if you take a string in T sub p, you get one chance, for every term in the autocorrelation polynomial, to make a string in S sub p followed by the pattern. Those are the two constructions, and they're set up for the symbolic method: they use just the basic operations, so they immediately correspond to generating function equations. In the first one, z_0 + z_1 gives 2z, and otherwise it directly corresponds: 1 + 2z Sp(z) = Sp(z) + Tp(z). In the second one, the pattern has OGF z^p, so Sp(z) z^p = Tp(z) c(z), where what's left over is the autocorrelation polynomial c(z) itself. So now we have two equations in Sp and Tp that we can solve, and the autocorrelation polynomial of the pattern is part of the answer. That's the solution of those two simultaneous equations.
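Solving the pair of equations from the two constructions, with c(z) the autocorrelation polynomial and p the pattern length, gives:

```latex
1 + 2z\,S_p(z) = S_p(z) + T_p(z), \qquad S_p(z)\,z^p = T_p(z)\,c(z)
\quad\Longrightarrow\quad
S_p(z) = \frac{c(z)}{z^p + (1 - 2z)\,c(z)}.
```

Evaluating at z = 1/2 makes the (1 - 2z) term vanish, so the expected wait time is S_p(1/2) = 2^p c(1/2): for 000, with c(z) = 1 + z + z^2, that's 8(1 + 1/2 + 1/4) = 14, and for 001, with c(z) = 1, it's 8, consistent with 000 taking noticeably longer than 001 in the experiment above.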

Â