Over the years many natural cryptographic constructions were found to be insecure.

In response, modern cryptography was developed as a rigorous science where

constructions are always accompanied by a proof of security. The language used to

describe security relies on discrete probability. In this segment and the next,

I'll give a brief overview of discrete probability, and I point to this Wikibooks

article over here for a longer introduction. Discrete probability is

always defined over a universe which I'll denote by U, and this universe in our case

is always going to be a finite set. In fact very commonly our universe is going

to be simply the set of all n bit strings which here is denoted by zero one to the

n. So for example the set zero one squared is the set of all two bit strings which

happens to be the strings 00, 01, 10, and 11. So there are

four elements in this set, and more generally in the set zero one to the N,

there are two to the N elements. Now a probability distribution over this

universe U is simply a function which I'll denote by P, and this function, what it

does, is it assigns to every element in the universe a number between zero and

one. And this number is what I'll call the weight or the probability of that

particular element in the universe. Now there's only one requirement on this

function P, and that is that all the weights sum up to one. That is,

if I sum the probability of all elements X in the universe, what I end up with is the

number one. So let's look at a very simple example looking back to our 2-bit

universe, so 00, 01, 10, and 11. And you can consider the following probability

distribution which, for example, assigns to the element 00 the probability one

half. To the element 01, we assign the probability one-eighth, to 10 we assign the

probability one-quarter, and to 11 we assign the probability one-eighth. Okay, we can

see that if we sum up these numbers in fact we get one which means that this

probability P is in fact a probability distribution. Now what these numbers mean

is that if I sample from this probability distribution I'll get the string 00 with

probability one half, I'll get the string 01 with probability one-eighth, and so on and so

forth. So now that we understand what a probability distribution is, let's look at

two classic examples of probability distributions. The first one is what's

called the uniform distribution. The uniform distribution assigns to every

element in the universe, exactly the same weight. I'm gonna use U between two bars

to denote the size of the universe U. That is the number of elements in the universe,

and since we want the sum of all the weights to be one, and we want all

these weights to be equal, what this means is that for every element X in the

universe, we assign a probability of one over the size of U. So in particular if we look at our

example, the uniform distribution on the set of two-bit strings would

simply assign the weight one-quarter to each one of these strings.

And clearly the sum of all the weights sums up to one. Well again, what

this means is that if I sample at random from this distribution, I'll get a uniform

sample across all our 2-bit strings. So all four of these 2-bit strings are equally likely

to be sampled by this distribution. Another distribution that's very common is

what's called a point distribution at the point X0 And what this point distribution

does is basically it puts all the weight on a single point, namely X0. So here we

assign to the point X0 all the weight, one. And then to all other points in the

universe, we assign the weight zero. And by the way, I want to point out that this

inverted A here should be read as "for all". So all this says is that for all X

that are not equal to X0, the probability of that X is zero. So again

going back to our example a point distribution for example, that would put

all its mass on the string 1-0, would assign probability one to the string 1-0

and zero to all other strings. So now if I sample from this distribution, I'm

guaranteed to always sample the string 1-0 and never sample any of the

other strings. So now we know what a distribution is, and I just want to make

one last point, and that is that because this universe U is always gonna be a

finite set for us, we can actually write down the weights that the

distribution assigns to every element in U, and represent the entire distribution

as a vector. So, here for example, if you look at the universe of all 3-bit

strings, we can literally write down the weight that the distribution assigns to the

string 000, then the weight that the distribution assigns to the string 001, and so on, and

so forth. You can see that we can write this as a vector; in this case it will be

a vector of dimension eight, since there are eight strings of three bits. As a

result, basically the entire distribution is captured by this vector of eight real

numbers, each in the range zero to one. The next thing I wanna do is define the

concept of an event. So consider a subset A of our universe. I'll define the

probability of the subset to be simply the sum of the weights of all the elements

in the set A. In other words, I'm summing, over all X in A, the weights of these

elements X in the set A. Now, because the sum over the entire universe of all

weights needs to be one. This means that if we sum, well if you look at the

probability of the entire universe, basically we get one. And if we look at

the probability of a subset of the universe, we're gonna get some number in

the interval zero to one. And we say that the probability of this set A is the sum

which is a number between zero and one. And I'll tell you that a subset A of the

universe is called an event. And the probability of the set A is called the

probability of that event. So let's look at a simple example. So suppose we look at

the universe U, which consists of all 8-bit strings, right? So the size of this

universe U is 256 because there are 256 8-bit strings. Essentially we're looking

at all byte values, all 256 possible byte values. Now let's define the following

event. Basically the event is gonna contain all bytes, so all 8-bit

strings in the universe, such that the two least significant bits of the byte happen

to be 11. So for example, if we look at 01011010, that's an element in the universe

that's not in the set A, but if we change this zero to a one, then that's an element of

the universe which is in our set A. And now let's look at the uniform distribution

over the universe U, and let me ask you, what is the probability of the

event A? So what is the probability that, when we choose a random byte, the two

least significant bits of that byte happen to be one, one? Well, the answer is

one-fourth, and the reason that's true is because it's not too difficult to convince

yourself that of the 256 eight bit strings, exactly 64 of them, one quarter

of them, end in one, one. And the probability of each string is, since we're

looking at the uniform distribution, exactly one

over the size of the universe, namely one over 256. And the sum over these, you

know, 64 elements, each one with weight one over 256, is exactly one-fourth, which is

the probability of the event A that we're looking at. So a very simple bound on the

probability of events is called the union bound. So imagine we have two events A1

and A2. So these are both subsets of some universe U. And we wanna know what is the

probability that either A1 occurs, or A2 occurs. In other words, what is the

probability of the union of these two events? This little U here denotes the

union of the two sets. So the union bound tells us that the probability that either

A1 occurs or A2 occurs is at most the sum of the two probabilities. And

that's actually quite easy to see. So simply look at this picture here, you can

see that when we look at, at the sum of the two probabilities, we're basically

summing the probability of all the elements in A1 and all the elements in A2. And

you realize, we kind of double-summed the elements in the intersection. They

get summed twice here on the right hand side. And as a result, the sum of the two

probabilities is going to be larger than, or equal to, the actual

probability of the union of A1 and A2. So that's the classic union bound. And in fact

I'll tell you that if the two events are disjoint, in other words their

intersection is empty, in that case if we look at the probability that

either A1 or A2 happens, that's exactly equal to the sum of the two probabilities.
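To make the union bound concrete, here is a small check in Python, using the example distribution from earlier in the segment over 2-bit strings; the two overlapping events A1 and A2 below are my own choices, purely for illustration:

```python
from fractions import Fraction as F

# The example distribution from earlier in the segment,
# over the universe of 2-bit strings.
P = {"00": F(1, 2), "01": F(1, 8), "10": F(1, 4), "11": F(1, 8)}

def prob(event):
    """Probability of an event = sum of the weights of its elements."""
    return sum(P[x] for x in event)

# Two overlapping events over this universe (chosen just for this demo).
A1 = {"00", "01"}
A2 = {"01", "10"}

print(prob(A1 | A2))        # exact probability of the union: 7/8
print(prob(A1) + prob(A2))  # the union bound's right-hand side: 5/8 + 3/8 = 1
assert prob(A1 | A2) <= prob(A1) + prob(A2)

# For disjoint events, the bound holds with equality.
B1, B2 = {"00"}, {"11"}
assert prob(B1 | B2) == prob(B1) + prob(B2)
```

The element 01 sits in both A1 and A2, which is exactly why the right-hand side overshoots: its weight gets counted twice.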

Okay? So we'll use these facts here and there throughout the course. So just to be

clear, the inequality holds always. But when the two events are disjoint, then in

fact we get an equality over here. So let's look at a simple example. Suppose

our event A1 is the set of all n-bit strings that happen to end in 1-1. And

suppose A2 is the set of all n-bit strings that happen to begin with 1-1. Okay, so

think of n as eight or some larger number, and I'm asking what is the probability

that either A1 happens or A2 happens. In other words, if I sample

uniformly from the universe U, what is the probability that either the least

significant bits are one, one or the most significant bits are one, one. Well as

we said that's basically the probability of the union of A1 and A2. We know that

the probability of each one of these events is one-quarter by what we just did

on the previous slide. And therefore, by the union bound, the probability of the

union is at most the probability of A1 plus the

probability of A2, which is a quarter plus a quarter. And we just proved that the

probability of seeing 1-1 in the most significant bits, or 1-1 in the least significant

bits, is less than one-half. So that's a simple example of how we might go about

using the Union Bound to bound the probability that one of two events might

happen. The next concept we need to define, is what's called a random

variable. Now, random variables are fairly intuitive objects. But unfortunately the

formal definition of a random variable can be a little confusing. So what I'll do

is, I'll give an example, and hopefully that will be clear enough. So formally, a

random variable, denoted say by X, is a function from the universe into some set

V. And we say that this set V is where the random variable takes its values. So let's

look at a particular example. So suppose we have a random variable X. And this

random variable maps into the set 01. So the values of this random variable are

going to be either zero or one. So, one bit, basically. Now, this random variable

maps our universe, which is the set of all n-bit binary strings, zero one to the n.

And how does it do it? Well, given a particular sample in the universe, a

particular n-bit string y, what the random variable will do is simply output

the least significant bit of y. And that's it. That's the whole random variable. So,

now let me ask you. Suppose we look at a uniform distribution on the set zero one

to the n. Let me ask you, what is the probability that this random variable

outputs zero, and what is the probability that the random variable outputs one? Well

you can see that the answers are half and half. Well, let's just walk through

why that's the case. So here we have a picture showing the universe and the

possible output space. And so in this case the variable can output a zero or a one.
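This setup can be sketched in a few lines of Python; the choice of n = 4 and the variable names here are mine, just to keep the example small:

```python
import itertools
from fractions import Fraction as F

n = 4  # any n works; 4 keeps the universe small

# The universe {0,1}^n, and the random variable X(y) = least significant bit of y.
universe = ["".join(bits) for bits in itertools.product("01", repeat=n)]
X = lambda y: int(y[-1])

# Under the uniform distribution, Pr[X = v] is the weight of the
# preimage of v under X: the fraction of the universe mapped to v.
for v in (0, 1):
    preimage = [y for y in universe if X(y) == v]
    print(v, F(len(preimage), len(universe)))  # each value has probability 1/2
```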

When does the variable output zero? The variable outputs zero when the

sample in the universe happens to have its least significant bit

set to zero. And the variable outputs one when the sample in the universe

happens to have its least significant bit set to one. Well, if we choose strings

uniformly at random, the probability that we choose a string that has its least

significant bit set to zero is exactly one half, which means the random variable outputs

zero with a probability of exactly one-half. Similarly, if we choose a random

n-bit string, the probability that the least significant bit is equal to one is

also one-half. And so we say that the random variable outputs one, also with

probability exactly one-half. Now, more generally, if we have a random

variable taking values in a certain set v, then this random variable actually induces

a distribution on this set V. And here, I just wrote, in symbols, what

this distribution means, but it's actually very easy to explain. Essentially, what it

says is that the variable outputs v basically with the same probability that,

if we sample a random element in the universe, and then we apply the

function X, we ask, how likely is it that the output is actually equal to v? So formally,

we say that the probability that X outputs v is the same as the probability of the

event that, when we sample a random element in the universe, we fall into the

preimage of v under the function X. And again, if this wasn't clear, it's not that

important. All you need to know is that a random variable takes values in a

particular set V, and induces a distribution on that set V. Now there's a

particularly important random variable called a uniform random variable. And it's

basically defined as you would expect. So let's say that U is some finite

set, for example the set of all n-bit binary strings, and we're gonna denote a

random variable R that basically samples uniformly from the set U by this

little funny arrow with a little R on top of it. Again, this denotes that the

random variable R is literally a uniform random variable sampled over the set U. So

in symbols, what this means is that for all elements A in the universe, the

probability that R is equal to A is simply one over the size of U. And if you want to stick to

the formal definition of a uniform random variable, it's not actually that important,

but I would just say that formally the uniform random variable is an identity

function, namely R of X is equal to X for all X in the universe. So just to see

that this is clear, let me ask you a simple puzzle. Suppose we have a uniform

random variable over 2-bit strings, so over the set 00, 01, 10, 11. And now, let's

define a new random variable X to basically sum the first and second bits of R. That

is, X simply is the sum of R1 and R2, the first and second bits of R, treating those

bits as integers. So, for example, if R happens to be 00, then X will be zero plus zero,

which is zero. So let me ask you, what is the probability that X is equal to two? So

it's not difficult to see that the answer is exactly, one-fourth because, basically

the only way that X is equal to two is if R happens to be 1,1. But the probability

that R is equal to 1,1 is basically one-fourth, because R is uniform over the

set of all two-bit strings. The last concept I want to define in this segment

is what's called a randomized algorithm. So I'm sure you're all familiar with

deterministic algorithms. These are algorithms that basically take a

particular input data, and they always produce the same output, say Y. So

if we run the algorithm a hundred times on the same input, we'll always get the same

output. So you can think of a deterministic algorithm as a function that

given a particular input data, M, will always produce exactly the same output, A

of M. A randomized algorithm is a little different, in that, as before, it takes

the data M as input, but it also has an implicit argument called R, where

this R is sampled anew every time the algorithm is run. And in particular this R

is sampled uniformly at random from the set of all N-bit strings, for some

arbitrary n. Now what happens is every time we run the algorithm on a particular

input M we're gonna get a different output because a different R is generated every

time. So the first time we run the algorithm we get one output. The second

time we run the algorithm a new R is generated and we get a different output.
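For instance, a toy randomized algorithm might look like the following Python sketch; the XOR step here is just a stand-in for "the output depends on R", not a real encryption scheme, and the names are my own:

```python
import secrets

def randomized_alg(m: bytes, n: int = 16) -> bytes:
    # The implicit input r is sampled anew, uniformly at random,
    # on every run of the algorithm.
    r = secrets.token_bytes(n)
    # Mix r into the output (here by XOR-ing it with the padded message).
    return bytes(a ^ b for a, b in zip(m.ljust(n, b"\x00"), r))

m = b"hello"
out1 = randomized_alg(m)
out2 = randomized_alg(m)
# Same input both times, yet the outputs differ, except with the
# negligible probability (2^-128) that the two r's collide.
print(out1 != out2)
```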

The third time we run the algorithm a new R is generated and we get a third output

and so on. So really the way to think about a randomized algorithm is it's

actually defining a random variable. Right? So given a particular input

message, M, it's defining a random variable, which defines a distribution

over the set of all possible outputs of this algorithm, given the input, M. So

the thing to remember is that the output of a randomized algorithm changes every

time you run it. And in fact, the algorithm defines a distribution on the set of all

possible outputs. So let's look at a particular example. So suppose we have a

randomized algorithm that takes as input a message M. And of course it also takes an

implicit input, which is this random string that is used to randomize its

operation. So now what the algorithm will do is simply encrypt the message M

using the random string as the key. So this basically defines a random variable. This

random variable takes values that are encryptions of the message M. And really,

what this random variable is, is a distribution over the set of all

possible encryptions of the message M under a uniform key. So the main point to

remember is that, even though the inputs to a randomized algorithm might always be the

same, every time you run the randomized algorithm you're gonna get a different

output. Okay. So, that concludes this segment, and we'll see a bit more discrete

probability in the next segment.
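(As a closing aside: the earlier puzzle, the probability that the two bits of a uniform 2-bit string sum to two, can be double-checked exhaustively with a few lines of Python.)

```python
from fractions import Fraction as F

universe = ["00", "01", "10", "11"]  # R is uniform over 2-bit strings
X = lambda r: int(r[0]) + int(r[1])  # X = sum of the two bits of R

# Pr[X = 2] under the uniform distribution: only r = "11" contributes.
p = F(sum(1 for r in universe if X(r) == 2), len(universe))
print(p)  # 1/4
```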