One of the things that we often need to do in cryptography is find large prime numbers. How large? Currently, RSA encryption commonly uses 1024-bit public keys, but it is increasingly recommended to use 2048-bit keys, and some applications use 4096-bit keys. These are numbers that have about 300, 600, and 1200 decimal digits, respectively. These keys are semi-primes, which are numbers that are the product of two primes, and the two primes used are roughly the same size. This means that we are talking about prime numbers that have approximately 150, 300, and 600 digits, respectively.

So the good guys, such as Alice and Bob, have to be able to find large prime numbers to generate their public keys, while the bad guys, Eve and Mal, want to figure out which prime numbers Alice and Bob chose based on that public key. Although both the good guys and the bad guys are interested in finding prime numbers of about the same size, their problems are actually very different. In some respects, the differences are roughly analogous to those between a second pre-image attack against a hash function and a hash collision attack, except this time the tables are turned in favor of the good guys. Eve and Mal have to find a specific prime number that factors a particular semi-prime, while Alice and Bob only care about finding a random prime number of a certain size.

Alice and Bob can either use an algorithm that generates prime numbers, or take the approach of just guessing random numbers and checking to see if they are prime until they find two that are. In this case, what they need is an algorithm that tests whether a number is prime without actually factoring it. It turns out there are a number of efficient ways to do this, and those will be the focus of the next several lessons.
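As a rough illustration, the guess-and-check approach can be sketched in a few lines of Python. The function names here are my own, and the naive trial-division `is_prime` is only a stand-in for the efficient primality tests covered in the coming lessons:

```python
import random

def is_prime(n):
    """Naive trial-division test; a placeholder for the efficient
    primality tests (e.g., Miller-Rabin) covered in later lessons."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def random_prime(bits):
    """Guess random numbers of the given size until one is prime."""
    while True:
        # Force the top bit (to fix the size) and the bottom bit (odd).
        candidate = random.getrandbits(bits) | (1 << (bits - 1)) | 1
        if is_prime(candidate):
            return candidate

p = random_prime(14)   # the sizes Alice uses for her toy key below
q = random_prime(18)
print(p, q, p * q)
```

Because a random number near n is prime with probability about 1/ln(n), the loop only needs a handful of tries at these toy sizes.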

Eve and Mal, on the other hand, need to solve what is known as the Integer Factorization Problem. Like the discrete log problem, this is a problem for which it is suspected that there may not be any efficient algorithm to solve it. As before, the term "efficient" is shorthand for a polynomial-time algorithm, in terms of the number of bits in the number. But again, like the discrete log problem, it has never been proven that an efficient solution does not exist. So there is the potential that any cryptosystem that relies on this being a hard problem may be relying on a house of cards. In this lesson, we want to get a feel for the scope of the Integer Factorization Problem by looking at how the simplest factoring algorithm, known as trial division, scales, and by briefly talking about the performance of the general Number Field Sieve, which is presently the fastest known algorithm for factoring multi-hundred-digit semi-primes.

Let's start small and first narrow the scope of the problem to something that we can wrap our minds around. Let's say that Alice wants to generate a public key that is just 32 bits. This means that she wants prime numbers that are about 16 bits in length. Since she doesn't want the two prime numbers to be too close to each other, she decides to make one 14 bits and the other 18 bits. This means that she wants a prime number close to 16,000 and another one close to 260,000. It turns out that 16,001 and 260,003 are prime, so Alice's public key becomes 4,160,308,003, which is indeed a 32-bit number.
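We can check this arithmetic directly. Here is a minimal sketch, with a hypothetical `smallest_factor` helper of my own:

```python
def smallest_factor(n):
    """Return the smallest factor of n greater than 1
    (which is n itself when n is prime)."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n

p, q = 16001, 260003
n = p * q
print(n)                # 4160308003
print(n.bit_length())   # 32
print(smallest_factor(p) == p, smallest_factor(q) == q)  # True True
```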

How did I find these numbers? Simple: I cheated. I looked up a list of prime numbers and scanned down it. These happened to be the 1,863rd and the 22,838th primes on the list. This is not how Alice would do it when she wants to find even a 150-digit prime number, let alone a 600-digit prime. Such a table has to include all of the prime numbers up to a certain limit. While the largest prime number we know to date has over 22 million digits, it is estimated that the limit below which all of the prime numbers have been found is somewhere between 10 to the 18th and 10 to the 19th, or basically an 18-digit number. But no actual table exists, because even that table would require about two exabytes of storage, and there is presently somewhere in the rough neighborhood of a thousand exabytes of storage in the entire world.

So how does Alice do it? The most common way is to pick random numbers and test if they are prime. It turns out that the probability that a randomly chosen number is prime is roughly the reciprocal of the natural log of the number. So in the case of our toy problem, a randomly chosen number around 16,000 has about a 10% chance of being prime, while a number around 260,000 has about an 8% chance. But what about a 150-digit prime or a 600-digit prime? Certainly, the probabilities go down, but we are still talking about roughly one in 350 for a 150-digit prime, and about one in 1,400 for a 600-digit prime.
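This 1/ln(n) estimate is easy to spot-check empirically at our toy sizes. A sketch, counting primes in a window around each number (the helper names are mine):

```python
import math

def is_prime(n):
    """Naive trial-division check (fine at these toy sizes)."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def prime_density(center, half_width=1000):
    """Fraction of the integers near `center` that are prime."""
    lo, hi = center - half_width, center + half_width
    return sum(1 for k in range(lo, hi) if is_prime(k)) / (hi - lo)

for center in (16000, 260000):
    print(f"near {center}: measured {prime_density(center):.3f}, "
          f"1/ln(n) predicts {1 / math.log(center):.3f}")
```

The measured densities come out near the predicted 10% and 8%.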

So yes, she may have to make several hundred or even a few thousand guesses before she finds two primes, but she only has to do that once. And as long as the primality test is sufficiently efficient, this is an acceptable burden. That's all I will say about Alice's task here, since we'll look at efficient primality tests in the next few lessons.

Turning our attention to Eve, what does she do when she gets handed this 32-bit toy semi-prime and is told to factor it? Her first approach might be what I call true brute-force trial division: she sees if two will divide it; if not, she tries three, then four, then five, all the way up to N minus one. This is absolutely guaranteed to find any divisor of the number, and the number of trial divisions that she has to perform is basically N. So we have an order-N algorithm, but since N is exponential in the number of bits, so is this algorithm.

But we can do a lot better; brute force merely gives us a benchmark for comparison. The first thing to note is that any proper factor of N can be no larger than N over two, which cuts our effort in half. But this is still an order-N algorithm. We really only need to try potential divisors up to the largest value that the smallest factor could possibly be.

Let's say that the smallest factor of N is p. We know that p is prime, since if it wasn't, there would be a factor smaller than p. The other factor, q, which may or may not be prime, is no smaller than p; again, if it was, p wouldn't be the smallest factor. So we have N equals p times q, which is at least as large as p squared. This means that p, the smallest factor of N, is no larger than the square root of N. This might not seem like that big of an improvement, but even our 32-bit example puts the lie to this impression.

If we tried every number less than half of N, we would be making up to 2 billion trial divisions. But knowing that we can stop at the square root of N means that we have to do at most about 2 to the 16th, or roughly 65,000, which is a reduction of more than a factor of 30,000 in the time required. So if it took us a day to perform 2 billion trial divisions, which is admittedly an exaggeration given today's computers, but it's just for illustration, stopping at the square root of N would only take a few seconds.
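The square-root bound makes factoring the toy key trivial in practice. A minimal sketch (the `trial_division` helper is mine, not a production factoring routine):

```python
from math import isqrt

def trial_division(n):
    """Search for the smallest factor of n, stopping at sqrt(n)."""
    for d in range(2, isqrt(n) + 1):
        if n % d == 0:
            return d, n // d
    return n, 1  # no divisor at or below sqrt(n), so n is prime

p, q = trial_division(4160308003)
print(p, q)  # 16001 260003 -- found in at most ~65,000 divisions
```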

Now, I realize that you might already be screaming at me that Eve doesn't even need to check every number less than the square root of N. Once she knows it isn't divisible by 2, she doesn't need to check any other even number; that alone cuts her work in half. In fact, she only needs to try the prime numbers less than the square root of N. In the case of our toy N, the square root is just over 64,500, and there are 6,493 prime numbers less than this. How do I know this? I cheated again: I used the same table I did before. But even with this list available, Eve's burden is only reduced by an order of magnitude, and that isn't going to save her.

For starters, the list she uses would have to contain all of the prime numbers below the square root of N, and we've already seen that the current limit on such a list is less than 10 to the 19th, which means she won't be able to use this method to factor any number of more than about 38 digits. Plus, this table would require a non-trivial fraction of the entire world's storage capacity.

Now, consider how many prime numbers would be in this table. There is actually a function called the prime counting function, or pi of N, that is defined as the number of primes less than or equal to N. Unfortunately, unlike the totient function, we have yet to find a formula that gives the exact value of pi of N; that's another of math's great unsolved problems. But we do have some limits on it, and one commonly used approximation, which is often good enough, is that it is about N divided by the natural log of N. So for 10 to the 19th, that's about 2.3 times 10 to the 17th.

So even if Eve could perform a billion trial divisions a second, it would still take her about seven years to factor a 38-digit number. Clearly, this approach has no chance of scaling to 200-digit, let alone 1000-digit, numbers. Our old friend "astronomical" pays a visit long before we get there.
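The arithmetic behind that seven-year figure is worth a quick check; a sketch using the N over ln(N) approximation:

```python
import math

def pi_estimate(n):
    """Rough prime counting estimate: pi(n) ~ n / ln(n)."""
    return n / math.log(n)

primes_below = pi_estimate(1e19)   # candidate divisors Eve must try
seconds = primes_below / 1e9       # at a billion trial divisions per second
years = seconds / (365 * 24 * 3600)
print(f"about {primes_below:.1e} primes -> roughly {years:.1f} years")
```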

But trial division is not the only way to factor large numbers. Most fast methods, and here "fast" does not mean polynomial time, just faster than anything anyone else has come up with yet, use some form of sieve approach. You might want to look up the Sieve of Eratosthenes, which gives you an idea of the concept and which, like the Euclidean Algorithm, is one of the oldest algorithms known. For numbers with more than about 100 digits, the current record holder is known as the general Number Field Sieve.
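For reference, the Sieve of Eratosthenes itself can be sketched in a few lines. This is the classic algorithm, not the far more sophisticated number field sieve:

```python
def sieve(limit):
    """Sieve of Eratosthenes: return every prime below `limit`."""
    flags = [True] * limit
    flags[0] = flags[1] = False
    for p in range(2, int(limit ** 0.5) + 1):
        if flags[p]:
            # p is prime; cross off its multiples starting at p*p.
            for multiple in range(p * p, limit, p):
                flags[multiple] = False
    return [n for n in range(limit) if flags[n]]

print(sieve(30))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

Rather than testing candidates one at a time, it crosses off composites wholesale, which is the "sieving" idea that the fast factoring methods generalize.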

The RSA Factoring Challenge was created by RSA Laboratories in 1991. And yes, that's the same RSA as the encryption algorithm we use for so much of our e-commerce and other public-key cryptographic protocols. It contains a list of semi-prime numbers ranging from 100 decimal digits up to 617 digits, which is a 2048-bit number. As of 2017, the largest RSA number that has been factored is RSA-768, a 768-bit number, which has 232 decimal digits. Factored back in 2009, it took a couple of years and involved hundreds of processors; it is estimated that it would take about 2,000 years on a single-core 2.2 GHz processor. A couple of smaller RSA numbers still haven't been factored.

To give you some idea of how processing capabilities have progressed, the original key length for RSA in the late 1970s was 512 bits. As is often the case in cryptography, the desired key length was longer than this, but the choice of 512 reflected the practical implementation constraints of the day. The first 512-bit key factored, RSA-155, involved a dedicated research effort, a close-to-state-of-the-art Cray supercomputer, and 8,000 MIPS-years of processing effort. A decade later, the 512-bit RSA key used by Texas Instruments to sign firmware for the TI-83+ graphing calculator was broken by a single person, Benjamin Moody, in 73 days using a single 1.9 GHz dual-core processor.

Some experts suspect that current 1024-bit keys can already be factored in meaningful timeframes by well-funded state actors, while others dispute this claim. But few doubt that it will be possible in the not-too-distant future. For this reason, the current recommendation is to use those 2048-bit keys. It is believed that, short of someone developing an efficient algorithm or a workable quantum computer, 4096-bit keys move the scale of the factoring problem well into the astronomical realm that cryptographers like to play in.

Assuming that we keep the key size ahead of advances in computing technology and the development of factoring algorithms, what are Eve's other options? Even if we have found a way to keep Eve and Mal from cracking our keys mathematically, it would be pretty unreasonable to think that they're just going to give up and go away. Nope, they're going to explore every facet of our cryptosystems, trying to find other ways to break into them.

In the case of RSA and Diffie-Hellman, they might be able to make headway against the discrete log problem. More likely, they might discover exploitable weaknesses not in the basic math itself, but in subtleties of how we use that math, such as how we generate our keys, how we pad our messages, or how we use our keys. In fact, all of these have been the source of exploitable vulnerabilities that have been exposed and patched over the years.

Then, of course, there are all the side-channel attacks, ranging from monitoring the ways our physical cryptosystems leak information, to going after human weaknesses such as keys being stored improperly or people being susceptible to bribery or coercion. We are not going to explore any of these any further; our focus is on the central part of the underlying math. But engineers are notorious for putting on blinders and thinking that if we solve the narrow problem we're focused on, the bigger problem is somehow solved. So I want to keep us grounded in that bigger picture by at least pointing it out from time to time.