In this lesson, which is in two parts, we're going to look at the three types of attacks that can be carried out against the Random Oracle described in a prior lesson. In the first part, we'll look at preimage attack and the second preimage attack, while in the second part, we'll focus on the hash collision attack, better known as a birthday attack. There are certainly other attacks against real hash functions, but they exploit vulnerabilities in either a specific hash function or, like the length extension attack, in broad families of hash functions. Those are dealt with in other courses where specific hash function designs are discussed. Recall that the size of a hash function is the number of the bits in the digest it produces. If a hash function has N-bits, then the number of unique digests it can produce is 2 to the N. Thus the number of possible digests grows exponentially with the size of the digest. The difficulty of any attack that cannot exploit specific weaknesses in the algorithm or implementation scales with the number of possible digests. And are thus exponentially difficult in terms of the size of the hash function. To give this some context, there are about 10 to the 80th atoms in the known universe, which is about 2 to the 265th power. If we could use a single atom per bit than enumerating all of the digests in a 256-bit hash function, would require about half the mass of the known universe. It's no exaggeration when we say we are talking about astronomically large problems. Let's go the other way, how large of a hash function could we enumerate if we only had available the atoms in a grain of sand? A rough estimate of the number of atoms we're talking about is 10 to the 20th power. This is about enough to handle a 60-bit hash function. A 64-bit hash function, which is just now coming into the range of modern computing technology to enumerate, using highly distributed parallel computing, would only require the atoms from about a dozen grains of sand to enumerate. So we can just now deal with a few grains of sand, but even modestly sized modern hash functions require we deal with most of the known universe. Yet another way to get a feel for these numbers is that if we could enumerate 1 million digests a second, it would take over half a million years to work our way through all of the possible 64-bit digests, which gives you some idea of the size of the distributed computing network required to tackle a task like this in the span of a year of so. We'll use these kind of examples as we consider the implications of these attacks, because it is difficult for the human mind to grasp the sheer scale we are talking about. In a preimage attack, Mallory knows the value of a particular digest and is tasked with attempting to find just one of the effectively infinite number of messages that produce that digest. Since there are infinitely many, it might seem that they should be pretty easy. But a Random Oracle or, to a good approximation, a well designed cryptographic hash function, forces Mallory to just keep guessing random messages until he finally happens upon one that coincidentally matches the digest he is seeking, making this a very challenging task. If there are D possible digests, and they each occur with equal probability, then the chance of any one message producing the digest we are looking for is 1 divided by D. The likelihood that we test k messages, and none of them are the one we are looking for, is the probability that a given digest isn't the one we want multiplied by itself k times. So how big does k have to be before we ave a 50/50 chance of getting a hit? Solving for k using an approximation for the natural log of a number close to 1, we get that k has to be on the order of the number possible digests, which is exponential in the size of the hash. As already mentioned, this type of attack against a 64-bit hash function is just barely coming into the range of feasibility for well-funded entities. It's estimated that it currently requires about $100,000 of computer time to find a single solution. Since most cryptographic hash functions are currently at least 128 bits, and most are 256 bits or more, preimage attacks on high quality cryptographic hash functions of typical size are computationally infeasible. We know that with a 256-bit hash, we are talking about a number of digests that is on the order of 1% of the number of atoms in the known universe. Even 16- bit hash function like SHA-1 has a number digests equal to a few percent of all the atoms in the planet earth. But we can certainly construct hash functions that are even bigger than this that are trivially easy to perform this kind of attack on. Remember a preimage attack only defines the goal. There is nothing that requires us to guess random messages, if we can find a better way. For instance, consider a hash function that only accepts messages that are 1,024 bits long and the hash is simply equal to the message. This may sound silly, but there are non-cryptographic hash functions that work exactly this way. For instance, in programming languages such as Java, which supports 32-bit hash values, usually used for non-adversarial integrity checks or for implementing hash tables, the hash of a variable of type int, which is a 32-bit integer, is simply the value of the integer. This approach not only provides an extremely fast way of computing the hash, but it also results in what is known as perfect hashing, meaning that hash collisions simply are not possible. This is a valuable property for some applications. But as a cryptographic hash function, this would be about the worst possible hash function, because if Mallory is given the digest, he can not only find some message that hashes to that digest. He can tell you the exact message that was actually used with certainty. The two key points to take away from this are that just having a lot of bits in the function does not automatically imply that it is resistant to a preimage attack. And the same is true for the other attacks we'll look at. Second, given that we know that we can build a highly insecure hash function, we have to expect that designing a real hash function that really does approach the quality of a Random Oracle is not going to be a trivial matter. In fact, teams of designers spend literally years and millions of dollars constructing these hash functions. And many promising variants from many teams are usually found to have subtle, but fatal weaknesses before a new hash function is deemed suitable. Even tried and true hash functions often eventually fall to some clever and highly sophisticated attack, decades after it became an international standard. SHA-1 is just one such example. Next, let's just look at the second preimage attack. Remember, in this case Mallory has a message and needs to find another message that has the same digest. On the surface this seems like it always reduces to a preimage attack since Mallory can simply take the digest of the known message. Remember, it's always assumed that everyone has access to the same hash function. And then perform a preimage attack on that digest. So what is the computational effort involved against a random oracle? It's the same as the preimage attack, namely order 2 to the N, because the only additional work is computing a single digest. And that is generally considered to take constant time. Most of the time, this reduction to a preimage attack is the case in practice, but most of the time is not all of the time. Consider our perfect hash function, the case of hashing an int in Java. While carrying out a preimage attack on it is trivial, it is completely immune to a second preimage attack, because there are no two messages that have the same digests. This may seem nitpicking, but there is actually an important practical lesson here. The fact that these differences can exist probably means that, at least to some small degree, they will exist in any real hash function. And if they do exist, we have to assume that eventually our adversaries will come across them and figure out how to exploit them. Because of subtle technical distinctions like this, we need to treat these two attacks as separate attacks when designing and analyzing a real hash function. And part two of this lesson, we will continue with the hash collision attack.