Hello and welcome back to this course. So far in this course, we've been talking about ways to gain access to passwords on a compromised machine. There's a variety of different ways to do that, which might result in a rather significant list of passwords for a user. If we dump passwords from the browser, from the operating system, and maybe even collect some through a phishing attack, then we've got a fairly long list of passwords that a particular user has chosen. Now what we're going to talk about is trying to determine the strength of those passwords and determine if there are any patterns we can take advantage of in our attack. For example, say you have an algorithm where you use a base password and then append something to it that's unique to a particular site. If we can identify that algorithm, it'll make it a lot easier to guess new passwords for different sites.
We're going to talk through a few different ways to analyze our passwords in this video using the Python file here on the left. Scrolling down to the bottom, we've got a sample list of weak passwords that we're going to look at here, just for the purposes of our examples. We've got the password Password, always ahead, and then a few here that are designed to align with a particular site. For example, for Facebook, someone's algorithm might be to use a base password, Password, and then prepend or append FB for Facebook. Similarly, LI for LinkedIn, GM for Gmail, etc. If we can determine that that's how those passwords work, that they've got a root and they append or prepend something, then we can guess passwords for something new. For example, if you've got a Microsoft product, maybe it's MSPassword, etc. In that case, we've got a base password that's weak: Password. But what if you think, oh well, if I use a strong base password, that's going to be better.
For example, we have a nice strong random password here using lowercase letters, capital letters, numbers, special symbols, etc., and here we've got one that uses that as a root, followed by underscore FB for Facebook. Again, if you've got that same root, then we might be able to identify the algorithm, and once we've got the root, its complexity doesn't really add anything to security.
In this video, we're going to use the analyze passwords function to take a look at this list of passwords and see if we can find any common trends in them or other things that we can exploit. Here is that analyze passwords function. What we're going to do here is go through each password in our list of passwords and perform a couple of different types of analysis. We're going to start by calculating the entropy of the password. We've talked about entropy in an earlier course. In fact, the function that we're going to use for calculating entropy is pretty much identical to that one. Just as a quick recap, entropy is a measure of the information contained within a collection of data, or the amount of randomness in it. In this particular case, the higher the entropy, the better the password, essentially, because it means it has a high level of randomness. If we've got a password that we're passing in, it's really a collection of characters. We have P-A-S-S-W-O-R-D. In theory, this password is going to have a lower entropy than, say, this one here, because the characters in Password are much more predictable than the characters in aHC[_'5Y<f.
How we're going to calculate this entropy is by using a couple of functions from the pandas and SciPy libraries. Taking our password, we can convert it from a string into a byte array. A byte array, as its name suggests, is an array of bytes. Each character is going to be a separate item in this array or list. Then we're going to treat this as a Series in pandas.
The reason why we're using a pandas Series is because then we can call value counts, which will tell us the number of instances of each letter that shows up. For example, for Password, we're going to have one of everything except for the S, where there are two. Then with these value counts, we have a rough estimate of the observed probability of seeing each of those values in the password. We'll say, "Oh, there's a slightly higher probability of an S than anything else, because we saw more of them." When we do that entropy calculation, we're trying to see what our observed probabilities are versus the theoretical probabilities, which, if we've got a completely random password, should be one over the total number of potential characters: capital letters, lowercase letters, numbers, and then however many special characters you are using in this particular case. In general, I'd say that that's roughly in the area of 70 or 80, so a one-in-80 chance. In practice, we'll see that, for example, the S appears more often than that prediction says it should, because if you've got an eight-character password and a one-in-80 chance, the odds of having two of the same thing are rather low. By evaluating the entropy of these passwords, we can get a guess on whether or not we think they're probably random without having a human sit there and say, "Password is a word, that's not random," etc.
Another way that we could measure the strength of a password, other than entropy, would be to compare the passwords to a list of the most common passwords. If a password shows up on that list, it's by definition a bad password. That's our individual measure of a password's strength, which is useful for picking targets: if a password has low entropy, and all of that user's passwords have low entropy, this might be a good account to target because we've got a better chance of guessing their password. We're also going to look for the potential that we have passwords that are reusing components.
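As a rough sketch of that entropy calculation, here's a version that uses only the Python standard library. This is a stand-in, not the function from the video: it swaps `collections.Counter` in for the pandas value counts and computes the Shannon entropy by hand instead of calling SciPy. The name `calc_entropy` and the choice of base 2 (bits) are assumptions, so the absolute numbers may not match what the video's code prints, although the ordering of the passwords should.

```python
import math
from collections import Counter


def calc_entropy(password: str) -> float:
    """Shannon entropy, in bits, of the observed character frequencies."""
    data = password.encode("utf-8")  # one item per byte, like the byte array
    counts = Counter(data)           # stand-in for the pandas value counts
    total = len(data)
    # Sum -p * log2(p), where p = n / total is the observed probability
    # of each byte value; log2(total / n) is the same as -log2(p).
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# Sample passwords taken from the discussion above.
for pw in ["Password", "FBPassword", "aHC[_'5Y<f"]:
    print(f"{pw!r}: {calc_entropy(pw):.3f} bits")
```

Note that this only measures how evenly the characters in one password are spread out. It knows nothing about dictionary words, which is why the common-password list is a useful second check.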
For example, Password shows up in these three passwords, and the same random root shows up in both of these passwords. Finding those shared components is computationally expensive, to be honest. This is called the longest common substring problem, and it's got a high complexity. But we can take a couple of steps to work down the complexity on this. The first thing that we're going to do is take a look at the Hamming distance between the two passwords. This is useful because it allows us to identify if multiple passwords have the same substrings in the same places. For example, if we look at the Facebook password, the LinkedIn password, and the Gmail password, all three of these have a large common substring in the exact same place. That gives us the indication that we're probably working with an algorithm here.
The Hamming distance is a fast and easy way to determine if we have those common substrings. All the Hamming distance is, is a count of the number of positions at which two strings differ. Say we take the Facebook password and the LinkedIn password here. They differ in only two characters, the first and the second, because everything else is identical. This pair is going to have a very low Hamming distance, while, say, Password versus this random one has a very high Hamming distance because there are no identical characters in the same position. How we calculate the Hamming distance is fairly simple. We iterate over the length of the password. If the characters at a given location in the two passwords don't match, we add one to that Hamming distance value, and then we return the distance.
We're going to be using this to skip the expensive check for common substrings in cases where the Hamming distance is high. If that Hamming distance is less than the length of the password, we know that it's got common characters in some location. We have something like the same root in the same location. If so, we're going to call our check substrings function to look for those particular cases.
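Here's a minimal sketch of that Hamming distance check. The function names are hypothetical, and so is the handling of passwords with different lengths: this version counts the leftover characters of the longer password as mismatches, which the video doesn't confirm.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two strings differ."""
    # Compare position by position over the shared length.
    distance = sum(1 for x, y in zip(a, b) if x != y)
    # Assumption: extra characters in the longer string count as mismatches.
    return distance + abs(len(a) - len(b))


def worth_checking(a: str, b: str) -> bool:
    """Gate for the expensive substring search.

    True only if at least one character matches in the same position.
    """
    return hamming_distance(a, b) < max(len(a), len(b))


print(hamming_distance("FBPassword", "LIPassword"))  # 2
print(worth_checking("FBPassword", "LIPassword"))    # True
print(worth_checking("FBPassword", "aHC[_'5Y<f"))    # False
```

The gate is cheap because it's a single pass over the two strings, so it makes sense to run it on every pair before doing any substring search.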
If we get any common substrings, we're going to add them to a variable that holds a list of matches. Each entry will say, here's the other password that we found a common substring with, and here is the list of substrings that match in that password. Then in the end, our return value will be this dictionary P, and it's going to say, for this particular password, here is its entropy, and here is that list of matches of other passwords where it lines up a little bit.
Now let's give this code a run to see what it does. Python, analyze passwords dot py, and give it a second to run through. Then here are the results that we're printing out. We're just printing them down here in this loop: we're going to print out the password that we've analyzed, then its entropy, which we've calculated up here in our calcEntropy function, then the list of matches if we found any. Here, the result for Password shows one of the limitations of using this calc hamming distance function to shortcut the check substrings process. This password, which is the root for three of the other passwords, doesn't show up as having any matches, and that's because it doesn't have the same character in the same location as any other password, because the site tag is prepended to it and shifts everything over. If we commented out this if statement that calls the calc hamming distance function, we'd get different results. We'll take a look at that in just a moment.
But first let's look at the rest of the results here. Let's start with entropies. Entropy is a measure of the amount of randomness in a particular password. We see a very low entropy here for Password because there's not a lot of randomness in it. It's seeing that we've got repeated characters, etc. Also, this is shorter than the rest of the passwords, so length matters here. We see that to an extent when we go to, say, FBPassword. Most of it is exactly the same as Password, but we get a little bit more entropy, partially because it's longer and partially because we're adding characters we haven't seen before, the F and the B.
Notice that our entropy value is the same for all three of these because we've got two unique characters and then the word Password, and so they all have that same level of surprise in them. Then down here I have a couple of other passwords. Here's that more random stem that we're using. It's got a higher entropy than any of these, even though it's the same length, because all the characters are different. Then finally here we've got one where we've added that _FB to the end, and we see we have even higher entropy in this case, because it's longer and has only one repetition, that shared underscore. This shows up as stronger from an entropy perspective, because it's longer and it looks very random. But if we know this root, then guessing this password is easy.
Now let's talk about that matches section. For our matches, if we go to the Facebook password, we see we get three hits. The reason for this is that the word Password is a substring within the Facebook password. The LinkedIn password shares that same stem or root, Password. All of these have Password in them, and this value here of two shows where this common substring shows up. Location 0 is the F, location 1 is the B, and then our common substring here, Password, starts at location 2. Going down here, we also get matches. The reason for that is because this random root is reused in this password. Say we had another password, maybe this random string followed by an underscore and a tag for Gmail or something like that. Even if we didn't have the bare root itself, we'd still detect that shared random root, which could lead us to guess the algorithm that was used to generate these passwords. This demonstrates how we can use a little bit of analysis to determine the strength of a password and even potentially to work out that secret algorithm that someone's using to strengthen those passwords.
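To make that matching step concrete, here's a small sketch of finding the shared root and reusing it for a new site. It's not the check substrings function from the video. It leans on `difflib.SequenceMatcher` from the standard library to find the longest common substring, and the names `common_root` and `guess_for_site`, along with the idea of swapping one site tag for another, are assumptions for illustration.

```python
from difflib import SequenceMatcher


def common_root(a: str, b: str):
    """Return (substring, index_in_a, index_in_b) for the longest shared run."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    m = matcher.find_longest_match(0, len(a), 0, len(b))
    return a[m.a:m.a + m.size], m.a, m.b


def guess_for_site(known: str, other: str, old_tag: str, new_tag: str) -> str:
    """Guess a password for a new site by swapping the site tag around the root.

    Assumes the user's scheme is root plus a short per-site tag.
    """
    root, start, _ = common_root(known, other)
    prefix = known[:start]               # whatever sits before the shared root
    suffix = known[start + len(root):]   # whatever sits after it
    return prefix.replace(old_tag, new_tag) + root + suffix.replace(old_tag, new_tag)


print(common_root("FBPassword", "Password"))
print(guess_for_site("FBPassword", "LIPassword", "FB", "MS"))
print(guess_for_site("aHC[_'5Y<f_FB", "aHC[_'5Y<f", "FB", "GM"))
```

The first call reports the root along with its location in each password, which is the same kind of information as the index of two in the results above. The other two calls show why a strong root doesn't help once it's been recovered: the new guess is just the root with a different tag.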
If we've got two of their passwords, then even if they've got that strong stem, as long as their algorithm is predictable in some way, we can learn that stem and then easily generate the prefix, or in this case the suffix, that goes with it for a new account that we haven't observed. This demonstrates the use of Python and password analysis to help us gain access to different credentials and possibly to additional accounts. Thank you.