Hello and welcome back to this course. In this video, we're going to talk about service identification. In some of the previous videos, we've collected a lot of information from various different sources. We talked about DNS and identifying subdomains for a particular domain. We talked about using Shodan to collect information about a particular system that's connected to the Internet. We talked about using scanning, both port scanning and banner grabbing, to collect information about a target system. At the end of all of this, we have little bits and pieces of information, but not what we actually need to identify vulnerabilities and move on to the next stage of our attack.

In some of the previous videos, we saw some of the sources of data that we receive. We looked at HTTP headers, we looked at the banners that we grabbed from an FTP service, etc. Often what we need, that product name and version, is in there, but we haven't pulled it out into a usable form. That's what we're going to be doing here.

Our Python program this time is called ServiceLookup.py. This program is going to interact with a lot of our other programs, pull in the information that we've looked at in previous videos, and then parse it to see if we can pull out something useful. Let's start with our main function, down here at the bottom. We've got a main function here that, again, is only going to show up in this particular video. We'll get rid of it when we're linking our different programs together to create our more complete reconnaissance infrastructure. In this case, we're going to call a function called serviceID. We're going to pass in an IP address and then a list of subdomains that we got from our DNS searching before. In this particular case, we're not going to provide any subdomains because we're going to pretend that we don't know much of anything about this target system.
In response, we'll get something called records, and we'll loop over those records, printing them out. Let's find out what we're going to learn here. We'll move up to serviceID here. The goal here is to take advantage of all of the different functions and programs that we've looked at in previous videos to pull in information about a particular IP address. We're going to look at the IP address in three different ways.

We're going to start by taking advantage of what we've learned from our DNS infrastructure. Recall that in the DNS video, we took a base domain and identified the subdomains associated with it. From those subdomains, we can make guesses about what a particular computer is designed to do. If it's got a name like mail.example.com, there's a good chance that we want to look at an SMTP port for it, because there's a chance it's running that. Or potentially a web port like 80 or 443 for web-based email, etc. If we've got something like www.example.com, we definitely want to look at ports 80 and 443, etc.

We've got a few example defaults up here. SMTP runs on port 25, DNS runs on 53, and if our subdomain is NS, that's also DNS on port 53. We've got a few different options here for things that would run on ports 80 and 443: web.something, www.something, or api.something is going to use those standard web ports. Then also FTP runs on ports 20 and 21.

With that information in mind, we can start to target some of our information gathering. Based on the DNS name, we know what the system probably does. Based on some of these defaults, and others we could set as well, we can identify which ports probably have something running on them. But then we actually need to get information about what's actually running on those ports. Because knowing that, oh, there's probably something listening on port 25, isn't quite enough for us.
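As a rough sketch of that defaults table, something like the following would work. The names DEFAULT_PORTS and default_ports_for are hypothetical, not the ones in ServiceLookup.py, and the sketch assumes that trailing digits on a label (like ns1) should be dropped before the lookup.

```python
import re

# Hypothetical defaults table; labels and ports follow the examples in the video.
DEFAULT_PORTS = {
    "smtp": [25],
    "dns": [53],
    "ns": [53],
    "web": [80, 443],
    "www": [80, 443],
    "api": [80, 443],
    "ftp": [20, 21],
}


def default_ports_for(subdomain):
    """Guess likely ports from the leftmost DNS label of a subdomain."""
    label = subdomain.split(".")[0].lower()
    # Assumption: numbered hosts such as 'ns1' behave like their base label 'ns'.
    label = re.sub(r"\d+$", "", label)
    return DEFAULT_PORTS.get(label, [])


if __name__ == "__main__":
    for name in ["www.example.com", "ns1.example.com", "db.example.com"]:
        print(name, default_ports_for(name))
```

A label with no entry in the table simply gets an empty list, which is the cue to fall back on the other lookups.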
What we're going to do here is take advantage of this list of default ports. We take the set of subdomains we're passing in, strip off the numbers in case we have something like NS1, and then look up the default ports for that name, like ports 80 and 443. If we have defaults, we'll loop over them and call this other function, bannerRecord(ip, p), for each of those ports.

Here we've got that other function. The end goal of this function is to identify a product and a version number for the service running at that location. If the port is 80, 443, 8080, or 8443, those are probably web ports, so we're going to use the HTTP header grab function we've looked at previously. Remember, that's going to pull the HTTP headers, and in those headers, we pointed out that there's a header called Server. If we find that Server header, we pull it out and pass it to a parse banner function that'll try to extract information about the program and the version running there. Otherwise, we're going to call banner grab. Remember that just connects to that port and listens for anything that it sends back. If it sends something back, we're going to call parse banner with that to see if we can get any useful information out of it as well. Then, in the end, we're going to create a record that holds the port number, the product, and the version number. It could be that at the end of this we have nothing for the product name and nothing for the version number. But at least we tried.

I'm going to come back to this parse banner function in just a moment. But let's look at our other two options here. Another option here is to check Shodan. We looked at Shodan in an earlier video. We saw that we could do a Shodan lookup of a particular IP address.
That's going to send back a bunch of different types of information. We'll at least get the port numbers associated with that IP address. But it's also possible that we might get the exact information that we want, like the product name and the version number. If we don't get that product name for one of the records, we're going to parse the banner that Shodan recorded for that particular port and IP combination. Because that banner is the same thing you'd get from performing a port scan, we're going to call parse banner on it as well. There we'll hope that there's something in there that we can extract a product name and version number from.

Then if the default ports that we've gotten from subdomains don't work and there's nothing on Shodan, we're going to just scan some common ports. We looked at SYN scan in an earlier video. For a particular IP, it's got a list of port numbers that it checks. If any of those ports are open, we'll return those, loop over them, and call this same function again, just like we did for those default ports.

In the end, what we're hoping to end up with is something to parse. So either the Server header from HTTP header grab or whatever the service spits out in response to a banner grabbing request. We'll do that parsing down here in our last function. If we have an HTTP port, we've got one of two options. We either have the Server header that we've gotten from our HTTP header grab, or we have the response to an HTTP request from Shodan. If we've got the entire HTTP response, we need to parse out that Server section so that we can analyze it properly. An easy way to identify that is that the HTTP response is going to start with HTTP and then the HTTP version number. If our banner starts with HTTP, we know we need to parse. Otherwise, we already know that what we've got is our Server value. If we need to parse, we can use Python's re library for regular expressions.
Its search function will let us find a particular substring within the HTTP response. In this response, we're going to have that Server header somewhere, hopefully. What we can do is look for that Server header name. It would be Server, colon, space. Then there will be something, and that header is always going to end with a carriage return and line feed, so \r\n. If we look for zero or more of anything except for \r or \n, then what we'll get is whatever value is associated with that Server header. That value should be what we want to see. Then we're going to be searching through the banner value that we passed in.

We store the result of re.search in match, and if we have a match, we can use match.groups to see the various results that we get. See here how we grouped this in parentheses. The zeroth value in the groups will be whatever matched this section in parentheses. So if we have, say, Server, colon, space, Apache 1.0.0.1, we'll match and receive Apache 1.0.0.1, or whatever it is. Then we store that in our server variable, because that's the exact same value we get from requesting that Server header using our HTTP header grab function.

Then based on that, we can split our Server header information to determine what we want. The standard for HTTP specifies the format of how this will work. The Server value will start with the program name, so something like Apache. Then there may or may not be a slash followed by a version number, and then after that there can be a space followed by additional information. But that first section is what we really want. We know that we have something followed by a space, so if we do server.split on spaces, we'll get the section we care about, and then if there is anything else, it's in the later parts of the result. If we grab the zeroth section, we get the section we care about. This should be the product name and potentially a slash and a version number.
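Here is a minimal sketch of the HTTP case just described. The function names and the sample response are hypothetical, not taken from ServiceLookup.py, and the last helper goes one step further and splits the first token on the slash to separate product name from version.

```python
import re


def server_value(banner):
    """Return the Server header value from a full HTTP response.

    If the banner does not start with 'HTTP', assume it already is the
    Server header value and return it unchanged.
    """
    if not banner.startswith("HTTP"):
        return banner
    # Capture everything after 'Server: ' up to the end of that header line.
    match = re.search(r"Server: ([^\r\n]*)", banner)
    return match.groups()[0] if match else ""


def product_token(server):
    """Keep only the first space-separated section, e.g. 'Apache/2.4.41'."""
    return server.split(" ")[0]


def split_product_version(token):
    """Split 'name/version' into (name, version); version may be empty."""
    vals = token.split("/")
    return vals[0], (vals[1] if len(vals) > 1 else "")


if __name__ == "__main__":
    # Illustrative response, not captured from a real server.
    response = (
        "HTTP/1.1 200 OK\r\n"
        "Server: Apache/2.4.41 (Ubuntu)\r\n"
        "Content-Type: text/html\r\n\r\n"
    )
    server = server_value(response)
    print(server)
    print(split_product_version(product_token(server)))
```

Returning an empty string when no Server header is found keeps the later splitting code from having to special-case a missing match.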
We then can split on that slash, which would give us the program name and then the version number if it exists. Our product name is vals of zero, and then our version number will be vals of one if the length of vals is greater than one, meaning that vals of one exists. Otherwise it's going to be just an empty string. If we have a server name of GWS, because we're accessing one of Google's sites, we're going to get GWS in product name and no version number. Versus if we have something that has a version number, we'll get the product name there and the version number in version, which is exactly what we want. Because if we have that and it's nicely broken out, we can start using that information to inform the next stage of our reconnaissance, looking for potential vulnerabilities we can exploit.

This is the case where we have an HTTP service, which is our well-behaved and easy-to-use case. Our other potential case is that we have anything else really, because we don't always have that nice Server header that tells us exactly where to look. But we can think about what the information we want looks like and how it's normally formatted, and we can make an educated guess about where we might find it. Often we'll have something like the product name, a delimiter, and then a version number. That product name is not super helpful because we don't know what it looks like. However, our version number is something that typically has a standard format, and hopefully it's a bit distinctive. You'll see something like 1.0, or 1.0.1, etc. What we can look for is something, then a delimiting character, then something that looks like a version number.

That's what we're doing here with this re.search. What we're saying is we want pretty much anything alphanumeric. Capital A to Z, lowercase a to z, 0-9, one or more of those. This would match most program names, followed by a single delimiting character.
Often you'll have a forward slash, a space, or an underscore between the product name and the version number. Then we need something that looks like a version number, so we'll have one or more of 0-9, followed by one or more repetitions of a period followed by one or more of 0-9. This would match 1.1. It would also match 1.11, it would also match 1.0.1, etc. With re.search, we're able to pull out anything that matches this particular description, and then again, if we've got a result, we're going to use the groups command as well.

I mentioned before that the parentheses matter. We've got a set of parentheses here, and then we have other sets of parentheses here. In groups of zero, we're going to have this first one, which should be our product name if we've got a match. In the second one, we'll have the widest group of parentheses we have over here, which should be the entirety of our version number. We'll put that into version, in which case we've got what we want.

But there's a chance that this doesn't work. That could happen if you've got a case where a version number isn't provided. There's one more thing we can try. That's the fact that often product names include information about what they are. For example, if you have an FTP server, there's a good chance that the word FTP is in the product name so that you know what you're looking at. What we can do is look for something that contains one of those hint words, like SMTP or FTP. We'll allow alphanumeric characters before and alphanumeric characters after, but we won't require them. We'll put the star there, saying zero or more. Then notice this is all in parentheses, etc.

I'm also going to use the .lower command because it makes it easier to match on this. Because different servers use different formats for this. For example, vsFTPd has the FTP in capitals so that you know this is an FTP server, and the rest is lowercase. I could try to match every single variation of capitalization.
Or we can just force it all to lowercase, so it'll match more easily. If we get any matches here, we're going to iterate over them. I'm just throwing out the potential match ESMTP, because that's just saying it's the extended SMTP protocol. That's not the service name. That matters because if we've got a Google SMTP server, then ESMTP will be our first result, followed by GSMTP, which is our real one.

This parse banner function is designed to pull out the product and the version, and it's not perfect. If you want something closer to perfection, look up Nmap's list of service fingerprints. It's huge, but they're able to ID a whole bunch of different services that way. This is a lot simpler, and it's something that we can write for ourselves that still gives us a reasonable probability of finding what we're looking for. From end to end, what we're hoping to do is take an IP address and potentially a list of subdomains. In the end, we want the product and version running on each port on that system, hopefully.

Let's see what happens. We will run python ServiceLookup.py, hit Enter, and give this a moment. Since we don't have any subdomains, we're not going to use the default ports. This happens to be a VM setup, so Shodan doesn't apply either, because Shodan only has an index of systems it has scanned on the Internet. However, we can use our scan of common ports. We do get a few hits here: ports 21, 22, and 23 are active on this target system.

Looking over here, we've got some information. For ports 21 and 22, we got pretty much what we want. We have vsFTPd version 3.0.3, which we were able to extract from the actual banner that we're parsing. If I add a command here to print our banner, we'll see what we're looking at. Give this another run. It'll show us what it's reading through as it does this. We have two banners. We've got our 220 vsFTPd 3.0.3, and we've correctly extracted the information. Then we have SSH-2.0-OpenSSH_7.6p1, and then Ubuntu. This one mostly worked.
The reason why is that OpenSSH uses a p there, so it doesn't quite match our desired version number syntax. However, knowing you've got OpenSSH version 7.6 gives us a starting point here. We're just missing that p1. We did pretty well, in that for two of our three services we've gotten at least some information.

The one we didn't get is 23 here. The reason for that is that the telnet service running there sends us a non-printable banner. When we talked in the previous video about scanning and banner grabbing, we talked about decoding the banner using the UTF-8 codec. This banner doesn't decode. This version of the program doesn't accept it, and so we don't even try to parse it. We could use that banner for fingerprinting with a strategy like Nmap's, where it's got a list of fingerprints and matches against them. But for our approach of parsing out, say, a product name and version number, it doesn't work as well.

This demonstrates another use of Python for reconnaissance. We're going from the initial IP address, using a lot of functionality we've demonstrated in some previous videos, and we end up knowing the product name and the version number of the services on a target IP address. This is the information we could type into a CVE list or something similar and learn exactly what our potential attack vectors are. Thank you.
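To recap the non-HTTP parsing covered in this video, here is a minimal sketch under a few assumptions. The names are hypothetical, the regular expressions are reconstructed from the spoken description rather than copied from ServiceLookup.py, the hint list is limited to the two hint words mentioned, and the SMTP banner is an illustrative stand-in.

```python
import re

# Product name, one delimiter (/, space, or _), then a dotted version number.
VERSION_RE = re.compile(r"([A-Za-z0-9]+)[/ _]([0-9]+(\.[0-9]+)+)")
# Fallback: any lowercase alphanumeric token containing a hint word.
HINT_RE = re.compile(r"([a-z0-9]*(smtp|ftp)[a-z0-9]*)")


def parse_generic_banner(banner):
    """Return (product, version); either may be an empty string."""
    match = VERSION_RE.search(banner)
    if match:
        groups = match.groups()
        # groups[0] is the product, groups[1] the whole version number.
        return groups[0], groups[1]
    # No version found: look for a name that contains a hint word.
    for name, _hint in HINT_RE.findall(banner.lower()):
        if name != "esmtp":  # protocol keyword, not a product name
            return name, ""
    return "", ""


if __name__ == "__main__":
    banners = [
        "220 (vsFTPd 3.0.3)",
        "SSH-2.0-OpenSSH_7.6p1 Ubuntu",
        "220 mx.example.com ESMTP abc123 - gsmtp",  # illustrative only
    ]
    for banner in banners:
        print(banner, "->", parse_generic_banner(banner))
```

Extending the version pattern to accept a suffix like p1 would be one way to capture the full OpenSSH version, at the cost of a looser match on other banners.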