Now that you have learned about distributed programming using sockets, let's review a very important concept that applies to distributed programming, and even to accessing I/O devices: serialization and deserialization. As we've seen before, we know how to have two JVMs, say JVM A and JVM B, communicate with each other. We can establish sockets between them, and if you recall, we used calls like getInputStream() and getOutputStream() to perform the communication. But input and output streams work at the level of sequences of bytes, while each JVM may hold multiple objects: A0, A1, A2 in JVM A, and similarly its own set of objects in JVM B. What we really want to communicate are objects. This means we need some way of converting an object to bytes when we want to send it from JVM A to JVM B; that's called serializing the object. Then, when JVM B receives the sequence of bytes, it needs to reconstruct a copy of the object; that's called deserialization. How can we do this? There are a few choices. One is to take a custom approach to serialization and deserialization. If you know that the objects are simple, you may have a convenient way of converting them to strings or sequences of bytes, and you can just use that. For example, if an object held some particular combination of characters and integers, you could write those out as strings and parse them back. But this gets complicated when you have a richer object structure, which you typically do in Java programs. Suppose you have an object x with fields f1, f2, f3, and so on, and let's say f3 points to object y. Now it's not just a matter of converting the values in the fields of x to bytes; we also need some way to include y. That can get complicated. So another approach that people consider is to use XML.
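To make the custom approach concrete, here is a minimal sketch for a simple object with one string field and one integer field. The class name `Reading` and its "name,value" text format are hypothetical, chosen only for illustration. Note that nothing here knows how to follow a reference to another object, which is exactly where the custom approach starts to get complicated.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical simple class: one String field and one int field, no references.
final class Reading {
    final String name;
    final int value;

    Reading(String name, int value) {
        this.name = name;
        this.value = value;
    }

    // Custom serialization: write the fields as the text "name,value" in UTF-8.
    byte[] toBytes() {
        return (name + "," + value).getBytes(StandardCharsets.UTF_8);
    }

    // Custom deserialization: decode the text and parse the two fields back out.
    // Splitting on the LAST comma works because the int part never contains one.
    static Reading fromBytes(byte[] bytes) {
        String s = new String(bytes, StandardCharsets.UTF_8);
        int comma = s.lastIndexOf(',');
        return new Reading(s.substring(0, comma), Integer.parseInt(s.substring(comma + 1)));
    }
}
```

Calling `Reading.fromBytes(new Reading("temp", 42).toBytes())` gives back an equivalent object, but every new field type or reference would need more hand-written encoding and parsing.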
XML was designed, as many of you probably know, as a data interchange standard. There are ways of converting Java object graphs into XML representations, and there are many helper libraries to serialize and deserialize XML. But this can be very heavyweight in many cases, because XML carries a lot of metadata overhead. The approach we're going to discuss instead is Java serialization and deserialization. Here, a class, let's say class X, needs to be declared as implementing the java.io.Serializable interface. Once you do that, the standard object streams (ObjectOutputStream and ObjectInputStream) can automatically convert object x into bytes and back, and they are smart about including y. In fact, it's very important for class Y to also be serializable: if a field in class X refers to an object of type Y and Y is not serializable, you'll get an exception (a NotSerializableException) when you try to serialize an instance of X. This works well and is actually fairly robust. For example, if field f2 also happened to refer to y, serialization makes sure that only one copy of y is included. It's also smart enough to handle cycles: if a field in object y referred back to object x, it won't get into an infinite loop or include multiple copies of x and y. So all of this is good. But sometimes there are fields that aren't relevant when you ship an object from one JVM to another. For example, maybe f1 in JVM A refers to some kind of collection. That collection object may of course point to x, but it may also point to a large number of other objects, maybe thousands. If you literally serialize x with its reference to the collection, serialization will traverse the entire collection and produce a large sequence of bytes from it. That's overkill if all you want to communicate is object x, or the combination of x and y.
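Here is a minimal sketch of the x and y example using the standard `ObjectOutputStream` and `ObjectInputStream`. The classes `X`, `Y`, and `GraphDemo`, and the `back` field that creates the cycle, are hypothetical names for illustration. The sketch serializes to an in-memory byte array rather than a socket so it can run on its own.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical classes mirroring the lecture's x and y.
// Y must also implement Serializable; otherwise writeObject throws
// java.io.NotSerializableException when it reaches a Y instance.
class Y implements Serializable {
    private static final long serialVersionUID = 1L;
    X back;   // refers back to x, creating a cycle
}

class X implements Serializable {
    private static final long serialVersionUID = 1L;
    int f1 = 7;
    Y f2;     // f2 and f3 will both refer to the same Y instance
    Y f3;
}

class GraphDemo {
    // Builds x -> y (through both f2 and f3) and y -> x.
    static X buildGraph() {
        X x = new X();
        Y y = new Y();
        x.f2 = y;
        x.f3 = y;
        y.back = x;
        return x;
    }

    // Serializes the whole graph reachable from obj, then deserializes a copy.
    static Object roundTrip(Object obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

In the copy returned by `roundTrip(buildGraph())`, `f2` and `f3` point to the same new `Y`, and that `Y`'s `back` field points to the new `X`. The stream writes each object only once and encodes later occurrences as back-references, which is how it avoids duplicates and infinite loops.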
So what can we do here? It turns out there's a convenient facility for this: the transient keyword, a modifier you put on fields that should not be copied when an object is serialized. If f1 is marked transient, its value is not copied when the object is shipped from JVM A to JVM B, which means the copy that arrives at JVM B will have the default value in f1, which is null for a reference field. If null isn't an appropriate value, you can define a private readObject method in the class to put a suitable value in f1 during deserialization. This turns out to be very convenient, because you don't have to become an expert in parsing and unparsing to write your own custom serializer and deserializer, and you don't have to pay the overhead of XML every time. You just declare the classes you want to communicate across JVMs as implementing the Serializable interface. Then, when it's time to read and write objects, the runtime takes care of converting each object into bytes so they can go through the socket, and of recreating objects from those bytes, which is deserialization. There is a related approach that is also gaining popularity: using what's called an Interface Definition Language, commonly known as IDL. A recent instance of this is Protocol Buffers. Using an IDL has some advantages and disadvantages. The main advantage is that you can communicate your objects not just between Java processes but also with processes written in other languages, like C++ and Python. The disadvantage, or the extra overhead for the developer, is that you need to write an interface file. For Protocol Buffers, for example, you create a .proto file that essentially mirrors your class and defines the fields you want to communicate.
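A minimal sketch of a transient field plus a private `readObject` hook follows. The `Holder` class, its `cache` list (standing in for the large collection that f1 refers to), and `TransientDemo` are hypothetical. `readObject` with this exact private signature and `defaultReadObject()` are the standard hooks that `ObjectInputStream` calls; the class's own constructor is not run during deserialization, which is why the field would otherwise stay null.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical class: 'cache' stands in for the large collection f1 refers to.
class Holder implements Serializable {
    private static final long serialVersionUID = 1L;

    String label;                  // serialized normally
    transient List<String> cache;  // skipped by serialization

    Holder(String label) {
        this.label = label;
        this.cache = new ArrayList<>();
    }

    // Invoked by ObjectInputStream when a Holder is deserialized.
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();    // restores the non-transient fields (label)
        cache = new ArrayList<>(); // replaces the default null with an empty list
    }
}

class TransientDemo {
    // Serializes obj to memory and deserializes a copy.
    static Object roundTrip(Object obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Whatever the original `cache` held, the deserialized copy gets a fresh empty list; none of the collection's elements are written to the byte stream. Without the `readObject` method, `cache` would simply be null in the copy.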
So if you know for sure that you are only communicating between Java processes, Java serialization is a very good way to get started. There you see: we can go one step beyond just communicating bytes between two processes using sockets. With serialization and deserialization, we can take the rich object structure in our Java programs and get a copy of an object from JVM A to JVM B. This piggybacks on the same mechanisms you would use to read and write objects to file systems, and it works very naturally with distributed programming.
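To connect this back to sockets, here is a minimal sketch that wraps a socket's streams from getOutputStream() and getInputStream() in an `ObjectOutputStream` and an `ObjectInputStream`. As a simplifying assumption, it runs in a single JVM on the loopback interface, with one thread playing JVM B (the receiver) and the calling thread playing JVM A (the sender). The names `SocketTransfer` and `sendOverLoopback` are hypothetical.

```java
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class SocketTransfer {
    // Sends 'message' through a loopback socket and returns the copy
    // that the receiving side deserializes.
    static Object sendOverLoopback(Serializable message) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 lets the OS pick any free port.
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
            // Receiver (playing JVM B): accept one connection and read one object.
            Future<Object> received = pool.submit(() -> {
                try (Socket socket = server.accept();
                     ObjectInputStream in = new ObjectInputStream(socket.getInputStream())) {
                    return in.readObject();
                }
            });
            // Sender (playing JVM A): wrap the socket's byte stream and write the object.
            try (Socket socket = new Socket(loopback, server.getLocalPort());
                 ObjectOutputStream out = new ObjectOutputStream(socket.getOutputStream())) {
                out.writeObject(message);
            }
            return received.get(10, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException("transfer failed", e);
        } finally {
            pool.shutdownNow();
        }
    }
}
```

The only serialization-specific code is the pair of object-stream wrappers; replacing the socket streams with a FileOutputStream and FileInputStream would write and read the same bytes to and from a file. One design note: each side creates only one object stream here, which sidesteps a known pitfall where two peers that both construct an ObjectInputStream first can block waiting for each other's stream header.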