0:11

In this lecture, we are going to discuss another classical algorithm for distributed mutual exclusion, and we will also wrap up our discussion of the mutual exclusion topic.

The Ricart-Agrawala Algorithm that you've already seen for mutual exclusion requires replies from all of the processes in the group, which is one of the reasons it has a high bandwidth: a bandwidth that is order N. The key idea in Maekawa's Algorithm, the new algorithm, is that you need replies from only some of the processes in the group, not all of them. But you still need to ensure that only one process is given access to the critical section at any point in time, which means that safety is guaranteed.

Â 0:45

Maekawa's Algorithm works as follows. Each process Pi is associated with a voting set called Vi. A voting set consists of a few other processes from the group, and each process must belong to its own voting set Vi. Now, the voting set is not all the other processes in the group, which is essentially the approach of the Ricart-Agrawala Algorithm; instead, the voting set is only a small subset of the group. However, you need to make sure that, given any two voting sets Vi and Vj of processes Pi and Pj respectively, the intersection of these two voting sets is non-empty. This means that there is at least one process that belongs to both Vi and Vj. So given any pair of voting sets Vi and Vj, you need to make sure that at least one process is in common between them. This might sound familiar: in fact, it is the same concept, the same idea, as quorums, which you have seen elsewhere in the course. However, the usage here is slightly different, of course.

Â 2:46

So let's see an example. Suppose I have four processes in the system; of course, this is a perfect square, so we can put them in a matrix as shown here: p1, p2, p3, p4. If you select a process's row and column as its voting set, then you get the voting sets shown on the left side of the slide. So consider the process p1: its row consists of the processes p1 and p2, and its column consists of the processes p1 and p3. So its voting set consists of those three processes, p1, p2, and p3, as shown by this circle over here. Similarly, V2, which is the voting set for p2, consists of the processes p1, p2, and p4, and so on and so forth. You'll notice that any pair of voting sets intersects in at least one, and in many cases two, of the processes in the system. For instance, V1 and V4 intersect in two processes, p2 and p3.
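As an aside, this row-plus-column construction is easy to sketch in code. The following is a minimal illustration (the function name and numbering scheme are my own, not from the lecture); it builds the voting sets for a perfect-square number of processes and confirms that every pair of sets overlaps:

```python
import math

def voting_sets(n):
    """Build Maekawa-style voting sets for n processes (n a perfect square).
    Process i's voting set is the union of its row and column in a
    sqrt(n) x sqrt(n) grid; processes are named 1..n."""
    k = math.isqrt(n)
    assert k * k == n, "n must be a perfect square"
    sets = {}
    for p in range(n):
        row, col = divmod(p, k)
        members = {row * k + c for c in range(k)} | {r * k + col for r in range(k)}
        sets[p + 1] = {m + 1 for m in members}  # shift to 1-based names
    return sets

V = voting_sets(4)
print(V[1])         # {1, 2, 3}  -> V1 = {p1, p2, p3}, as on the slide
print(V[1] & V[4])  # {2, 3}     -> V1 and V4 overlap in p2 and p3
# Every pair of voting sets intersects, which is what safety relies on.
assert all(V[i] & V[j] for i in V for j in V)
```

The same pairwise-intersection check passes for any perfect-square n, since any row crosses any column in exactly one grid cell.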

Â 4:08

Okay, so those are the two key differences. Anyway, let's look at how the algorithm actually works. Again, we describe the algorithm by describing what happens at a given process. Initially, the state of a given process is Released, which means that it does not currently hold the lock, i.e., it's not in the critical section. Also, its voted variable is false, meaning that it has not yet voted for any other process to get access to the critical section. Now, when this process Pi wants to enter the critical section, it first sets its state to Wanted, meaning it wants to enter the critical section. Then it multicasts a request message to all the processes in its own voting set Vi. Notice that this voting set Vi also includes Pi itself, so when Pi receives its own request, it has to process it; I'll describe how this processing is done in the next slide. Then, after sending out this request message, the process waits for a reply, or vote, message from all the processes in its own voting set, and that includes a vote from itself. When it has all these votes, it can then set its state to Held and proceed into the critical section.

Â 5:58

Now, when a process Pi receives a release message from a process Pj, recall from the previous slide that a release message is received when Pj has just exited the critical section and informs all its voting set members that it has done so. Pi then looks at its queue of waiting requests. Remember that Pi may be in not just Pj's voting set, but also in the voting sets of other processes, and these other processes may be waiting for a vote from Pi. If Pi's queue is empty, it means that Pi doesn't have any outstanding requests, so it sets its voted variable to false and is done at that point. However, if there is something in the queue, then it dequeues the head of the queue, say a process Pk. Remember that this means Pk's voting set contains Pi, and that Pk had previously sent a request to Pi which was queued; the queueing happened earlier, in the request-handling step at the top of the slide. In this case, the entry for Pk is dequeued and a reply message is sent to Pk, meaning that Pi has now voted for Pk to enter the critical section. And so Pi sets its voted variable to true.
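Putting the pieces together, here is a minimal single-machine sketch of these rules (the class, method, and message names are my own; a real deployment would use actual network messages). It also spells out the standard request-handling rule that this part of the lecture refers to: on receiving a request, a process queues it if its state is Held or it has already voted, and otherwise sends its vote immediately:

```python
from collections import deque

class Process:
    """One Maekawa process; messages are (sender, kind) tuples in queues."""
    def __init__(self, pid, voting_set):
        self.pid = pid
        self.voting_set = voting_set   # includes pid itself
        self.state = "RELEASED"        # RELEASED / WANTED / HELD
        self.voted = False             # currently voting for someone?
        self.votes = 0                 # votes collected while WANTED
        self.queue = deque()           # deferred requests

    def request_cs(self, net):
        """Entry step: multicast REQUEST to the whole voting set."""
        self.state, self.votes = "WANTED", 0
        for m in self.voting_set:
            net[m].append((self.pid, "REQUEST"))

    def release_cs(self, net):
        """Exit step: tell the voting set the critical section is free."""
        self.state = "RELEASED"
        for m in self.voting_set:
            net[m].append((self.pid, "RELEASE"))

    def deliver(self, sender, kind, net):
        if kind == "REQUEST":
            if self.state == "HELD" or self.voted:
                self.queue.append(sender)          # defer: vote already spent
            else:
                net[sender].append((self.pid, "REPLY"))
                self.voted = True                  # my one vote is spent
        elif kind == "REPLY":
            self.votes += 1
            if self.state == "WANTED" and self.votes == len(self.voting_set):
                self.state = "HELD"                # all votes in: enter CS
        elif kind == "RELEASE":
            if self.queue:                         # vote for the next waiter
                net[self.queue.popleft()].append((self.pid, "REPLY"))
                self.voted = True
            else:
                self.voted = False

def run(net, procs):
    """Deliver messages until all the inboxes are empty."""
    while any(net.values()):
        for pid, inbox in net.items():
            while inbox:
                sender, kind = inbox.popleft()
                procs[pid].deliver(sender, kind, net)

# The 2x2 voting sets from the earlier four-process example.
V = {1: {1, 2, 3}, 2: {1, 2, 4}, 3: {1, 3, 4}, 4: {2, 3, 4}}
procs = {i: Process(i, V[i]) for i in V}
net = {i: deque() for i in V}

procs[1].request_cs(net); run(net, procs)  # p1 collects all three votes
procs[4].request_cs(net); run(net, procs)  # p4 blocks: p2, p3 voted for p1
print(procs[1].state, procs[4].state)      # HELD WANTED
procs[1].release_cs(net); run(net, procs)  # p2, p3 now vote for p4
print(procs[4].state)                      # HELD
```

Note how p4 cannot enter while p1 is in the critical section, precisely because V1 and V4 share p2 and p3, whose single votes are already spent on p1.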

So why does this algorithm guarantee safety? Well, when a process Pi receives replies from all its voting set members Vi, including itself, no other process Pj could have received replies from all of its voting set members Vj. That's because Vj would have intersected Vi in at least one process Pk, and Pk can send only one reply, or vote, at a time. Since Pk already sent its vote to Pi, it could not have voted for Pj as well. And so safety is guaranteed by Maekawa's Algorithm. For liveness, a process needs to wait for at most N-1 other processes to finish the critical section, and you might think this guarantees liveness. However, Maekawa's Algorithm in its original form has a subtle behavior that can actually violate liveness, because it can result in a deadlock.

Here is an example of a deadlock, again with four processes P1, P2, P3, P4. Suppose all four processes request access to the critical section. P1 gets some replies but is waiting for P3's reply. P3, in turn, is waiting for P4's reply, since P4 is in its voting set. P4 is waiting for P2's reply, and P2, in turn, is waiting for P1's reply. Now we have completed a cycle, or a circle, among these processes, and this is called a deadlock. There is not going to be any more progress in the system, because these processes are going to be waiting for these replies forever. So the original Maekawa's Algorithm, as we've discussed, is deadlock-prone, and these deadlocks can occur.
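The circular wait in this example is mechanical enough to check in a few lines. Here is a tiny sketch (the wait-for graph encoding and the function are my own illustration, not part of the algorithm):

```python
# Wait-for edges from the lecture's deadlock scenario:
# P1 waits for P3, P3 for P4, P4 for P2, and P2 for P1.
waits_for = {"P1": "P3", "P3": "P4", "P4": "P2", "P2": "P1"}

def has_cycle(waits_for, start):
    """Follow the wait-for chain; revisiting a process means deadlock."""
    seen, node = set(), start
    while node in waits_for:
        if node in seen:
            return True
        seen.add(node)
        node = waits_for[node]
    return False

print(has_cycle(waits_for, "P1"))  # True: no process can ever proceed
```

Deadlock-avoiding variants of Maekawa's algorithm essentially prevent such cycles from forming, for example by prioritizing requests with timestamps.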

There are, of course, variants of Maekawa's Algorithm that have been published that address this issue and are free from deadlocks. Returning to the original Maekawa's Algorithm, let's analyze its performance. The bandwidth involved in entering the critical section is square root of N request messages that you send out to your voting set members, and then square root of N replies that you receive back, so that's 2 times the square root of N messages. For the exit operation, you simply send a release message to all your voting set members, so that's just square root of N messages. These numbers are better than the Ricart-Agrawala Algorithm, which had a bandwidth of order N. You might think square root of N is still a large number. It's not constant, for sure, but it can be fairly small. If N is a million, for instance, then the square root of N is about 1,000, which is a fairly small number.
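As a quick sanity check on these numbers (the 2(N-1) figure for Ricart-Agrawala's entry, i.e. N-1 requests plus N-1 replies, comes from the earlier discussion of that algorithm):

```python
import math

N = 1_000_000
k = math.isqrt(N)                    # voting set size, sqrt(N) = 1000

maekawa_enter = 2 * k                # sqrt(N) requests + sqrt(N) votes
maekawa_exit = k                     # sqrt(N) release messages
ricart_agrawala_enter = 2 * (N - 1)  # N-1 requests + N-1 replies

print(maekawa_enter, maekawa_exit)   # 2000 1000
print(ricart_agrawala_enter)         # 1999998
```

For a million processes, entering the critical section costs a few thousand messages instead of a few million.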

Â 10:14

So why is the square root of N the right size for the voting sets in Maekawa's Algorithm? Say each voting set is of size K, and each process belongs to M voting sets. Now, the total number of voting set memberships is K times N, because each of the N processes has K members in its voting set. Of course, because the voting sets overlap and intersect, processes may be repeated in this K times N count. But each process belongs to exactly M voting sets, so K times N divided by M should be equal to N, because each process appears exactly M times in this K times N. So K*N/M = N, and if you evaluate this by canceling N on both sides, you get K = M. This means that the size of each voting set should equal the number of voting sets that each process is in. Okay, let's hold that equation to the side. Now, consider a process Pi. The number of voting sets in the system is equal to Pi's own voting set, which is the +1 over here, plus the voting sets of the others. There are K members in Pi's voting set, that's K processes, and each of those processes has its own voting sets. Together, that gives the total number of voting sets in the system.

Â 11:36

Each of these K voting set members of Pi's is involved in M voting sets in total, but one of those is Pi's own voting set, so the remaining voting sets are simply (M-1)*K. Adding Pi's own voting set gives the total number of voting sets present in the system. However, the number of voting sets should equal the number of processes in the system, so we have N equaling this value: N = (M-1)*K + 1. Now, substitute in M = K from the earlier equation, and you get N = (K-1)*K + 1. Solving this, you will see that K equals about the square root of N, give or take a little bit. And this explains why K = M = square root of N is the optimal value for minimizing the overhead K. So, in order to minimize K, we set N = (M-1)*K + 1, and then we use the earlier equation, K = M, to get K approximately equal to the square root of N.
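The algebra here can be checked numerically. Solving N = (K-1)*K + 1, i.e. K^2 - K + (1 - N) = 0, via the quadratic formula gives K = (1 + sqrt(4N - 3))/2, which stays within one of sqrt(N); the small solver below is my own sketch of that step, not from the lecture:

```python
import math

def optimal_k(n):
    """Positive root of K^2 - K + (1 - n) = 0, i.e. (K-1)*K + 1 = n."""
    return (1 + math.sqrt(4 * n - 3)) / 2

for n in (4, 10_000, 1_000_000):
    k = optimal_k(n)
    # Check the defining equation and the sqrt(N) approximation.
    assert abs((k - 1) * k + 1 - n) < 1e-6
    assert abs(k - math.sqrt(n)) < 1
    print(n, round(k, 3))
```

For N = 1,000,000 this gives K of roughly 1000.5, matching the lecture's "square root of N, give or take a little bit."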

Now, Maekawa's Algorithm of course does not handle failures; neither do the ring-based or centralized algorithms, or the Ricart-Agrawala Algorithm. Of course, there are fault-tolerant versions of all of these algorithms out there in the literature.

Â 13:14

And that's what we're going to discuss, in very brief detail, in this slide and the next.

Chubby provides only advisory locks. This means that clients must acquire the lock before they access the critical resource, or the object itself. If clients forget to acquire the lock, then, hey, mutual exclusion may be violated. That is why these are known as advisory locks. Chubby, of course, uses a cluster of servers; there are five servers by default in a Chubby cell. One of the servers is marked as the Master. Again, this is elected using one of the leader election protocols that we've seen before. In this particular figure, Server D is the Master.

Chubby allows clients not only to do locking, but also to write small configuration files. And Chubby relies on the Paxos consensus algorithm. Chubby, again, is a group of servers, with one elected as the Master; the other servers in the group, the non-Master servers, are just slaves. They replicate the same information as the Master. When a client wants to read one of these small configuration files or lock files, it sends a read request to the Master, which can serve it locally. So the Master serves all read requests locally. When a client wants to write, however, it sends a write request to the Master, which then forwards it to all the servers in the group. It then needs a majority of them to respond back to the Master. When a majority of them respond back, you reach a quorum in the system, and then the Master can send back a response to the client acknowledging that the write has been done. And this is where mutual exclusion comes into play. If some other client has already gotten access to the critical section, it means that it has permissions from at least a quorum of the servers, via the Master. The next request that goes to the Master will not reach a quorum, and it will wait until that first process exits its critical section.
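The quorum rule on the write path is simple to state in code. Here is a tiny sketch (the function name and parameters are illustrative, not Chubby's actual API):

```python
def write_commits(acks, cluster_size=5):
    """A write commits once a strict majority of the cell acknowledges it.
    With Chubby's default 5-server cell, that means 3 acks."""
    return acks >= cluster_size // 2 + 1

print(write_commits(3))  # True: 3 of 5 is a majority
print(write_commits(2))  # False: the Master must keep waiting
```

Because any two majorities of the same five servers overlap in at least one server, two conflicting writes cannot both commit, which mirrors the voting set intersection argument from earlier in the lecture.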

Â 15:27

Now, wrapping up our discussion of mutual exclusion: it is a very important problem in cloud computing systems. There are a variety of classical algorithms, and we've discussed four: the centralized, the ring-based, the Ricart-Agrawala, and the Maekawa Algorithms. They are not all fault-tolerant, but they are fairly efficient, and they all try to guarantee safety and liveness. As we progressed from the centralized algorithm to Maekawa's, the algorithms got better and better in terms of performance. Industry uses a variety of systems for coordination and for mutual exclusion. These include Chubby at Google, which is used for locking and also to maintain small configuration files, and Apache Zookeeper, an open-source system that is used for coordination; I encourage you to look up Apache Zookeeper on the web.
