In this lecture we will study the basic primitives used to express parallel computations on the JVM. There are many forms of parallelism in the wild, from GPUs and custom parallel hardware, over multiprocessors and multicore systems, to distributed computer clusters. Each parallel programming model is tailored to a specific use case and has certain associated characteristics. In this course, we study a specific parallel programming environment. Still, we strive to be general: the ideas and algorithms that you will see in this course generalize to other models. Specifically, we will assume, first, that we are running our parallel programs on multicore or multiprocessor systems with memory that is shared across processors. Second, our programs run on the JVM runtime, which executes on top of the operating system of the host computer.

What are the sources of parallelism in an operating system? Let's start by defining the most coarse-grained unit of parallel computation. We have all heard about operating systems, but let's try to define them more formally. We will say that an operating system is a piece of software that manages hardware and software resources, and schedules program execution. Windows, Linux, and OS X are all examples of different operating systems, and most programs need an operating system to run.

A process is an instance of a computer program that is executing inside the operating system. We define a process like this because the same program can be started many times. Each time a process is started and executes, the operating system assigns it some resources, such as execution time on a CPU, a memory address space, file handles, or network ports. Each process is assigned a unique process identifier, or PID. For example, when we start sbt, a process with a unique PID, say 834, is created to execute the program. While it is running, we are not limited to a single instance: we can open another terminal and start another sbt process.

From the parallel programming standpoint, a process is the most coarse-grained unit of concurrency on a shared-memory system. The operating system multiplexes many different processes on a finite number of CPUs. Each process executes for a small amount of time on a CPU before getting replaced by another process. This time interval is called a time slice, and the mechanism is called multitasking.

Finally, it is important to note that two processes cannot read or write each other's memory directly. Each process owns a separate, isolated memory area. This property is important from the standpoint of security: since one process cannot access another process's memory, it cannot steal its data or corrupt it. For us, this property means that processes cannot easily communicate. Although operating system primitives such as pipes allow two processes to exchange information, interprocess communication is usually neither cheap nor straightforward.

We therefore have another, more fine-grained programming primitive. Each process can contain multiple independent concurrency units called threads. There are two main benefits of using threads. First, threads can be started programmatically within the program, making it easier to structure parallel computations than with processes. Second, and more importantly, threads share the same memory address space. This enables them to share information by reading from, and writing to, memory. Each thread has a program counter and a program stack.
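As a concrete illustration of these two units, here is a minimal sketch (assuming a Java 9+ JVM, where ProcessHandle is available; the object name ProcessInfo is mine, chosen for illustration) that prints the PID of the process it runs in and the name of its current thread:

```scala
object ProcessInfo {
  def main(args: Array[String]): Unit = {
    val pid = ProcessHandle.current().pid()          // unique process identifier
    val threadName = Thread.currentThread().getName  // "main" for the main thread
    println(s"process $pid, thread $threadName")
  }
}
```

Running this program from two different terminals prints two different PIDs, one per process, while the thread name is "main" in both.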
The program stack is a region of memory that contains a sequence of method invocations that are currently being executed. The program counter describes the position in the current method. Threads that are executing on the JVM are specific in the sense that they cannot modify each other's program stacks. Stack entries, which correspond to local variables in the method, are only accessible to the thread that owns the stack. To communicate, JVM threads must modify heap memory instead.

When a JVM process is run, it also starts several threads, the most important of which is the main thread. This thread executes the main method of a Scala program. In a normal sequential program, we only use the main thread to execute the program. However, in a parallel program, we must start several additional threads to parallelize our computation. The operating system then assigns our threads to the available CPUs.

Starting an additional thread proceeds in three steps. First, we need to declare a custom Thread subclass. This subclass defines the code that the new thread must execute. Second, we must instantiate an object of that subclass, which will track the execution of the thread. Finally, we call the start method on the thread object to start the actual thread execution. Note here that the same subclass can be used to create and start many thread instances.

Let's study this on a concrete example (the first sketch below). We define a thread class called HelloThread, and override its run method. The code inside the run method defines what the thread instance executes; in this case, we just print the "Hello world!" message. We then create a new thread instance t, and call the method start to start the thread. To then wait for its completion, we call the method join. On a figure, this looks as follows: first the main thread runs alone; when it calls start, a new HelloThread instance begins executing, and the two threads then run in parallel. When the main thread calls join, it blocks its execution until the HelloThread completes. After the HelloThread completes its execution, the main thread can continue executing.

Let's see a demo. For the purposes of the demo, we will use sbt and the interactive Scala shell. To write down our snippet, we enter the special paste mode, which makes it easier to write multiple lines. To run the thread, we create a new thread instance t of type HelloThread, and then call the start method. In the end, we call join to await its completion.

Let's try a slightly different example (the second sketch below). This time, we separate the printing of the hello world message into two statements: "Hello " and "world!". We then define a method that runs two such threads in parallel. Let's run this method, main, several times, and see what happens. The first time we ran the main method, the two threads ran serially, one after another: the first thread printed "Hello world!", and then the second thread did the same. Let's run it again. This time an alternative scenario happened: the first thread managed to print the "Hello " part of the message, but then the second thread kicked in. It also printed "Hello " before the first thread managed to print "world!", and then both threads completed.

The previous demo showed that separate statements executing in two threads can overlap arbitrarily. This is expected; after all, the two threads execute in parallel. However, we sometimes want to ensure that a sequence of statements executes at once, as if it were just one statement.
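A sketch of the HelloThread snippet described above, as entered in the Scala shell's paste mode:

```scala
class HelloThread extends Thread {
  override def run(): Unit = {
    println("Hello world!")
  }
}

val t = new HelloThread
t.start()  // the new thread starts executing run() in parallel with the main thread
t.join()   // block the main thread until t completes
```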
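And a sketch of the variant with two print statements, together with the method main that starts two such threads (the names t and s follow the demo):

```scala
class HelloThread extends Thread {
  override def run(): Unit = {
    print("Hello ")
    println("world!")
  }
}

def main(): Unit = {
  val t = new HelloThread
  val s = new HelloThread
  t.start(); s.start()  // the two threads now run in parallel
  t.join(); s.join()    // wait for both to complete
}

main()  // call repeatedly to observe different interleavings
```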
Here we want to make sure that two such sequences in two different threads can never overlap: either t or s executes all of its statements first. In concurrent programming, we call this atomicity. An operation is atomic if it appears as if it occurred instantaneously from the point of view of other threads.

Let's study in more detail why atomicity is important, on the example of a method called getUniqueId. When invoked by some thread, this method must return a unique integer number. In other words, when a thread calls getUniqueId, the value it gets is not returned to any other thread. We implement this method by keeping a private uidCount field, and then incrementing the field in the method before returning its value.

Let's see a demo of how this works (the first sketch below). We start by defining the private uidCount variable. We then define the getUniqueId method, and then a method that starts a thread which uses getUniqueId. This thread calls getUniqueId 10 times, and prints the resulting values to the standard output. We will start two such threads, and see what happens. The sets of numbers obtained in the two threads are not at all unique; in particular, the numbers 10 and 1 repeat in both threads.

As we saw in the demo, the getUniqueId method does not execute atomically. Separate statements in its body can interleave arbitrarily when executing on different processors. As a consequence, invocations of getUniqueId do not return unique values. For example, a thread could first read uidCount, and after that, add the value 1 to it. But before the actual assignment takes place, a second thread does the same thing. If both threads now execute the assignment, they both write the value 1 back into uidCount. As a result, when they read uidCount, they both return 1.

To prevent such scenarios, both Java and Scala come with a synchronized construct, which helps enforce atomicity. Code in synchronized blocks that are invoked on the same object is never executed by two threads at the same time. The JVM ensures this by storing something called a monitor in each object. At most one thread can own a monitor at any particular time. For example, if a thread T0 owns the monitor of an object x, another thread T1 cannot acquire that monitor before T0 releases it.

Let's look at how this works (the second sketch below). The synchronized method must be invoked on an instance of some object. In our example, we instantiate a single object x of the type AnyRef. We declare a private uidCount variable as before, and define a method getUniqueId that increments uidCount and returns it. Note that this time, the code inside the method is surrounded by the synchronized block on the object x. The method getUniqueId now works correctly. To verify this, start your own sbt instance, and repeat the steps that you did before, this time with the synchronized statement.
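A sketch of the non-atomic getUniqueId and the demo described above (the enclosing object UidDemo and the helper startThread are my names, chosen for illustration):

```scala
object UidDemo {
  private var uidCount = 0L

  // Not atomic: the read of uidCount and the write back to it can
  // interleave with another thread's, so two threads may get the same value.
  def getUniqueId(): Long = {
    uidCount = uidCount + 1
    uidCount
  }

  def startThread(): Thread = {
    val t = new Thread {
      override def run(): Unit = {
        val uids = for (_ <- 0 until 10) yield getUniqueId()
        println(uids)
      }
    }
    t.start()
    t
  }

  def main(args: Array[String]): Unit = {
    startThread()
    startThread()
  }
}
```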
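And a sketch of the corrected version, with the body of getUniqueId wrapped in a synchronized block on the monitor object x (the enclosing object name UidService is again mine):

```scala
object UidService {
  private val x = new AnyRef  // any object can serve as a monitor
  private var uidCount = 0L

  def getUniqueId(): Long = x.synchronized {
    // At most one thread at a time can hold x's monitor, so the
    // increment and the subsequent read execute atomically.
    uidCount = uidCount + 1
    uidCount
  }
}
```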