The OpenCL standard is divided into two parts: the device-side language and the host-side language, or API. The device-side language expresses the portion of the code whose performance the program wants to improve by transferring it to a more appropriate compute resource, such as a GPU, FPGA, DSP, or CPU. The host side runs the rest of the program on a more general-purpose CPU and waits for a response from the kernel side, much like a host waiting on an I/O device.

The OpenCL specification is divided into four parts called models: the platform, execution, memory, and programming models. We'll talk about each of these models in more detail later. The platform model sets the stage for the roles of the accelerator devices, also called OpenCL devices. The host is generally a CPU whose role is to direct which sections of code are transferred to one of the other devices to operate on. These devices can be CPUs, GPUs, DSPs, or FPGAs, and most of them have their own local memories. Here we are given an example platform: the host is a CPU, and there are two accelerator devices, a GPU and an FPGA.

The data parallelism workflow in OpenCL is a matter of specifying a portion of the data and which OpenCL device is in charge of working on that portion. The task parallelism workflow in OpenCL involves using OpenCL-specific objects such as events and queues to organize the different tasks and kernels that the host requires an OpenCL accelerator device to perform.

Here we see an example of data parallelism in OpenCL. If you are given an array of n items and you have n available OpenCL accelerator devices, you could have each device work on one element in parallel with the others. Since the portion of code on the left runs sequentially and not in parallel, we're interested in the parallel version on the right-hand side.

Here we're given an example of task parallelism in OpenCL. If our program had two functions, then, assuming we had multiple OpenCL accelerator devices, we could let each device work on one function, which we call a kernel. Just as in our previous data parallelism example, the code on the left would run sequentially: first foo would be worked on, then bar. Looking at the parallel code on the right, in the case of the FPGA we can afford to give one device multiple tasks, and if the device is big enough, it will still work on both at the same time. The sketches below make each of these workflows concrete.
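To make the platform model concrete, here is a minimal host-side sketch, not taken from the lecture, that uses the standard OpenCL C API to discover a platform and list the devices attached to it; error checking is omitted for brevity.

    #include <stdio.h>
    #include <CL/cl.h>

    int main(void) {
        /* Platform model: the host discovers the OpenCL devices
           (CPUs, GPUs, DSPs, FPGAs) available to it. */
        cl_platform_id platform;
        clGetPlatformIDs(1, &platform, NULL);

        cl_device_id devices[8];
        cl_uint num_devices;
        clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, 8, devices, &num_devices);

        for (cl_uint i = 0; i < num_devices; ++i) {
            char name[256];
            clGetDeviceInfo(devices[i], CL_DEVICE_NAME, sizeof(name), name, NULL);
            printf("device %u: %s\n", i, name);
        }
        return 0;
    }

On a platform like the one in the example, this loop would print the GPU and the FPGA alongside any CPU device the vendor exposes.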
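For the data parallelism example, note that in practice OpenCL maps the n elements to n work-items of a single kernel launch rather than to n separate devices. The sketch below, with made-up array and kernel names, shows a sequential host loop next to the equivalent OpenCL C kernel.

    /* Sequential version ("left-hand side"): plain host C, one element at a time. */
    for (int i = 0; i < n; ++i)
        out[i] = a[i] + b[i];

    /* Parallel version ("right-hand side"): an OpenCL C kernel in which each
       work-item handles the one element selected by its global ID. */
    __kernel void vec_add(__global const float *a,
                          __global const float *b,
                          __global float *out) {
        int i = get_global_id(0);
        out[i] = a[i] + b[i];
    }

Enqueueing vec_add from the host with a global work size of n (via clEnqueueNDRangeKernel) is what asks the device to run all n work-items in parallel.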
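For the task parallelism example, the host-side sketch below shows one plausible way, assumed rather than taken from the slides, to dispatch the foo and bar kernels using the command queues and events mentioned above. The kernel objects foo_k and bar_k, the context ctx, and the device handle are assumed to exist already.

    /* Assumes ctx, device, foo_k, and bar_k were created earlier. */
    cl_command_queue q_foo = clCreateCommandQueue(ctx, device, 0, NULL);
    cl_command_queue q_bar = clCreateCommandQueue(ctx, device, 0, NULL);

    size_t gsize = 1024;   /* illustrative global work size */
    cl_event done[2];

    /* Each enqueue returns immediately and hands the host an event to wait on. */
    clEnqueueNDRangeKernel(q_foo, foo_k, 1, NULL, &gsize, NULL, 0, NULL, &done[0]);
    clEnqueueNDRangeKernel(q_bar, bar_k, 1, NULL, &gsize, NULL, 0, NULL, &done[1]);

    /* The host blocks here until both kernels have finished. */
    clWaitForEvents(2, done);

Using two queues leaves the runtime free to overlap the two kernels, which is how a large enough FPGA can work on both tasks at the same time.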