Well, hello, we're going to talk today about streams and events and how they work in CUDA. Up till now, this is what your stream programming looked like: you use the default stream. All kernels execute on this stream by default, and you may have used pageable memory because it's a little bit easier to use from a certain perspective. What this meant was that your CPU program copied data into pageable memory, and then, when you wanted to execute your kernel, that data had to be copied into device memory. To do that, it went through pinned memory and then on to GPU global memory, or into symbols, textures, et cetera. The stream, once it got that starting call, would continue, but each kernel has to execute in sequence, and each result goes back through pinned CPU memory to get to the program. So that's not necessarily efficient: it's fairly parallel within a kernel but very synchronized between kernels. Now, your CPU program can keep on going; it doesn't wait for the kernels to end unless it's told to. So in that respect, it is asynchronous. It's just not optimal.

So now let's streamline it a little bit, sorry for the double entendre there. Now you want to create your own stream, what is referred to here as a user stream. To do that, you need to use pinned memory and copy directly between this pinned memory and the GPU. Kernels still execute sequentially on a single stream, but you'll soon see that you can run multiple streams, and then you get a lot more efficiency and a lot more control.

Now with multiple streams, you can use the default stream, without specifying a stream, alongside your own defined stream; let's just not show the memory copies because it's redundant. Now you can have roughly two times the number of kernels running. You may get more efficiency, but it's not going to be linear; it's more like when one kernel has a cache miss and there's nothing for its threads to do, the other stream gets more of the GPU.
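As a minimal sketch of what the lecture describes, here is a user stream with pinned host memory and asynchronous copies. The kernel `scale` and the sizes are placeholders I've made up for illustration, not anything from the lecture:

```cuda
// Sketch: create a user stream, allocate pinned host memory, and queue
// copy -> kernel -> copy on that stream, asynchronously to the CPU.
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: multiply every element by a factor.
__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int N = 1 << 20;
    const size_t bytes = N * sizeof(float);

    float *h_data;                     // pinned (page-locked) host memory,
    cudaMallocHost(&h_data, bytes);    // required for truly async copies
    for (int i = 0; i < N; ++i) h_data[i] = 1.0f;

    float *d_data;
    cudaMalloc(&d_data, bytes);

    cudaStream_t stream;               // the "user stream"
    cudaStreamCreate(&stream);

    // All three operations are queued on the same stream: they run in
    // order relative to each other, but the CPU does not wait here.
    cudaMemcpyAsync(d_data, h_data, bytes, cudaMemcpyHostToDevice, stream);
    scale<<<(N + 255) / 256, 256, 0, stream>>>(d_data, N, 2.0f);
    cudaMemcpyAsync(h_data, d_data, bytes, cudaMemcpyDeviceToHost, stream);

    cudaStreamSynchronize(stream);     // CPU blocks only when told to
    printf("h_data[0] = %f\n", h_data[0]);

    cudaStreamDestroy(stream);
    cudaFree(d_data);
    cudaFreeHost(h_data);
    return 0;
}
```

Running two of these streams side by side is what gives the overlap described above: while one stream's copy is in flight, the other stream's kernel can execute.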
So, note that these two streams, without events, now operate completely separately from each other, asynchronously to each other, and at this point even asynchronously to the CPU program. To gain even more asynchronous capability in your streams, you can define them as non-blocking. That means you can queue a number of kernels on the same stream, and while they're running, if there is nothing for a stream to do, the GPU will move to another kernel, and then make sure the original one completes. There are limits to this, just like there are limits to the number of streams that can exist on the same GPU, but yet again you're getting a little bit faster, you're being a little bit more optimal. Now you have a little less control over how things execute, and that's where events come in.

The purpose of events is to mark the start and completion of actions. So you say, okay, stream A, start, and you can tell the CPU to stop and wait for an end event from that stream or kernel. What you can then do is string these along to control not just when the CPU executes, when it starts and stops, but kernels too. You can have events that go between streams and create dependencies: a stream, and a kernel on that stream, will not start until that event has completed. If you want control over your streams and your CPU program, over execution order and timing, you'll use events.

Some major use cases are waiting for live data to come in, batching it, and sending it to a kernel. Or, conversely, using stream processing, not like this, which is stream programming, but stream processing, to send partial or small amounts of data continually to different kernels. You can control the way different things execute; you can create complex workflows that say execute kernel 0, then 1, then 2 on these different streams and be efficient, sometimes fanning out and bringing it back together.
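The cross-stream dependency described above can be sketched with an event. The kernel names `producer` and `consumer` are hypothetical stand-ins, but the event calls are the standard runtime API:

```cuda
// Sketch: non-blocking streams plus an event that makes streamB's
// kernel wait for streamA's kernel to finish.
#include <cuda_runtime.h>

__global__ void producer(int *buf) { buf[threadIdx.x] = threadIdx.x; }
__global__ void consumer(int *buf) { buf[threadIdx.x] *= 2; }

int main() {
    int *d_buf;
    cudaMalloc(&d_buf, 256 * sizeof(int));

    cudaStream_t streamA, streamB;
    // cudaStreamNonBlocking: these streams do not implicitly
    // synchronize with the default (legacy) stream.
    cudaStreamCreateWithFlags(&streamA, cudaStreamNonBlocking);
    cudaStreamCreateWithFlags(&streamB, cudaStreamNonBlocking);

    cudaEvent_t done;
    cudaEventCreate(&done);

    producer<<<1, 256, 0, streamA>>>(d_buf);
    cudaEventRecord(done, streamA);        // "completion" marker in streamA

    // Dependency across streams: consumer will not start until the
    // event has fired, even though it's queued on a different stream.
    cudaStreamWaitEvent(streamB, done, 0);
    consumer<<<1, 256, 0, streamB>>>(d_buf);

    cudaEventSynchronize(done);            // the CPU can also wait on events
    cudaStreamSynchronize(streamB);

    cudaEventDestroy(done);
    cudaStreamDestroy(streamA);
    cudaStreamDestroy(streamB);
    cudaFree(d_buf);
    return 0;
}
```

Chaining several such events is how you build the fan-out/fan-in workflows mentioned above: each downstream kernel waits only on the events it actually depends on.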
It also helps you manage resources across multiple CPUs or GPUs and control when things execute in both of those environments.
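As a rough sketch of the multi-GPU case, assuming a machine with more than one device: streams and allocations belong to the device that was current when they were created, so the CPU selects each device in turn and coordinates its own stream. The kernel `work` is a placeholder:

```cuda
// Sketch: one stream per GPU, with the CPU program coordinating both.
#include <cuda_runtime.h>

__global__ void work(float *x) { x[threadIdx.x] += 1.0f; }

int main() {
    int deviceCount = 0;
    cudaGetDeviceCount(&deviceCount);

    for (int dev = 0; dev < deviceCount; ++dev) {
        cudaSetDevice(dev);              // subsequent resources belong to dev

        float *d_x;
        cudaMalloc(&d_x, 64 * sizeof(float));

        cudaStream_t s;
        cudaStreamCreate(&s);            // stream on the current device
        work<<<1, 64, 0, s>>>(d_x);

        cudaStreamSynchronize(s);        // the CPU decides when to wait
        cudaStreamDestroy(s);
        cudaFree(d_x);
    }
    return 0;
}
```

In a real pipeline you would launch on all devices first and synchronize afterward, so the GPUs run concurrently rather than one after another as in this simplified loop.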