Apache Storm. Apache Storm is used for real-time streaming at latencies that can go down to the microsecond level, so processing at microsecond granularity and above is possible through Storm. You can find a lot of applications that can be analyzed sufficiently well using just Spark Streaming. Spark Streaming, once again, uses mini-batches that can be as small as 0.5 seconds, half a second, or larger, and a lot of applications can be supported by that. However, some applications, like stock market trading and control of autonomous systems, will need faster service than half a second. That is where Spark Streaming will not be fast enough, and the need for Storm kicks in. Therefore, we will study Storm as well as Storm Trident. Storm Trident uses mini-batches just like Spark Streaming, but their characteristics are a little different. Spark is based on batches, and on mini-batches when we go into Spark Streaming, whereas Storm is based on tuples, which can be very small; a group of tuples gathered together to form a mini-batch is what is used in Storm Trident. In other words, batches and mini-batches are used in Spark, while tuples, and groups of tuples formed into mini-batches, are used in Storm and Storm Trident. We will get into the details of this. Looking into Apache Storm, Apache Storm is a distributed, fault-tolerant, real-time data stream processing big data framework. It was originally developed by BackType, which was acquired by Twitter. Why would Twitter acquire Storm? That reason will be revealed soon. But first, let's have some fun looking into some Storm terminology. Over here, you see a magnificent view of a huge storm coming up. Think of the seawater: this is live big data, constantly changing, moving around, never the same. Then we have a nimbus; nimbus means dark cloud in Latin. We have a nimbus over there.
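The tuple-versus-mini-batch distinction above can be sketched in plain Java. This is a hypothetical illustration, not the actual Trident API: it just shows the idea that Trident takes a stream of individual tuples and groups them into small batches before processing.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch (not Storm/Trident's real API): grouping individual
// tuples into micro-batches, the way Trident batches a tuple stream.
public class MicroBatcher {
    // Collect tuples into batches of a fixed size; in real Trident the
    // batch boundaries come from the stream's batching configuration.
    static List<List<int[]>> batch(List<int[]> tuples, int batchSize) {
        List<List<int[]>> batches = new ArrayList<>();
        for (int i = 0; i < tuples.size(); i += batchSize) {
            batches.add(tuples.subList(i, Math.min(i + batchSize, tuples.size())));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<int[]> stream = new ArrayList<>();
        for (int i = 0; i < 7; i++) stream.add(new int[]{i, i * i}); // 2-tuples
        // 7 tuples grouped into batches of 3 -> 3 batches
        System.out.println(batch(stream, 3).size() + " batches");
    }
}
```

The point of the sketch: in Spark Streaming the mini-batch is the native unit, while in Storm the tuple is the native unit and Trident layers batching on top of it.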
Then we have a spout. Think of a spout coming down from the dark cloud, the nimbus; it is like a tornado. It sucks up whatever is below: it hits the surface of the seawater, the seawater gets sucked up, and that becomes the waterspout you see right there. There are cases where water, as well as fish, come down from the sky when something like this happens. In this picture, you can also see bolts of lightning coming from the nimbus. Lightning carries massive energy. Bolts can create transformations; they can charge something up as well as burn something down. So bolts are the ones that create actual changes, transformations, and activity. Because of that, this terminology is used in the big data Storm technology. The Nimbus, the dark cloud in Latin, is the core of the system. Then we have ZooKeeper to manage the cluster, and we have Supervisors below. Now let's study the Storm components. Nimbus distributes program code throughout the cluster, schedules topology processes, assigns tasks to worker nodes, and monitors worker nodes for node failures. Apache ZooKeeper provides highly reliable distributed coordination of clusters; high availability and fault tolerance are provided through replicated masters and agents, and non-disruptive upgrades are supported through ZooKeeper. A Supervisor waits for task assignments from the master node, and each of its worker processes executes a subset of a topology. A topology involves many worker processes distributed across multiple worker nodes in the cluster. A topology is the overall computation represented in graph form, based on Spouts and Bolts connected together to form a Directed Acyclic Graph, a DAG, of operations. A Storm topology DAG with single components is shown here, and you can see that the Spout and Bolt structure is simple.
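The Spout-and-Bolt topology described above can be sketched as a pipeline in plain Java. These are hypothetical stand-in types, not Storm's actual interfaces: a spout sources tuples, bolts transform them, and chaining bolts together corresponds to one path through the topology DAG.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Plain-Java sketch of the Spout -> Bolt pipeline idea (hypothetical
// types, not Storm's API): a Spout sources tuples, Bolts transform them,
// and the Spout -> Bolt -> Bolt chain is one path of the topology DAG.
public class MiniTopology {
    interface Spout { List<String> nextBatch(); }        // tuple source
    interface Bolt extends Function<String, String> {}   // tuple transformer

    // Push one batch of tuples through a linear chain of bolts.
    static List<String> run(Spout spout, List<Bolt> bolts) {
        List<String> out = new ArrayList<>();
        for (String tuple : spout.nextBatch()) {
            String t = tuple;
            for (Bolt b : bolts) t = b.apply(t);
            out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        Spout words = () -> List.of("storm", "nimbus", "bolt");
        Bolt upper = s -> s.toUpperCase();   // a function bolt
        Bolt tag = s -> "[" + s + "]";       // another function bolt
        System.out.println(run(words, List.of(upper, tag)));
    }
}
```

In real Storm, the wiring is declared with a topology builder and the graph may fan out and join, which is what makes it a DAG rather than a single chain; this sketch only shows the data-flow idea.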
For multiple component units, you can see that the Spouts and Bolts have multiple units inside, and therefore parallel processing, as well as faster data processing, is provided through these more complex units. A Spout is the source of a tuple stream; it is the input to the Storm network. An example would be the Twitter API. A tuple is a sequence of data values, a finite number of data values, and we will represent it as something like this. An N-tuple is a sequence of N elements; a five-tuple example would be five numbers. A tuple stream is a continuous stream of tuples. A Bolt is a processing unit of the tuple stream. It receives input tuple streams, processes them, and emits a tuple stream as output. However, if the Bolt is at the end of the DAG, then in some cases it may not emit an output. Process types include functions, filters, aggregates, joins, talking to databases, and much more. Storm programs are written in Java and Clojure. Storm was initially released in 2011; in September 2011, the first major release was 0.5.0. It was originally developed by Nathan Marz and the team at BackType. Twitter acquired Storm and made it the open source Apache Storm project. In February 2014, Apache Storm's first major release, 0.9.1, was provided, and in September of the same year, Apache Storm became an Apache Top-Level Project. Here, you may wonder how Twitter used Storm, and we will get into the details of that. The features of Storm include the following. It can process millions of tuples per second, which means that the time per tuple can go down as low as microsecond units, or maybe even lower. It is highly fault tolerant, and fast recoveries are possible. Efficient and fast cluster control is supported. It is simple to program and supports multiple programming languages. It supports various cross-platform operating systems. The features of Storm also include a topology based on Spouts and Bolts used to form a DAG.
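The Bolt process types mentioned above (filter, aggregate, and so on) can be illustrated with plain Java stand-ins, not Storm's Bolt interface. Here a stream of 2-tuples of the form (word, count) passes through a filter bolt and then an aggregate bolt; all names are hypothetical.

```java
import java.util.List;

// Sketch of common Bolt process types on a stream of 2-tuples
// (word, count): a filter bolt followed by an aggregate bolt.
// Plain-Java stand-ins for illustration, not Storm's API.
public class BoltOps {
    record Tuple2(String word, int count) {}  // a 2-tuple of data values

    // Filter bolt: keep only tuples whose count meets a threshold.
    static List<Tuple2> filter(List<Tuple2> in, int min) {
        return in.stream().filter(t -> t.count() >= min).toList();
    }

    // Aggregate bolt: sum the counts over the (mini-)batch.
    static int aggregate(List<Tuple2> in) {
        return in.stream().mapToInt(Tuple2::count).sum();
    }

    public static void main(String[] args) {
        List<Tuple2> batch = List.of(
            new Tuple2("storm", 5), new Tuple2("spark", 1), new Tuple2("bolt", 3));
        List<Tuple2> kept = filter(batch, 3);     // drops ("spark", 1)
        System.out.println(kept.size() + " tuples, sum=" + aggregate(kept));
    }
}
```

An aggregate bolt like this one typically sits at the end of a DAG path, which matches the remark above that a terminal Bolt may consume tuples without emitting a new stream.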
A topology DAG's edges are streams that direct data from one node to another, and the topology forms data stream transformation pipelines. These are the references that I used, and I recommend them to you.