This video examines the tools and techniques for handling variable volumes of data. Google provides a communication service that's very useful for connecting systems and applications together when the timing and volume of data vary. The service is called Pub/Sub, which is short for publish-subscribe. The service will be explained first. Next, you'll see a Python example of how Pub/Sub is used in code, and at the end there will be a lab that simulates real-time streaming sensor data using Pub/Sub. Publish-subscribe, or more commonly Pub/Sub, is a message bus that's a great way to deal with the challenge of ingesting variable volumes of data. Pub/Sub can ingest data at high speeds. It works well with streaming data. It's durable and fault tolerant, and because it does not operate on the data, only stores and forwards it, the service is able to scale automatically with demand. As you'll soon see, the no-ops design of Pub/Sub provides excellent benefits, but also has consequences that mean additional responsibilities for your application. Understanding the qualities of Pub/Sub will help you learn to use it. In any communications method, there are several circumstances that must be considered: a sender must find the receiver to deliver the message, the receiver must be available to receive the message, and so forth. Pub/Sub handles many of these issues, freeing the communicators to focus on their work rather than on the communications protocol. Pub/Sub handles sender and receiver discovery. Imagine a sender that wants to communicate with a receiver without a service like Pub/Sub. Somehow, the sender would have to get the address of the receiver. Pub/Sub acts as an intermediary. The publisher publishes messages to objects known as topics; the subscriber subscribes to topics and receives messages. The single level of indirection represented by the topic connects the publisher and subscriber while maintaining their independence from one another.
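To make the topic's level of indirection concrete, here is a minimal in-memory sketch. It is a toy model written for illustration, not the Pub/Sub client library: the `ToyBroker` class, its methods, and the `sensor-readings` topic name are all hypothetical. It only shows that the publisher names a topic and never needs the address of any subscriber.

```python
from collections import defaultdict


class ToyBroker:
    """In-memory stand-in for a publish-subscribe service (illustration only)."""

    def __init__(self):
        # topic name -> list of subscriber callbacks
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        """Register a callback to receive every message published to `topic`."""
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        """Hand `message` to all current subscribers of `topic`.

        The publisher only names the topic; it never sees who is subscribed.
        Returns the number of subscribers the message was handed to.
        """
        for callback in self._subscribers[topic]:
            callback(message)
        return len(self._subscribers[topic])


broker = ToyBroker()
dashboard, archive = [], []

broker.subscribe("sensor-readings", dashboard.append)
broker.publish("sensor-readings", {"sensor": "s1", "speed": 61})

# A new subscriber joins later; the publishing code does not change.
broker.subscribe("sensor-readings", archive.append)
broker.publish("sensor-readings", {"sensor": "s2", "speed": 58})

print(len(dashboard), len(archive))
```

In this sketch the first subscriber sees both messages and the late joiner sees only the second, and in neither case did the publishing call have to change.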
The same feature of discoverability also handles changes. If a new subscriber joins, the publisher doesn't need to know about it to deliver messages to all the subscribers. If the publisher changes and a new publisher begins publishing messages to the topic, the subscribers don't need to change. This greatly simplifies communications. Pub/Sub also handles sender and receiver availability. Pub/Sub itself is highly available. If there are three publishers publishing into a topic and one goes away, it's not a problem: the topic persists, and when the publisher returns, communication resumes. Likewise, if a subscriber is delayed by being overwhelmed with other work, or if the subscriber goes out of service, it's not a problem, because messages are durable. A message in Pub/Sub persists for seven days. It will be delivered when the subscriber returns. Pub/Sub enables the development of systems that are available and durable even if their components are not. Pub/Sub is low latency, which means you can build high-performance and streaming systems using it. Pub/Sub is a messaging infrastructure. You don't use it to store data permanently, and you don't use it to process or analyze data; there are other products for those purposes. Pub/Sub is very good at capturing data and at distributing data. Pub/Sub is a unified global service. It works the same everywhere. There are not different Pub/Subs per project, per domain, or per user; there's just one service. There's no cluster of machines for you to manage; it's serverless. A common question has to do with the relationship between Pub/Sub and the network. Doesn't the network already provide a method for applications to communicate? Well, that's true, but the qualities of basic network communication are different from the qualities that Pub/Sub adds on top of the network.
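The durability described above, where a message waits for an absent subscriber for up to seven days, can be sketched with a small in-memory model. This is an illustration only, not how the service is implemented: the `ToySubscription` class and its `deliver` and `pull` methods are hypothetical names, and the only figure taken from the text is the seven-day retention period.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(days=7)  # retention period stated in the text


class ToySubscription:
    """Holds messages for a subscriber that may be offline (illustration only)."""

    def __init__(self):
        self._backlog = []  # list of (publish_time, message)

    def deliver(self, message, publish_time):
        """Store a message; the subscriber does not need to be online."""
        self._backlog.append((publish_time, message))

    def pull(self, now):
        """Return and clear all messages still within the retention period."""
        kept = [msg for ts, msg in self._backlog if now - ts <= RETENTION]
        self._backlog.clear()
        return kept


start = datetime(2024, 1, 1)
sub = ToySubscription()
sub.deliver("reading-1", start)
sub.deliver("reading-2", start + timedelta(days=5))

# The subscriber comes back 8 days after the first message was published.
received = sub.pull(now=start + timedelta(days=8))
print(received)  # reading-1 has expired; reading-2 is still delivered
```

The point of the sketch is that the publisher and the subscriber never had to be online at the same moment, and that the waiting period is bounded.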
A good analogy for the difference between the network and Pub/Sub is the difference between telephone communications and email communications. The telephone is synchronous. It requires the recipient of the call to be present at the moment the communication is made; if not, there's no phone call. Both the sender and the receiver have to be online and available at the same time or the message isn't transmitted. Email, on the other hand, is a store-and-forward system like Pub/Sub: senders and receivers don't have to be online at the same time. Pub/Sub safely holds the data until it can be delivered. Pub/Sub helps smooth out traffic spikes or bursty communications. It delivers messages quickly and it autoscales to deal with variable volumes of data. Coordinating parts of a system using Pub/Sub simplifies the distribution of events. If the parts of a system must communicate directly with one another, there's a lot of complexity involved, because there would be many point-to-point links. However, with Pub/Sub, each part only has to communicate with a single highly available service, and that simplifies the interactions. Pub/Sub provides asynchronous communications: the publisher never waits, it just keeps publishing to Pub/Sub. If you have two subscribers and one is fast and one is slow, there's no problem with Pub/Sub. The fast subscriber will consume messages as soon as they're posted; the slow subscriber gets the same messages but consumes them at its own pace. Even if the subscriber goes offline or becomes unavailable in some other way, the message will still be delivered for up to seven days. Pub/Sub can help you avoid overprovisioning resources in your application. Consider, for example, a situation where about 100 machines are normally needed, but every day, for about an hour, traffic spikes, and handling the spike requires 1,000 machines.
Using Pub/Sub to store incoming requests and deliver them when resources are available, you can smooth out the traffic spike and reduce the resources required by your application. Now you understand the qualities of Pub/Sub. Next up is how Pub/Sub works. Pub/Sub works through topics and subscriptions. It's an asynchronous communications pattern. There can be multiple topics in Pub/Sub, and one or more publishers send messages to a topic. If a subscriber is interested in those messages, it can create a subscription to the topic. The subscription is owned by the subscriber. If a specific subscriber, such as subscriber three, goes away, the subscription is held for seven days to wait for subscriber three to return. It's possible to have multiple subscribers share the work of processing the messages in a single subscription. For example, subscription one has subscribers one and two. Every message in the topic appears in the subscription, and together the subscribers process the messages in that subscription. This is the method by which fast subscribers and slow subscribers can be mixed. For example, subscriber one might be fast relative to subscriber two, and neither of them has to wait on the other. The decoupling of publishers and subscribers ensures that neither side needs to worry about the availability of the other. Messages are guaranteed to be delivered at least once for every subscription. However, there are a couple of behaviors you need to know about. It's not likely, but it is possible that a subscriber will get a duplicate delivery. A subscriber acknowledges each message it receives from a subscription. If the subscriber takes too long to send the acknowledgment, the deadline passes and the message is resent. So, if the acknowledgment was somehow dropped or delayed, the subscriber might get two copies of the message.
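The acknowledgment deadline and the duplicate delivery it can cause are easier to see in a small simulation. The sketch below is a toy model, not the Pub/Sub API: `ToyAckSubscription`, `handle`, the integer clock, and the 10-unit deadline are all assumptions made for illustration. It shows a message being redelivered after a late acknowledgment, and one common way for an application to cope, which is to remember the message IDs it has already processed.

```python
class ToyAckSubscription:
    """Redelivers any message not acknowledged before its deadline (illustration only)."""

    def __init__(self, ack_deadline):
        self.ack_deadline = ack_deadline
        # message_id -> (payload, time of last delivery, or None if never delivered)
        self._pending = {}

    def publish(self, message_id, payload):
        self._pending[message_id] = (payload, None)

    def pull(self, now):
        """Return messages never delivered, or whose ack deadline has passed."""
        due = []
        for message_id, (payload, sent_at) in list(self._pending.items()):
            if sent_at is None or now - sent_at >= self.ack_deadline:
                self._pending[message_id] = (payload, now)
                due.append((message_id, payload))
        return due

    def ack(self, message_id):
        """Acknowledge a message so it is not delivered again."""
        self._pending.pop(message_id, None)


processed_ids = set()
results = []


def handle(message_id, payload):
    """Process each message ID at most once, even if it is delivered twice."""
    if message_id in processed_ids:
        return False  # duplicate delivery: skip the work
    processed_ids.add(message_id)
    results.append(payload)
    return True


sub = ToyAckSubscription(ack_deadline=10)
sub.publish("m1", "speed=61")

first = sub.pull(now=0)    # delivered, but the acknowledgment is delayed
second = sub.pull(now=15)  # deadline has passed, so m1 is delivered again

for message_id, payload in first + second:
    handle(message_id, payload)
    sub.ack(message_id)

print(len(first + second), results)
```

The message is delivered twice but processed once. Keeping a set of seen IDs in memory is only one option, and it is a design choice of this sketch rather than something the service does for you.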
A subscriber can extend the acknowledgment deadline per message to reduce the chance of a duplicate delivery. Order is also not guaranteed. For example, if you have messages of different sizes, you might get smaller messages before larger messages rather than in the order in which they were published. You need to consider how your application will deal with out-of-order messages and possible duplication. Fortunately, Pub/Sub and Dataflow work together and complement each other. Pub/Sub delivers messages at least once but possibly more than once. Dataflow has features that provide exactly-once processing of a message, so it can handle duplication. Pub/Sub can deliver messages late and out of order. Dataflow implements a windowing system that can handle late-arriving data and out-of-order messages. Another way to think about this is that Pub/Sub is asynchronous and stateless, and Dataflow can make the solution stateful. Pub/Sub is a global service. There is a single shared namespace for topics around the world. When a publisher publishes a message, the message is stored in the region closest to the publisher. For reliability and high availability, the message is stored in multiple zones. A subscription collects messages from different regions. For example, publishers to topic A publish messages, and each message is stored in the region that's closest to its publisher. The subscription to topic A collects topic A messages from all regions and provides them to its subscribers. Pub/Sub is fast. Latency is in the range of hundreds of milliseconds. It's also scalable, providing consistent performance from one kilobyte per second to 100 gigabytes per second. Pub/Sub supports multiple publishers publishing to a single topic, called fan-in, and it also supports multiple subscribers listening to a single topic, called fan-out. Pub/Sub also offers both push and pull delivery to subscribers.
With pull, a subscriber keeps checking whether a new message is available; with push, it registers to be notified when there is a new message. There are also two kinds of client libraries available. For languages such as Java, Python, C#, Ruby, and Node.js, there are hand-built libraries. However, Pub/Sub also supports gRPC, an open-source remote procedure call (RPC) framework, and libraries are automatically generated for gRPC languages. Sometimes people look at this diagram and are confused about why Pub/Sub doesn't appear. Just to review, Pub/Sub is not a database or a data storage service; it's a communication method. It's used for data ingest and for streaming data to a storage destination such as a data warehouse, Cloud Storage, or BigQuery. Given the low-latency characteristics of Pub/Sub, you might consider using it with a fast database for mobile applications. While this is one option, Firebase may be a better option if your application involves real-time person-to-person communications, such as gaming, chat, and streaming of activities.
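Going back to the pull and push delivery styles mentioned earlier, the difference can be sketched with a toy in-memory model. None of these names come from the Pub/Sub client libraries: `ToyTopic`, `ToyPullSubscription`, and `ToyPushSubscription` are hypothetical, and the push "endpoint" here is a plain Python callback standing in for whatever the service would notify.

```python
from collections import deque


class ToyPullSubscription:
    """Messages wait in a backlog until the subscriber asks for them."""

    def __init__(self):
        self._backlog = deque()

    def enqueue(self, message):
        self._backlog.append(message)

    def pull(self, max_messages=10):
        """Return up to `max_messages` waiting messages (possibly none)."""
        batch = []
        while self._backlog and len(batch) < max_messages:
            batch.append(self._backlog.popleft())
        return batch


class ToyPushSubscription:
    """Each message is sent to a registered endpoint as soon as it arrives."""

    def __init__(self, endpoint):
        self._endpoint = endpoint  # hypothetical callback standing in for an endpoint

    def enqueue(self, message):
        self._endpoint(message)


class ToyTopic:
    """Copies every published message to each attached subscription."""

    def __init__(self):
        self._subscriptions = []

    def attach(self, subscription):
        self._subscriptions.append(subscription)

    def publish(self, message):
        for subscription in self._subscriptions:
            subscription.enqueue(message)


pushed = []
topic = ToyTopic()
pull_sub = ToyPullSubscription()
topic.attach(pull_sub)
topic.attach(ToyPushSubscription(endpoint=pushed.append))

topic.publish("reading-1")
topic.publish("reading-2")

first_batch = pull_sub.pull()   # the pull subscriber gets messages only when it asks
second_batch = pull_sub.pull()  # nothing is left to pull

print(pushed)        # the push subscriber already received both messages
print(first_batch)
print(second_batch)
```

Because the toy topic copies each message to both subscriptions, the same sketch also illustrates several subscriptions receiving the messages from one topic.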