So we've been talking about the ability for participants to publish their transactions and get them into the blockchain as if this happens by magic. Of course it doesn't happen by magic in the real world; it happens through the Bitcoin network. So what is the Bitcoin network? It's a peer-to-peer network, so it inherits a lot of ideas from peer-to-peer networks that have been proposed for all sorts of other purposes. It's a peer-to-peer network where all nodes are equal. There's no hierarchy, there are no centralized or special nodes, no master nodes. Every node on Bitcoin is an equal peer. It runs over TCP and it has a random topology, so random nodes are peered with random other nodes. And new nodes can join at any time. So you can download the Bitcoin client today, you can spin your computer up as a node, and you will be a participating Bitcoin node with the same rights and capabilities as every other node on the Bitcoin network. Now the network is very dynamic. It changes over time. Nodes are coming and going all the time. There's actually no explicit way to leave the network. Instead, if other nodes don't hear from you in a while (three hours is the amount that's hardcoded into the common clients), they eventually start to forget you. So the network gracefully handles nodes going offline. So what does it mean that you can simply join the Bitcoin peer-to-peer network at any time? Well, here's a picture of what the network might look like at one moment in time, obviously scaled down quite a bit to just seven nodes, all with random connections to each other. And notice that the numbers are scattered around here because there's no geographic topology. Nodes connect to other nodes in a random fashion by design. Now if you launch a new node and say you wanna join the network, you start with a simple message to one node that you know about.
So all you need to know is how to get to one node that's already on the network. This is usually called your seed node, and there are a few different ways you can look up lists of seed nodes to try connecting to. But you find your seed node, and you send a special message saying, tell me all the peers that you have. Tell me the addresses of all the other nodes in the network that you know about. And that node will respond and say, well, I'm peered with nodes one and seven, you can try them. And then you might go and talk to one and seven and say, hey, tell me everybody on the network that you know about. And they'll send you the nodes that they know about, and you can iterate as many times as you want until you have a list of peers to make connections with. And then you can choose which ones to peer with and you'll be a fully functioning member of the Bitcoin network. And again, there are a few steps of randomness here, so depending on which seed nodes you used or which of the peers of the seed node you decided to go and talk to, you'll end up with a random set of nodes that you're connected to. But that's perfectly fine. So now that you're a member of the network, what is the network good for? Well, the network maintains the blockchain, so if you want to publish a transaction, you want to get the entire network to hear about it. And there's a simple flooding algorithm to make this happen. So let's say that node four here hears about a new transaction. Alice wants to pay Bob some money, so Alice creates a Bitcoin transaction and submits it to node four. Or maybe her wallet software or her exchange does that on her behalf, but somehow this transaction gets to node four. Now node four says, great, I've got a new transaction, Alice wants to pay Bob. Let's tell everybody about it. Sometimes this is called a gossip protocol because it's very simple. If you have news you try to tell as many people as you can, and they try to tell as many people as they can.
Much like people gossiping in the real world. Great, so node four is gonna talk to its neighbors, node three and node two, and say, hey, check out this new transaction. Alice wants to pay Bob. And those nodes will add it to their own pool of pending transactions, so each node maintains a list of all the transactions it has heard about that haven't been put into the blockchain yet. And then they can decide to forward that on to other nodes. So three is gonna talk to its neighbors, and say, new transaction for you. Alice wants to pay Bob. That'll end up in their transaction pools, and so on. And we wanna make sure that this process doesn't go on forever. So let's say that node two comes along later and tries to tell node seven, hey, new transaction, Alice wants to pay Bob. Node seven is gonna say, that's all right, node two, I've already heard about that, I've already got it in my memory. I don't need to forward it further. So eventually, this thing has to stop because every node will have heard about the new transaction and they won't forward it anymore. And remember, every transaction is identified uniquely by its hash, so each node can tell that it has seen that hash before and that it doesn't need to keep forwarding that transaction, so it won't loop around the network forever. So how do nodes decide, when they hear about a new transaction, whether or not they should propagate it? The most important thing they do is check, given their view of the blockchain, whether or not this transaction is valid. So they do all the transaction validation we talked about earlier. They run the script, they see that the script checks out. They see that the coins that are being redeemed here haven't already been spent. And if all of that checks out, then this looks like a valid transaction that they should try to relay, with a couple of other caveats. By default, nodes won't relay the transaction if it's a nonstandard script.
If the script has any weird features, if it doesn't match a fairly simple whitelist of scripts that nodes know about, then even though it's a valid transaction, the nodes won't relay it. They'll also make sure that they haven't seen the transaction before. That's the condition that avoids infinite loops. And there's another property, which is that they won't relay the transaction if it looks like a double spend. So if they've seen a transaction where Alice tries to send some specific coins to Bob, and then later they see a second transaction where Alice tries to send those same coins to Charlie, the nodes shouldn't relay the second transaction. Even though either transaction could be valid, because those coins still haven't been spent, they'll only relay the first one they hear. And that's an extra guard against double spending. But it's important to keep in mind that all of these checks are just sanity checks. Well-behaved nodes all implement these to try to keep the network healthy and running properly, but there's no rule that says that nodes have to follow these specific steps. Since it's a peer-to-peer network and anybody can join, there is always the possibility of a node not following this exact protocol: forwarding double spends, forwarding transactions that aren't standard, forwarding transactions that aren't valid. And that's why it's important that every node do the checking for itself. So it's possible that nodes will end up with a different view of the pending transaction pool based on what they've seen. So let's go back to this example where node four originally relayed a transaction where Alice was trying to pay Bob, and let's say that this transaction hasn't yet flooded to the entire network. And before it gets to everybody, node one is going to announce a new transaction and say, hey, I just heard Alice is trying to pay Charlie.
Now from node one's perspective this is a valid transaction, because node one hasn't seen the other transaction where Alice is trying to pay Bob. So node one is gonna implement the protocol normally and tell all of its neighbors about it. Now the neighbors that haven't heard the conflicting transaction yet will add it to their transaction pools. Whereas other neighbors, like node six in this example, have already received the transaction where Alice is trying to pay Bob. So node six is gonna say, I don't wanna hold two conflicting transactions in my pool, I'll just keep the one I already have. The network may end up in a divided state here, where different nodes have a different view of what the pending transaction pool is, but that's fine. These transactions haven't been published in the blockchain yet, so this is just a temporary state where nodes disagree on which transaction should be put into the next block. In practice, this is a race condition. If nodes have a different perspective on which transactions are pending, or which blocks have been accepted, that's okay as a temporary state. And eventually, they'll sort it out. So in the case of transactions, if different nodes have a different view of the pending transaction pool, then whoever mines the next block essentially breaks the tie and decides which of those two pending transactions ends up being put permanently into a block. And once one of those two transactions has been put into a block, other nodes will see that the transaction that they're holding onto in their pool is now never going to make it into a block, because it would be a double spend, and they'll just drop it. So if the transaction where Alice tried to pay Bob successfully makes it into a block first, the nodes who heard the transaction where Alice tried to pay Charlie will just say, that's not a valid transaction anymore, so I can forget it.
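To make the first-seen behavior concrete, here's a minimal sketch of a pending transaction pool. It is a toy model, not how any real client is written: the names `Tx` and `PendingPool` are hypothetical, transaction IDs and coins are plain strings, and script validation and the standardness check are left out entirely. It only shows the three rules just described: ignore what you've already seen, keep the first of two conflicting transactions, and drop whatever a new block has turned into a double spend.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tx:
    txid: str            # stands in for the transaction hash
    spends: frozenset    # identifiers of the coins this transaction redeems


class PendingPool:
    """Toy pending-transaction pool using the default first-seen rule."""

    def __init__(self):
        self.pending = {}    # txid -> Tx, not yet in any block
        self.claimed = {}    # coin -> txid of the pending tx that spends it
        self.spent = set()   # coins already spent by transactions in blocks

    def accept(self, tx):
        """Return True if tx was added to the pool (and so is worth relaying)."""
        if tx.txid in self.pending:
            return False     # already seen: relaying again would loop forever
        if any(coin in self.spent for coin in tx.spends):
            return False     # the coins were already spent in a block
        if any(coin in self.claimed for coin in tx.spends):
            return False     # conflicts with a pending tx we heard first
        self.pending[tx.txid] = tx
        for coin in tx.spends:
            self.claimed[coin] = tx.txid
        return True

    def on_block(self, block_txs):
        """Drop pending txs that were confirmed or are now double spends."""
        newly_spent = set()
        for tx in block_txs:
            newly_spent |= tx.spends
        self.spent |= newly_spent
        stale = [txid for txid, tx in self.pending.items()
                 if tx.spends & newly_spent]
        for txid in stale:
            for coin in self.pending.pop(txid).spends:
                self.claimed.pop(coin, None)


pay_bob = Tx("tx-bob", frozenset({"alice-coin"}))
pay_charlie = Tx("tx-charlie", frozenset({"alice-coin"}))

node_six, node_one = PendingPool(), PendingPool()
node_six.accept(pay_bob)        # node six hears Alice -> Bob first
node_one.accept(pay_charlie)    # node one hears Alice -> Charlie first
node_six.accept(pay_charlie)    # False: node six keeps the one it already has
node_one.on_block([pay_bob])    # Alice -> Bob is mined; node one forgets Charlie's
```

After the last call, both pools agree again: the payment to Charlie is gone from node one's pool, and any later attempt to re-submit it is refused because the coin is recorded as spent.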
So the default behavior is for nodes to just hang onto whatever they hear first, which means that network position matters. If two conflicting transactions, or two conflicting blocks, get announced at two different positions in the network, they'll both flood outward from where they started. And which nodes end up with one transaction or the other will depend on which side of the network they started out closer to. Of course this assumes that every node implements this logic where it keeps whatever it hears first, but there's no central authority enforcing this. So every node is free to use whatever logic it wants. If for some reason a node wants to, it can choose to implement any other logic for choosing which blocks or which transactions to forward. We'll talk about that more in our lecture on mining, and why miners might want to implement some logic other than the default. Now I've been talking mostly about transactions here. The logic for announcing new blocks, whenever miners find a new block, is almost exactly the same as for propagating a new transaction. So the same algorithm is used to announce new blocks around the network. It's the same flooding algorithm and the same gossip process. And in this case, instead of verifying that the transaction is valid by running a script, the nodes are going to verify that the new block is valid by computing the hash and making sure that it starts with a sufficient number of zeroes to meet the difficulty target. Now validating a block is also much more in-depth, because in addition to validating the header and seeing that the hash value is correct, nodes are asked to validate every transaction included in the block to make sure that the block contains only valid new transactions. And the other check, which is the really critical part that makes Bitcoin consensus what it is, is that nodes shouldn't forward a block unless it builds on their perspective of the current longest chain.
So they have a view of the blockchain, and they should only forward new blocks if they come at the very end of the chain, not at some earlier point. And this avoids forks building up. But just like with transactions, nodes can implement different logic if they want. They're free to relay blocks that aren't valid, or to relay blocks that build off of an earlier point in the blockchain. So some nodes may be trying to relay a block that doesn't extend the current longest chain, that actually builds a fork. And that's okay, the protocol is designed to withstand that. So how long does this flooding algorithm actually take? How much latency is imposed here? This is a graph showing the time it takes for new blocks to propagate to every node in the network, and the three lines show the 25th, the 50th, and the 75th percentile of how long it takes for a new block to reach every node in the network. And if you look at the 75th percentile there for some of the larger blocks, and this is heavily dependent on size because of bandwidth constraints that some nodes have, you'll see that the propagation time is over 30 seconds. So this shows that this isn't a particularly efficient protocol. On the Internet, 30 seconds is a pretty long time for people to hear about something. The reason it takes so long is that the protocol wasn't designed to be efficient. It was designed to be simple and to have no structure, so that every node is equal and nodes can come and go at any time. And as a result the topology may not be optimized for fast communication. A block may need to go through many nodes before it reaches the most distant nodes in the network. Whereas if you designed a network top down for efficiency, you would make sure that the path between any two nodes was very short.
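To get a feel for why an unstructured topology costs hops, here's a rough sketch of the flooding process over a random peer graph. Everything in it is an illustrative assumption rather than Bitcoin's actual networking code: the helper names `random_topology` and `flood` are made up, every hop is treated as taking the same amount of time, and each node simply picks a fixed number of random peers. A random graph built this way isn't guaranteed to be connected, so nodes the message never reaches are just missing from the result.

```python
import random
from collections import deque


def random_topology(n_nodes, peers_per_node, seed=0):
    """Build an undirected peer graph where every node picks random peers."""
    rng = random.Random(seed)
    peers = {node: set() for node in range(n_nodes)}
    for node in range(n_nodes):
        others = [other for other in range(n_nodes) if other != node]
        for other in rng.sample(others, peers_per_node):
            peers[node].add(other)   # connections work in both directions
            peers[other].add(node)
    return peers


def flood(peers, origin):
    """Flood one message from origin; return the hop count to each node reached."""
    hops = {origin: 0}               # doubles as the "already heard it" set
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        for peer in peers[node]:
            if peer not in hops:     # a peer that already heard it ignores it
                hops[peer] = hops[node] + 1
                queue.append(peer)
    return hops


network = random_topology(n_nodes=7, peers_per_node=2)
hops = flood(network, origin=4)
print("nodes reached:", len(hops), "of", len(network))
print("farthest node is", max(hops.values()), "hops from node 4")
```

The `hops` dictionary plays the role of each node remembering which hashes it has already seen, which is what guarantees the flood stops. Raising `n_nodes` while keeping `peers_per_node` small shows the farthest node drifting further away, which is the effect behind the long propagation times.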
For Bitcoin, it's more important to have a decentralized structure where all nodes are equal, even if that means that the propagation time can be over 30 seconds in some cases. So how big is the Bitcoin network? Well, there are no official statistics anywhere, because again there is no central authority overseeing it. It's simply whatever nodes are participating. They are the Bitcoin network. So it's impossible to measure exactly and it's changing all the time. But a number of researchers have looked into this and tried to come up with estimates. On the high end, some researchers have said that over a million IP addresses in a given month will at some point be running the Bitcoin protocol and acting at least temporarily as a Bitcoin node. But if you look at full nodes that are actually permanently connected, are fully validating every transaction they hear, and are running the full protocol, it's only about 5,000 to 10,000, which may be a surprisingly low number. And in fact, that number may be dropping. There's no evidence that the number of fully validating nodes is going up, and there's some concern that it's actually going down. So to be a fully validating node, you wanna stay permanently connected so that you hear about all the data. The longer you're offline, the more catching up you're gonna have to do to hear about all the transactions you missed. And you're gonna have to store the entire blockchain. You'll also need a pretty active network connection so that you can hear every new transaction and forward it to your peers. So you can see the growth over time here, and currently it takes about 20 GB to store the entire blockchain. Which isn't too bad: if you have a PC that's a few years old with an active network connection, you have what it takes to be a fully validating node. Although you basically need to dedicate that machine to doing that, and not much else. Fully validating nodes maintain the entire set of unspent transaction outputs.
So that's every coin that's available to be spent, and remember that those are just unspent transaction outputs. Ideally, you'd like to store this in RAM, so that when you hear a new proposed transaction on the network, you can quickly look up the transaction output that it's attempting to claim, run the script, and see if the signature is valid. So currently there are about 12 million unspent transaction outputs. And that's out of 44 million transactions that have ever been proposed. So fortunately, that's still small enough to fit in less than a gigabyte of RAM in an efficient data structure. So if you're running a fully validating node, every time you hear about a new transaction, you can quickly check it, run the redemption script, and see that this is a valid transaction that you should put in your pending transaction pool. So in contrast to fully validating nodes, there are lightweight nodes, also called thin clients or Simplified Payment Verification (SPV) clients. This is actually the vast majority of nodes on the Bitcoin network, and the difference here is that these nodes aren't attempting to store the entire blockchain. They only store the pieces that they need to verify some specific transactions that they care about. So for example, if you run a wallet, your wallet might want to be an SPV node, and if somebody sends money to you, you'll act as a node. You'll download the bits of the blockchain that you need to verify that the person sending you the money actually owned it and that the transaction sending it to you actually gets included in the blockchain, but you won't care about the thousands of other transactions going on that don't affect you. Now an SPV client like this won't have the full security level of a fully validating node. And the reason is that when they hear about a new block, the only thing they can check is the block header.
They can check to see that the block was difficult to mine, but they can't check to see that every transaction included in that block is actually valid, because they don't have the entire previous blockchain. They don't know the entire unspent transaction output set. They can only validate the transactions that actually affect them, so they're essentially trusting the fully validating nodes to have validated all the other transactions that are out there. So this isn't a bad security tradeoff. You're assuming there are fully validating nodes out there that are doing the hard work, and that if miners went through the trouble to mine this block, which is a really expensive process, they probably also did some validation to make sure that this block wouldn't be rejected. And the cost savings of being an SPV node are huge. It's about 1000 times smaller to just store block headers than to store all of the previous transactions, so instead of storing about 20 GB of data, you're down to about 20 MB. That's something almost anybody can store on a PC or even a phone, and so act as a limited node in the Bitcoin network.
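As a rough illustration of what "checking only the block header" amounts to, here's a sketch of the proof-of-work test an SPV client could run. It relies on details the lecture doesn't spell out, so treat them as assumptions brought in from the Bitcoin block format: a serialized header is 80 bytes, the compact difficulty field (`bits`) sits at byte offset 72, the hash is double SHA-256, and "starts with enough zeroes" is implemented as the hash, read as a little-endian integer, being no larger than the target. The function names are made up for this example, and the 250,000-block figure is a round number chosen only because it matches the roughly 20 MB mentioned above.

```python
import hashlib
import struct

HEADER_SIZE = 80  # bytes in a serialized Bitcoin block header


def bits_to_target(bits):
    """Expand the compact 'bits' field into the full 256-bit target."""
    exponent = bits >> 24
    mantissa = bits & 0x007FFFFF
    if exponent <= 3:
        return mantissa >> (8 * (3 - exponent))
    return mantissa << (8 * (exponent - 3))


def header_meets_target(header):
    """Check the proof of work on a header, without seeing any transactions."""
    if len(header) != HEADER_SIZE:
        raise ValueError("expected an 80-byte block header")
    (bits,) = struct.unpack_from("<I", header, 72)   # compact difficulty field
    digest = hashlib.sha256(hashlib.sha256(header).digest()).digest()
    return int.from_bytes(digest, "little") <= bits_to_target(bits)


def header_chain_bytes(n_blocks):
    """Storage needed to keep only the headers of n_blocks blocks."""
    return n_blocks * HEADER_SIZE


# 250,000 headers at 80 bytes each is 20,000,000 bytes, i.e. about 20 MB.
print(header_chain_bytes(250_000))
```

Note what the check does not cover: nothing here looks at the transactions inside the block, which is exactly the part an SPV client leaves to the fully validating nodes.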