Bitcoin’s core technology is a very clever, and yet surprisingly simple, idea. It’s got distinct parallels with the git version control system; and the concept would work for a great deal more situations than just a virtual currency.
There are plenty of ill-informed articles by mainstream (and some tech) writers who haven’t really understood how Bitcoin works. This series of articles then is my attempt to add to the pile of not-ill-informed articles; hopefully it will turn up in Google searches to counter the erroneous information.
My goal here is to explain to a non-technical, but interested reader, how Bitcoin secures their money. Note: this is an in-depth explanation over a series of articles; it is not, by any means, required reading to be able to use bitcoins (for that, just download the client, or use an online wallet). My hope is that it serves as a demonstration of where the trustworthiness of Bitcoin comes, and why Bitcoin technology (even if the world ends up using some implementation other than Bitcoin) excites so many people.
There are two technologies that are important to appreciate, if not understand, to understand how bitcoin works:
I’ll cover signing in a future article; but hashing is so fundamental to Bitcoin that I’ll just say a few words about it here before we begin.
A cryptographic hash is a mathematical one-way function (what it is exactly doesn’t matter to us here) performed on an arbitrary number of input bytes to produce a fixed number of bits as output. That output is, for all intents, a random number, but – and this is the important part – is always the same output for the same input data, changes significantly for any single bit change in the input, and cannot be reversed to reproduce the input. For example:
$ echo -n "ABCDEFGHIJKLMNOPQRSTUVWXYZ" | sha256sum d6ec6898de87ddac6e5b3611708a7aa1c2d298293349cc1a6c299a1db7149d38
The output here is a 256-bit number, which is 32 bytes, shown in hexadecimal. Let’s change just one bit of this input (‘
$ echo -n "@BCDEFGHIJKLMNOPQRSTUVWXYZ" | sha256sum dd67b1e9b6eedcf3f05e2272c7846cf7a8af7431abdaa1eca0afb2eff85f5197
The output is completely different. Remember we only changed a bit, and yet this number is different by a lot more than one bit. Now let’s add a single extra byte to the end of our input (an “A”):
$ echo -n "ABCDEFGHIJKLMNOPQRSTUVWXYZA" | sha256sum 6339a9e6741e7eb2bb6f9dac761706cfbcf9c3014a9e598823f90962fb4751b6
Completely different again. What about a single byte prefix (a ‘Z’)?
$ echo -n "ZABCDEFGHIJKLMNOPQRSTUVWXYZ" | sha256sum ffb0ad1156d94b08ca5904d8f6ef5305cdb6790a63a79e75e120788f86588fa0
Completely different again.
A hash then turns an arbitrary amount of input data into a fixed-length output that (pseudo) uniquely and unpredictably (without performing the hash operation) represents that input. I used “pseudo” in the description because mathematically there are many inputs that result in the same output (there must be, every byte in the input is unique) – but that’s an entirely theoretical concern. In practice, because the hash can be so huge a number, you would search for longer than the lifetime of the universe trying to find another input that produces any given output. For all practical purposes, every input has its own unique hash, and there is no other input that produces that same hash (simply by virtue of the fact that it is not possible to find it, even if it does exist theoretically).
A quick note about centralised money transfer: it’s incredibly simple. Persons A, B, C and D all sign up for an account on your centralised server/system. B sends money to the centralised service by credit card, bank transfer, or cash to create a credit. With an entry in a database, that credit can be transferred to A, C, or D. Credit can be converted back to money and given to A, C, or D by bank transfer or cash. Most services are run such that the credits are denominated in units of fiat currency, probably the same as what you deposited. Don’t kid yourself that they aren’t credits though – they are, and could be denominated in any arbitrary unit they felt like and can be stolen, frozen or blocked at the whim of the centralised service operator.
Bitcoin is not like this. It is decentralised. Decentralising anything is hard; decentralising a trusted money service is extra hard.
A node in the network runs a Bitcoin client and connects to N other nodes, ideally randomly dispersed through the network. N is arbitrary, and the client can choose for itself how many connections to maintain/allow (more being better, but more costly of their local resources).
As a brand new Bitcoin node, I connect to some peer nodes (how I find them is not relevant to Bitcoin technology, IRC is used at present if you care). My first job is to download the so-called Bitcoin block chain. The block chain is (unsurprisingly) a chain of blocks. Each block holds information about itself, and a ‘payload’. Think of it as like an email; the email has a header and a body, the header tells you about the email (to, from, subject), the body is the email. A block has a header which describes the block, and a payload (which is actually a list of transactions, but we’ll come back to them later).
A chain is formed because each block names the block it considers “previous” or “parent” to itself, by including a cryptographic hash of the contents of that parent block in itself. Of course that previous block includes the hash of the block it considers its parent in itself, and so on and so on. That means that the hash of the last block in the chain is dependent on the hash of every previous block in the chain, with that single (long) number, it is therefore possible to validate the entire chain. As a mathematical function, we might describe the hash of, say BLOCK5, like this:
blockhash( BLOCK5 + blockhash( BLOCK4 + blockhash( BLOCK3 + blockhash( BLOCK2 + ... etc ... ) ) ) )
If two parties agree that the hash of that final block has not been faked, then they agree that every other block in the chain has not been faked. It is this property that makes it impossible to edit any block in the chain without it being detected – to do so would invalidate every subsequent block.
[The assumption is that a cryptographic hash cannot be predetermined; it is impossible, without a brute force search, to find a block with a chosen hash. That means you can’t make up an arbitrary new block (say one that gives you a load of cash) that would replace an existing block, already in the chain. The hashes (at present) are 256 bits long (SHA-256 is used), that’s enough resolution (nearly) to count the number of atoms in the universe. You can see why brute force isn’t considered a viable option for faking a block.]
Back to the plot: the block chain ensures that if two parties agree on any particular block hash in the chain, they agree on all the blocks previous to that hash. There is one special case: the first block in the chain. In Bitcoin this is called the genesis block. This block has no parent (technically this state is encoded by having its parent hash set to the one invalid hash, all zeroes). This genesis block is hard-coded into every Bitcoin client, and so is a known point. Each client can therefore backtrack from any block they are told is on the chain by a peer node, to the genesis block, which they explicitly trust; hence verifying that the claimed chain really is a chain.
A block chain, as described, is the work of a moment. I could make arbitrary block content (“pay me lots of cash”), list the genesis block as its parent, then that block would become the basis for my next block; all appearing as part of a valid chain, with a trusted root. The solution that Bitcoin uses is to ensure that creating a block chain is not the work of a moment.
We’ll come to how that is done, in part II, when we discuss proof of work.