What Is a Merkle Tree and How Does It Help Organize Data on the Bitcoin Blockchain?


Integral to the blockchain is an idea that dates back to 1979: the Merkle tree, named after computer scientist Ralph Merkle. Both Merkle trees and the Bitcoin blockchain are built on the concept of hashing.

What is hashing?

Essentially, hashing works by passing data through a cryptographic algorithm that outputs a fixed-length string of characters, called a “hash.” The hash looks random, but it is fully determined by the input. Bitcoin uses the SHA-256 hashing algorithm, which always produces a 256-bit (32-byte) output. The algorithm is deterministic: when you put identical data into it, you get the same hash output each time, while even a tiny change to the input produces a completely different hash. It is also a one-way function: you cannot feasibly reverse the process and recover the original data from the hash. Hashing is a key aspect of how Bitcoin works, and also of Merkle trees.
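To make the two properties above concrete, here is a minimal sketch using Python's standard hashlib module. The helper name sha256_hex and the sample inputs are made up for illustration; they are not part of Bitcoin.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


first = sha256_hex(b"hello")
second = sha256_hex(b"hello")    # identical input
changed = sha256_hex(b"hellp")   # one character changed

print(first == second)   # True: same input, same hash
print(first == changed)  # False: a small change gives a different hash
print(len(first))        # 64 hex characters, i.e. 32 bytes
```

Note that nothing in hashlib lets you go from the hash back to b"hello"; the only general approach is to guess inputs and hash each guess.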

The Tree Part and the Merkle Part

In computer science lingo, the term “tree” is often used to describe a data structure with different branches. Within Bitcoin, the tree is formed upside down, beginning with different transactions, which are like the “leaves” of the tree.

For example, imagine that we have 4 Bitcoin transactions: A, B, C, and D. The data about each one of these four transactions is passed through the SHA-256 algorithm two times. First the data is hashed; then, the hash of the data is hashed again, resulting in hA, hB, hC, and hD. Once all four transactions have been double-hashed, more hashing happens. hA and hB are joined end to end and double-hashed again. hC and hD are combined and hashed in the same way. Now, instead of four “leaves,” we have two hashes one level up the tree: Hash(hA + hB) and Hash(hC + hD). These two hashes are then combined and hashed together to form one hash that summarizes transactions A, B, C, and D. It does not contain the transaction data itself, but changing any one of the four transactions would change it. This final hash is called the “Merkle root.”
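The four-transaction example can be sketched in a few lines of Python. This is a simplified illustration, not Bitcoin's actual code: the function names are made up, the "transactions" are placeholder byte strings rather than real serialized transactions, and the byte-order conventions Bitcoin uses when displaying hashes are ignored. When a level has an odd number of hashes, the sketch duplicates the last one, which is how Bitcoin handles that case.

```python
import hashlib
from typing import List


def double_sha256(data: bytes) -> bytes:
    """Apply SHA-256 twice, as described in the text."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root(transactions: List[bytes]) -> bytes:
    """Pair up hashes level by level until a single hash remains."""
    if not transactions:
        raise ValueError("need at least one transaction")
    level = [double_sha256(tx) for tx in transactions]  # the leaves
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # odd count: duplicate the last hash
        level = [
            double_sha256(level[i] + level[i + 1])
            for i in range(0, len(level), 2)
        ]
    return level[0]


# Placeholder data standing in for transactions A, B, C, and D.
root = merkle_root([b"A", b"B", b"C", b"D"])
print(root.hex())  # a 64-character hex string
```

Here `level[i] + level[i + 1]` is plain byte concatenation, matching the "hA + hB" notation in the example.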

When Bitcoin transactions get added to the blockchain, the most recent transactions are grouped together into a “block,” which contains a bunch of transaction data (this would be the A, B, C, D, etc. from the example). By using a Merkle tree to condense this data into a single hash, a large set of transactions can be summarized securely without taking up much space. The transactions themselves are still stored in the block; the Merkle root, which is recorded in the block’s header, acts as a compact fingerprint of them. Within Bitcoin, a block often holds hundreds or even thousands of transactions, and the Merkle root allows the network to validate that data by using the summary contained in a 32-byte hash.
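As a rough illustration of that "32-byte summary" idea, the sketch below builds a root over 500 placeholder transactions and then alters one of them. The transaction contents are invented, and the two helper functions are a simplified stand-in for Bitcoin's real logic, included here only so the snippet runs on its own.

```python
import hashlib


def double_sha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root(transactions):
    level = [double_sha256(tx) for tx in transactions]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # odd count: duplicate the last hash
        level = [
            double_sha256(level[i] + level[i + 1])
            for i in range(0, len(level), 2)
        ]
    return level[0]


# 500 hypothetical transactions (placeholder bytes, not real Bitcoin data).
transactions = [f"tx-{n}".encode() for n in range(500)]
original_root = merkle_root(transactions)

# Alter a single transaction and recompute the root.
tampered = list(transactions)
tampered[123] = b"tx-123-altered"
tampered_root = merkle_root(tampered)

print(len(original_root))              # 32: the root is always 32 bytes
print(original_root == tampered_root)  # False: one change alters the root
```

However many transactions go in, the root stays the same size, and anyone who recomputes it from the block's transactions can tell whether the data matches.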

The Blockchain

Bitcoin’s blockchain contains the entire history of every single bitcoin transaction that has ever happened, with each block linked to the one before it through hashing, all the way back to the very first transaction ever. Bitcoin “full nodes” – machines running software such as Bitcoin Core – keep a local copy of the blockchain, which is constantly being updated as new blocks of transactions are added. Each new transaction is hashed into the Merkle tree structure of an individual block. To validate a new block, nodes will check, among other things, that the block’s header (the first section of data in a block) contains the hash of the previous block’s header. If the hashes match up and the block passes the other validity checks, then the new block will be added to the blockchain. Each block contains a link to the block before it, linking all the way back to the first bitcoin transaction, contained in the “genesis block.”
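The linking between blocks can be sketched with a toy model. Everything here is simplified and hypothetical: ToyHeader holds only a previous-block hash and a Merkle root, whereas a real Bitcoin header also carries a version, timestamp, difficulty target, and nonce, and the function names are invented for this example.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional


def double_sha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


@dataclass
class ToyHeader:
    prev_hash: bytes    # hash of the previous header
    merkle_root: bytes  # summary of this block's transactions

    def block_hash(self) -> bytes:
        return double_sha256(self.prev_hash + self.merkle_root)


def build_chain(roots: List[bytes]) -> List[ToyHeader]:
    """Link headers together; the first block points at 32 zero bytes."""
    chain: List[ToyHeader] = []
    prev = bytes(32)
    for root in roots:
        header = ToyHeader(prev, root)
        chain.append(header)
        prev = header.block_hash()
    return chain


def first_broken_link(chain: List[ToyHeader]) -> Optional[int]:
    """Return the index of the first block whose prev_hash does not match
    the hash of the block before it, or None if every link checks out."""
    for i in range(1, len(chain)):
        if chain[i].prev_hash != chain[i - 1].block_hash():
            return i
    return None


# Placeholder Merkle roots for five toy blocks.
roots = [double_sha256(f"block-{n}".encode()) for n in range(5)]
chain = build_chain(roots)
print(first_broken_link(chain))  # None: every link matches

# Change the contents of block 1 without touching later blocks.
chain[1].merkle_root = double_sha256(b"altered")
print(first_broken_link(chain))  # 2: block 2 no longer points at block 1's hash
```

Because each header's hash depends on the previous hash, changing one block changes its hash and leaves the next block pointing at a value that no longer exists.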

The use of cryptographic hashing is a core aspect of how Bitcoin works behind the scenes. Because no practical way is known to reverse SHA-256 or to forge data that matches a given hash, and hash functions are expected to hold up comparatively well even against emerging quantum computers, Bitcoin’s blockchain is extremely secure. Since all the blocks are tied together through hashing, a malicious actor who altered the data in one individual block would change that block’s hash, and would then have to rebuild every block that comes after it, each of which is composed of its very own Merkle tree of hashes within hashes. With current technology, that would be effectively impossible.

As of today, even though many bitcoins have been stolen from wallets, exchanges, and other applications, the blockchain itself has remained impenetrable.