Hacker News

So... honestly asking here, but why isn't the blockchain length a problem in the long run? Whenever I've read about this there's a bunch of handwaving and change of subject, but I admit I've not read that much.


Nothing about the Merkle tree structure that blockchains are built on prevents you from truncating the chain in certain circumstances. The truncated piece can still be verified against an un-truncated chain for auditing purposes, so a smaller section of the network can keep the full chain while a larger segment of the network keeps truncated chains of a smaller size.

You can configure the allowable size as well, so any particular node on the network can decide how much of the chain it prefers to keep around.
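To make that concrete, here's a minimal sketch of the verification idea (in Python with SHA-256; the helper names and the duplicate-last-leaf rule are my assumptions for illustration, not any particular client's implementation): a pruned node keeps one transaction plus a short Merkle proof, and can check it against the root a full node publishes.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Pairwise-hash leaves up to a single root, duplicating the
    last node on odd-sized levels (as Bitcoin does)."""
    level = [h(l) for l in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_proof(leaves, index):
    """Collect the sibling hashes needed to verify leaves[index]."""
    level = [h(l) for l in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))  # (hash, sibling-is-left?)
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof

def verify(leaf, proof, root):
    """Re-hash up the tree using only the leaf and its proof."""
    node = h(leaf)
    for sibling, is_left in proof:
        node = h(sibling + node) if is_left else h(node + sibling)
    return node == root

txs = [b"tx-%d" % i for i in range(5)]
root = merkle_root(txs)        # a full node needs only to publish this root
proof = merkle_proof(txs, 3)   # a pruned node keeps tx 3 plus its proof
assert verify(b"tx-3", proof, root)
```

The point is that the proof is logarithmic in the number of transactions, so nodes that threw away most of the chain can still audit individual pieces against nodes that kept it all.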


Ethereum's designed to allow very effective pruning of old transactions. Details here: https://blog.ethereum.org/2015/06/26/state-tree-pruning/

It's already implemented in one of the independent clients (written in Rust), and they plan to add it to the official client before the storage size gets too unwieldy.

In the long term they also have plans for sharding, so any given node doesn't even have to store the entire current state. That's more of a research problem but they seem to be making progress on it.


Well, two main reasons.

One is that a lot of the data is redundant; more on that later.

And two, because storage is becoming abundant. E.g. in 1995, 1 MB of daily storage would have been a big deal; it would have cost a few thousand dollars to store a year's worth of data. Today 365 MB of storage hardware costs about a penny. Today the blockchain adds at most ~50 GB of data annually, which costs about $1.50 worth of retail hard drive hardware per year, and in a few years that will drop dramatically too. So as long as the abundance of storage outpaces blockchain growth, storage isn't that big of a problem.
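As a back-of-envelope check of those figures (the ~$0.03/GB retail hard drive price is my assumption; the 50 GB/year is the upper bound mentioned above):

```python
price_per_gb_usd = 0.03   # assumed retail hard drive price per GB
annual_growth_gb = 50     # upper-bound annual blockchain growth from above
annual_cost = annual_growth_gb * price_per_gb_usd
print(f"${annual_cost:.2f} per year of chain data")  # → $1.50 per year of chain data
```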

The exact numbers are a bit hand-wavey, I agree, because a lot of this is new and depends on a lot of uncertainties. E.g. the growth of storage abundance is relatively well known (though far from certain), but it has to be compared against the rate of adoption and the rate of block size increases, both of which are relatively unknown, to know whether storage tech outpaces storage needs.

But data redundancy is the main argument, not abundance of storage.

E.g. if I send you $1 and you send me back $1 once a day for a trillion days, we could store all trillion transactions, which would be a massive file... or we could store only the last 10 years of transactions (just 3,650 of them) and have, for all intents and purposes, a 100% safe and accurate picture of our balances. Most of that data just isn't necessary to store.
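A toy sketch of that compaction (the function and variable names are made up for illustration): fold everything older than a cutoff into net balances and keep only the recent tail of transactions.

```python
from collections import defaultdict

def compact(transactions, keep_last_n):
    """Fold old transactions into net flows per party, keep the recent tail.
    transactions: list of (sender, receiver, amount), oldest first."""
    old, recent = transactions[:-keep_last_n], transactions[-keep_last_n:]
    balances = defaultdict(int)  # net flow: negative = sent more than received
    for sender, receiver, amount in old:
        balances[sender] -= amount
        balances[receiver] += amount
    return dict(balances), recent

# 1,001 back-and-forth $1 payments net out to a single dollar of difference:
txs = [("alice", "bob", 1), ("bob", "alice", 1)] * 500 + [("alice", "bob", 1)]
snapshot, tail = compact(txs, keep_last_n=10)
# snapshot is tiny ({"alice": -1, "bob": 1}); tail is just the last 10 txs
```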

At some point you can say: a year ago there was a certain set of balances on the blockchain. Those balances are correct, haven't been contested, and 'correcting' them would mean double-spending a transaction and then re-mining a year of blocks while keeping up with new ones, i.e. billions of dollars in energy and a virtual impossibility. So we can trust that 'snapshot' of the blockchain, throw away all the data before it, use the snapshot as a sort of opening balance, and build a new chain from there. For practical purposes, this means you could keep the blockchain limited to, say, a year's worth of data, or X transactions back in time, if you wanted to.
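Here's a toy illustration of that snapshot idea (pure Python, no proof-of-work, and the field names are made up): the hash of the agreed-upon snapshot acts as the 'genesis' of the new, shorter chain, so none of the old blocks are needed to validate new ones.

```python
import hashlib
import json

def block_hash(block):
    """Hash a block via a canonical (sorted-keys) JSON serialization."""
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

# Hypothetical checkpoint: balances everyone agrees on as of a year ago.
snapshot = {"balances": {"alice": 40, "bob": 60}, "height": 52_560}
checkpoint_hash = block_hash(snapshot)

# New blocks chain from the checkpoint instead of the original genesis;
# most nodes can discard everything before the snapshot.
chain = [{"prev": checkpoint_hash, "txs": [("alice", "bob", 5)]}]
chain.append({"prev": block_hash(chain[-1]), "txs": [("bob", "alice", 2)]})

def validate(chain, checkpoint_hash):
    """Walk the chain, checking each block links to its predecessor."""
    prev = checkpoint_hash
    for block in chain:
        if block["prev"] != prev:
            return False
        prev = block_hash(block)
    return True
```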

So what'd happen is that a few parties, e.g. large businesses and universities, would store all the data for posterity, research, etc., while most participants would only store a small chunk of it.

It's more complex than that, but that's the basic gist of the story.



