Despite cryptocurrency still not really showing any utility, these tech companies want so, so badly for it to catch on.
It feels like the entirety of cryptocurrency, outside of being a thing people used to buy drugs, has been an example of Chesterton's Fence, with half of Silicon Valley in denial of this fact.
And this, ladies and gentlemen, is why SaaS investors don't understand how to invest in deeptech/hardtech, despite current trends. Like this guy, they have no clue about the differences in business model, except they're not founders, so they don't go through the pain themselves and they mostly never learn.
Hats off to the author for making it through! What a start to the journey!
Except it's not, because it's constantly ambiguous in computing.
E.g. Macs measure file sizes in powers of 10 and call them KB, MB, GB. Windows measures file sizes in powers of 2 and calls them KB, MB, GB instead of KiB, MiB, GiB. Advertised hard drives come in powers of 10.
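To make the ambiguity concrete, here's a quick Python sketch using the usual "500 GB drive" example (the numbers are plain arithmetic, not taken from any particular product):

    advertised = 500 * 10**9          # a "500 GB" drive, as advertised

    decimal_gb = advertised / 10**9   # 500.0   -> what macOS reports as GB
    binary_gb = advertised / 2**30    # ~465.66 -> what Windows reports as "GB" (really GiB)

    print(f"decimal: {decimal_gb:.2f} GB, binary: {binary_gb:.2f} GiB")

Same bytes, two different numbers on screen, both labelled "GB".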
Yeah :c I feel the same way. They've made a variant with a more traditional poker-deck look but the same ranks/suits as the Everdeck, which I'm excited to try one day.
Right. The alternative is not to send materials from Earth for processing in space; that would be stupid. We send finished goods, which were manufactured on the ground. But you don't mine finished widgets from asteroids. You mine ore that needs refining and processing before it can be used to manufacture anything. That ore is orders of magnitude heavier than the finished products, never mind everything required to do anything useful with it.
RNNs have two huge issues:
- long context. Recurrence degrades the signal (vanishing gradients), for the same reason that 'deep' NN architectures don't go much past 3-4 layers before you need residual connections and the like
- (this is the big one) training performance is terrible, since you can't parallelize them across a sequence the way you can with causal masked attention in transformers (rough sketch below)
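To make the parallelism point concrete, here's a toy numpy sketch; the shapes, the tanh cell, and single-head attention straight over the raw inputs are simplifications for illustration, not anyone's actual architecture:

    import numpy as np

    T, d = 128, 64                    # sequence length, hidden size (made up)
    x = np.random.randn(T, d)
    W_h = np.random.randn(d, d) * 0.01
    W_x = np.random.randn(d, d) * 0.01

    # RNN: step t needs the hidden state from step t-1, so this loop is
    # inherently sequential -- T dependent matmuls, one after another.
    h = np.zeros(d)
    hidden_states = []
    for t in range(T):
        h = np.tanh(h @ W_h + x[t] @ W_x)
        hidden_states.append(h)

    # Causal masked attention: position t only reads *inputs* at positions
    # <= t, not a hidden state that had to be computed first, so the whole
    # sequence is one big parallel matmul.
    scores = x @ x.T / np.sqrt(d)
    scores = np.where(np.tril(np.ones((T, T), dtype=bool)), scores, -np.inf)
    attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)
    out = attn @ x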
On the huge benefit side though you get:
- guaranteed state size, so perfect batch packing, perfect memory use, easy load/unload from a batch, and O(1) per-token generation, which generally means massive performance gains in inference (sketch at the end)
- unlimited context (well, no need for position embeddings or any similar positional scheme)
Taking the best of both worlds is definitely where the future is: an architecture that trains in parallel, has a fixed state size so you can load/unload and pack batches perfectly, unlimited context (with perfect recall), etc. That is the real architecture to go for.
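For what the fixed-state upside looks like, a minimal sketch (numpy again; step, W_e, W_o and the shapes are made up for illustration, not any real model's API): per-sequence state is a constant-size vector, so each generated token costs the same amount of work no matter how long the sequence already is, and a sequence can be swapped into or out of a batch by replacing one row of state.

    import numpy as np

    d, vocab = 64, 1000
    W_h = np.random.randn(d, d) * 0.01
    W_e = np.random.randn(vocab, d) * 0.01   # token embeddings
    W_o = np.random.randn(d, vocab) * 0.01   # output head

    def step(state, token_id):
        # One generation step: constant work and memory, no growing KV cache.
        state = np.tanh(state @ W_h + W_e[token_id])
        return state, int(np.argmax(state @ W_o))

    # A batch is just a (batch, d) array of states; load/unload a sequence
    # by overwriting one row -- no variable-length cache to repack.
    batch_states = np.zeros((4, d))
    batch_states[2] = np.zeros(d)            # slot 2 now hosts a fresh sequence

    state, tok = np.zeros(d), 0
    for _ in range(16):
        state, tok = step(state, tok)        # O(1) per token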