Interesting, but Google has the world’s largest search index to build models from, plus billions of Android phones and Gmail accounts. An open source model may share the same algorithm, but its possible training set will be dwarfed by Google’s. It might even have the same number of connections. The article argues that a few billion is enough, but what about 5 years from now? And even with fewer connections, wouldn’t data quality still matter? Sure, you can run a model slowly on a Raspberry Pi, but can’t custom silicon do more?
There’s a linked “data doesn’t do what you think” document in that post, which might counter this argument, but the site is now down.