It's non-commercial only: "If you plan to use the Yi Series Models and Derivatives for Commercial Purposes, you should contact the Licensor in advance" and "Your use of the Yi Series Models must comply with the Laws and Regulations as well as applicable legal requirements of other countries/regions, and respect
social ethics and moral standards," which is all fine and good, but as defined in the terms, “Laws and Regulations” refers to the laws and administrative regulations of the mainland of the People's Republic of China.
I tend to refer to this type of license as NC/China and I don't even bother poking around with those. I'll wait for the next Mistral or other Apache 2.0 model to come out.
Note, for coding, DeepSeek Coder 33B is already stronger and released under a less restrictive license.
Worth noting they include synthetic data and training other models as derivatives:
> “Derivatives” refers to all modifications to Yi Series Models, work based on Yi
Series Models, or any other models created or initialized by transferring the
weights, parameters, activations, or output patterns of Yi Series Models to
other models to achieve similar performance, including but not limited to
methods that require using intermediate data representations or generating
synthetic data based on Yi Series Models to train other models.
This model does not seem fit at all for public use except maybe for some research. I suspect they have customers lined up or are currently prospecting/in talks, and this release is purely for marketing purposes.
I'm not sure how enforceable that would even be. Since the output of these models can't be copyrighted, you can't apply a license to them either.
The question of whether the weights can be copyrighted is separate, but I also honestly don't think so. I think all these model weight licenses are an emperor's new clothes scenario, where everyone is too afraid to call out the obvious.
> We predict that AI 2.0 will create a platform opportunity ten times larger than the mobile internet, rewriting all software and user interfaces.
I was a bit interested until I read this. If this claim is to be believed, then the company appears to be entirely out of touch with reality, and it makes claims like outperforming llama-70b seem even less credible.
Matt Levine once described this very well, although he did it when referring to Adam Neumann and WeWork. I’m paraphrasing, but there is a subset of people who are able, consistently and to great success, to exploit a bubble in the VC world, by dint of knowing exactly what to say and of being able to get the meeting in the first place.
Lee being able to credibly sell bullshit like the quoted claim is a feature, not a bug. It’s somewhat irrelevant if he actually believes it.
At first I was angry about these grifters because they suck opportunities out of the market, leaving legitimate startups behind. But knowing what I know now, that VCs are also grifters (looking at a16z and its pump and dump crypto win) I have started rooting for the "bad guys". So may Lee attract all VC money in the world to promptly set it on fire like Adam did.
What do you take offence at? Hip speech and yawn-2.0-s aside, I find the claim entirely defensible, given, of course, a certain set of assumptions in this very new and uncertain space.
I'm entirely ready to believe AI will soon write all apps and UI. I just don't see such rewrites being a 10x market opportunity, given there's already "an app for that" even when there shouldn't be.
Uncertainty. Given incomplete information it is common practice to make assumptions about things to give us some starting point from which to act. In this context a round integer value pretty much implies: They are estimating, and they might be completely off.
I guess you could say, "Instead of making assumptions with uncertainty, let's not assume anything", but that makes progress in new and uncertain areas really hard.
It's a common and bad practice because quantitative language belongs in the domain of risk, not uncertainty. Two very different things. Risk is the world of stationary laws and empirical data. Fluid markets, mortality stats, that kind of thing. The "10x AI productivity boost" is about as real as the "10x engineer", that's to say, the PR department made it up. It's just vibes, but it's a symptom of a culture that unduly reveres anything that sounds "mathy".
I think it's a pedantic argument. Obviously he meant that he's betting on a certain future and is willing to work toward it. No need to scratch your head. 10x isn't a mathematical 10x, and no one thinks it is literally 10.0x.
His book AI Superpowers is frontloaded with a lot of "Ra Ra mighty China inevitably will win the AI competition" stuff. I just felt sad for him if AI nationalism is his genuine world view, and not just some party line he has to parrot to appease the commissars.
Let's not discount folks just because they are patriotic. A lot of people (including ones you probably do respect) fall into that pit - it's like religion, just something people also do. It doesn't change anything else they achieve.
That said, the license on the AI they produce is not open. Calling it open is dumb and for that a certain measure of disrespect is warranted.
Valued at $1B, Kai-Fu Lee’s LLM startup unveils open source model
…
The startup’s ability to commence model training quickly is no doubt an outcome of its smooth fundraising, which is critical to securing top-tier talent and AI processors. While declining to disclose how much 01.AI has raised, Lee said it’s valued at $1 billion after receiving financing from Sinovation Ventures, Alibaba Cloud and other undisclosed investors.
The thesis described the Sphinx speech to text system. It ran, if I remember correctly, on a few VAX minicomputers.
In the early part of my career (first 20 years) I concentrated on tackling very difficult problems wherever I worked, accepting some failures and some successes. Reading his thesis was important to me because it gave me a feeling for trying to solve hard problems in a low resource environment (although I did have a Lisp Machine, access to a Connection Machine and a Butterfly Machine - so not terribly resource poor!)
For what it’s worth: I am in my 70s, still waist deep in AI tech, and I still look for a few people to follow to get inspired.
so: download some open source datasets, own a few Nvidia A100s, train an open source model from scratch on those datasets, get some investors to invest = billion dollar valuation?
How can an open source company have a valuation of $1B?
How do open source businesses work at all? I sometimes read about companies like these, with huge valuations and then the next day I read about some open source developer asking for donations because their project, which powers half of the internet [1], is not making him enough to pay for food. What makes the difference?
Over the years, I wrote some tools for my own use, which are way better than their commercial counterparts. If I would spend a few weeks to polish one of them and open source it, it would probably gain widespread adoption. But how could I then build that into a business?
Very common way: go Open Core, sell hosting and plugins and proprietary engines etc, and if that's not enough, do a coup and change the license to prevent competitors from providing the same services.
Not all projects are "business-able", I think. A nice tool with an identifiable user base, a nice UI, etc. is easier to market than an NTP library, even if the latter is far more likely to be critical and to end up installed in billions of devices.
You could spend some time looking into the business models of Elasticsearch and Docker. Albeit not successful in some eyes, as they aren't exactly printing profit, they have indeed raised enough to reach unicorn status.
In this case being open source is a nice feature, but not the main one the business is built around. If you have terabytes of data and thousands of GPUs to train on plus the talent to work on it you have a clear advantage over many other possible competitors, even if you give them the source. This is not nearly comparable to building yet another open source todo app or analytics dashboard in JS.
like any other startup. time, energy and passion. define problem/pain. talk loads w potential customers. ask them if they would pay for it. excel in customer success. find ways to scale and be found :)
Or... or... s/he could convince a VC that the startup would multiply the potential opportunity in the space the tools cover by 10x and get a really fat check for generating the right hype with the right crowd.
Painfully, years ago I wrote a book with Web 3.0 in the title. For me that meant semantic web and linked data.
I am enjoying reading this entire thread. For AI skeptics: I am not trying to talk you into anything you don’t believe, but just as an experiment, whenever you think about LLMs, appreciate how well they generalize past the things they have been trained on. Peter Norvig and a friend of his recently wrote an article arguing that LLMs probably meet our traditional criteria for AGI.
EDIT: I am aware of the recent paper https://arxiv.org/abs/2311.00871 that argues that LLMs have limited generalization capabilities, but I don’t agree.
China and Japan also have numerical stock tickers, e.g. Sony's ticker 6758 instead of e.g. SONY.
I guess numeric IDs like that were a consequence of early tech making it hard to deal with more complex character sets ... maybe the association lives on in those cultures?
There's also deeper associations with numbers like 8 representing luck, 4 meaning death etc.
6 sounds like the character 溜 which technically means something like smooth flowing but is used to compliment someone's well-practiced skills.
"666" is often said when someone does something impressive and smoothly. Like if someone double-flips a pancake and it lands perfectly, that's the kind of situation you'd say "666" to compliment them.