
I was waiting for something like that to happen! Next step - creating a human-language-free representation. I believe that once a group of llms can communicate only in embeddings tuned without any human text input, we're going to open a completely new chapter in AI.


This is actually something you probably want to avoid, if at all possible, because it makes it very hard to maintain insight into what the AIs are communicating among themselves. But that insight is crucial to stay informed about their progress in taking over the world, etc.


Yes! We should be extremely cautious about embracing approaches that make LLMs even more inscrutable. Having CoT, however unreliable it is, is nonetheless a huge boon for model evaluation that we should not give up so lightly.


Yeah, and it might not even gain us that much. It reminds me of how a zipped piece of JSON often comes close enough to bespoke binary serialization formats that it's not worth bothering with them.


How does a group help anything?

If you put 1000 dumb people together, they don't magically become smart?


If you put 1000 people who can't talk together, they will create a language so they can communicate. He's saying that if we put LLMs together and don't force them to use English to communicate, then they'll create their own language, which may suit LLMs better than English does.

May be true but who knows.

I wonder if anyone has tested the Sapir-Whorf hypothesis for LLMs by training them on different languages and comparing task performance. I guess it's too difficult to get a large equivalent training set in different languages.


Is everything in LLMs translated back to English before interpretation?

It works fairly well in my native language, I’m surprised to learn that things get translated back.


LLMs have no fixed internal representation - they barely have internal anything - so no, there is no translation.

But there's also no guarantee any particular query generalizes (vs is memorized), so it might only be able to answer some queries in some languages.


Got it. And since my native language is arguably one of the closest to English (Dutch), it works very well. But probably not as well for, say, Asian languages, which have completely different grammatical constructs.


It feels like an exercise in anthropomorphization to me.

Sapir-Whorf hypothesis is generally not considered to be reality. It makes intuitive sense but is wrong.

There are hours of podcasts with Chomsky talking about LLMs. The gist is that LLMs are extracting surface-level statistical structure of language that will be good for routine coding and not much else. It is easy to infer that Chomsky would believe this idea to be utter nonsense.

I believe even the idea of getting 1000 people together who agree to label a rock "rock", a tree "tree", a bird "bird" is not how human language works. Something that is completely counterintuitive.

Reading the paper, no one believes a hidden Markov model is creating some kind of new thought process in the hidden state.

Though I certainly could have no idea what I am talking about with all this, and have pieced together parts that make no sense while this is actually a breakthrough path to AGI.


> There are hours of podcasts with Chomsky talking about LLMs

I'm not an expert, but it seems like Chomsky's views have pretty much been falsified at this point. He's been saying for a long time that neural networks are a dead end. But there hasn't been anything close to a working implementation of his theory of language, and meanwhile the learning approach has proven itself to be effective beyond any reasonable doubt. I've been interested in Chomsky for a long time, but when I hear him say "there's nothing interesting to learn from artificial neural networks" it just sounds like a man who doesn't want to admit he's been wrong all this time. There is _nothing_ for a linguist to learn from an actually working artificial language model? How can that possibly be? There were two approaches - rule-based vs learning - and which came out on top is pretty damn obvious at this point.


What can you learn from something parroting data we already have?

Similarly, we are now finding that training on synthetic data is not helpful.

What would have happened if we invested 1/100 of what we spent on LLM on the rule based approach?


There is an old joke that AI researchers came up with several decades ago: "quality of results is inversely proportional to the number of linguists involved".

This has been tried repeatedly many times before, and so far there has been no indication of a breakthrough.

The fundamental problem is that we don't know the actual rules. We have some theories, but no coherent "unified theory of language" that actually works. Chomsky in particular is notorious for some very strongly held views that have been lacking supporting evidence for a while.

With LLMs, we're solving this problem by bruteforcing it, making the LLMs learn those universal structures by throwing a lot of data at a sufficiently large neural net.


> What can you learn from something parroting data we already have?

You can learn that a neural network with a simple learning algorithm can become proficient at language. This is counter to what people believed for many years. Those who worked on neural networks during that time were ridiculed. Now we have a working language software object based on learning, while the formal rules required to generate language are nowhere to be seen. This isn’t just a question of what will lead to AGI, it’s a question of understanding how the human brain likely works, which has always been the goal of people pioneering these approaches.


> Sapir-Whorf hypothesis is generally not considered to be reality. It makes intuitive sense but is wrong.

Strong S-W (full determinism) might not be, but there's hardly a clear cut consensus on the general case.

And the whole "scientific field" is more like psychology, with people exchanging and shooting down ideas, and less like Math and Physics, so any consensus is as likely to be a trend as to reflect some hard, measurable understanding.

I'd say that the idea that S-W isn't true to at least some degree is naive.


> Sapir-Whorf hypothesis is generally not considered to be reality.

This is true only in the strictest terms of the hypothesis, i.e. linguistic determinism. Language still encodes a lot of culture (& hence norms and values) in its grammar & diction—this isn't very controversial.

Granted, I don't think this is that related to the topic at hand. There's bias all over the decisions in how to train and what to train on; choice of language is just one facet of that.


Well, maybe not 1000 people, but to our knowledge the human brain is actually made of physically independent zones that barely communicate with each other, except with the zone that takes all the outputs together and tries to do something coherent with all the garbage.

Idk if this could work with LLMs, especially because all the brain zones are somehow specialized in something while two LLMs are just identical machines. But we also know that the specialization isn't that hardcoded: we know that people losing half their brain (after a stroke) can still relearn things that were managed in the "dead" part.

I don't know, please correct my errors, I was just thinking aloud to say that multiple independent agents working together may be how "intelligence" already works in the biological world, so why not for AIs?


> the human brain is actually made of physically independent zones that barely communicate with each other, except with the zone that takes all the outputs together and tries to do something coherent with all the garbage.

That sounds like bullshit. Do you have a source?


Because group estimation is superior to individual estimation. The phenomenon is called the wisdom of the crowds: when a group of people independently estimate something, individual errors tend to cancel each other out, leading to a surprisingly accurate collective result. This works because of:

Diversity of opinions: different perspectives bring a range of estimates.

Independence: errors aren't systematically biased, as long as individuals estimate without external influence.

Error averaging: overestimations and underestimations balance out when averaged.

Law of large numbers: more participants increase accuracy by minimizing random errors.

It was demonstrated by Francis Galton in 1906, when a crowd's average guess of a bull's weight was almost spot-on. (Estimates must be independent and reasonably informed for this to work.)
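
A quick way to see the averaging argument is a minimal simulation, sketched here in Python (the bull's weight and the noise level are just made-up numbers for illustration):

    # Each "guesser" reports the true value plus independent noise; the error
    # of the crowd's mean shrinks as the crowd grows.
    import random

    def crowd_error(true_value: float, n_guessers: int, noise: float) -> float:
        """Absolute error of the mean of n independent noisy guesses."""
        guesses = [true_value + random.gauss(0, noise) for _ in range(n_guessers)]
        return abs(sum(guesses) / n_guessers - true_value)

    random.seed(0)
    for n in (1, 10, 100, 1000):
        avg_err = sum(crowd_error(1200.0, n, 150.0) for _ in range(200)) / 200
        print(f"{n:>5} guessers -> mean abs error ~ {avg_err:.1f}")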


> If you put 1000 dumb people together, they don't magically become smart?

1000 is probably too high, but groups of people are in fact more intelligent than individuals (though for humans it is likely because recognizing a correct answer is easier than finding it in the first place)


Functional groups which work well together, openly share research and ideas, keep their best output around, stay dedicated to realism, and focus more on problem solving than on status display will be smarter. The group works like a filter which generates multiple solutions and selects, remembers, and abstracts the best.

Dysfunctional groups which do the opposite will be catastrophically stupid.

There have been plenty of dysfunctional groups in history.


depends on the circumstances. lin-manuel miranda can probably write a better musical by himself than a team of 20 people with equal input would.

also, the bottlenecks that teamwork helps solve (eg the high cost of gaining expertise and low throughput of reasoning capacity) may not be that relevant in the ai age


> by himself than a team of 20 people with equal input would.

Sure, but the result would still be far better than the average of the output of the 20 individuals taken alone.

> also, the bottlenecks that teamwork helps solve (eg the high cost of gaining expertise and low throughput of reasoning capacity) may not be that relevant in the ai age

It's always tempting to anthropomorphize these systems and conclude that what works for us would work for them, but yes we don't really know if it would bring anything to AI.


I wonder if there's research on this, like if you took a group of individuals who scored the same on an IQ test, then got them to solve one together, how would the score improve?

Is there a way of selecting people to cover each other's intellectual blind spots?


Isn't that the very case behind the "wisdom of crowds" thing?


Looking at the current state of democracies around the world, my hopes are not on "wisdom of the crowds".


If you think the democracies are doing bad, you should see the autocracies!


You mean the thing democracies are turning into, thanks to social (crowd wisdom) media?


I don’t think social media really is crowd wisdom at all. It is built to pander to our worst impulses (I think, knowingly and openly, right? The algorithm selects for engagement, not learning and growing), and I’d be surprised if it isn’t producing a feedback loop as well (perhaps as an unintentional side effect). The wisdom of the crowds hypothesis relies on a random sampling, we’re intentionally applying a skew toward the angry and shallow.


No, he means the thing democracies had turned into, when barely differentiated parties became a practical "uniparty" in economic, corporate, and foreign policy, and ruled by pissing on what the people voted for. The current populist backlash is a reaction against that, which elites (and their supporters) lament as "too much democracy" while they scorn the ignorant plebes (case in point) and pine for censorship and "expert" rule.


That wasn’t what I meant and I don’t think you really thought it was.


Their current states were achieved by trusting technocrats and career politicians for far too long...


Not magically. Our great ancestors were pretty dumb, but they were getting smarter and better because of sharing their knowledge.


yes they got "smarter" by compiling a corpus of knowledge which future generations could train on.

sarcasm aside, throwing away the existing corpus in favor of creating a new one from scratch seems misguided.

this paper isn't about creating a new language. they are omitting the sampler that chooses a single token in favor of sending the entire end state back into the model, like a superposition of tokens. that's the breadth-first-search part: they don't collapse the choice down to a single token before continuing, so it effectively operates on all of the possible tokens each step until it decides it's done.

it would be interesting to try this with similar models that had slightly different post-training, if you could devise a good way to choose the best answer, combine the outputs effectively, or feed the output of a downstream model back into the initial model, etc. but I'm not sure if there'd necessarily be any benefit to this over using a single specialized model.
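
roughly, the latent feedback loop looks something like this - a sketch of the idea only, not the paper's code. it assumes a HuggingFace-style causal LM whose hidden size equals its embedding size (true of typical decoder-only models), and `n_latent_steps` is a made-up knob:

    # ordinary decoding would collapse the state to one sampled token and
    # re-embed it; here the un-collapsed final hidden state is appended
    # directly as the next input embedding instead.
    import torch

    @torch.no_grad()
    def latent_steps(model, input_ids: torch.Tensor, n_latent_steps: int) -> torch.Tensor:
        embed = model.get_input_embeddings()
        inputs_embeds = embed(input_ids)                    # (batch, seq, hidden)
        for _ in range(n_latent_steps):
            out = model(inputs_embeds=inputs_embeds, output_hidden_states=True)
            last_hidden = out.hidden_states[-1][:, -1:, :]  # final state, last position
            # instead of: next_tok = out.logits[:, -1].argmax(); embed(next_tok)
            inputs_embeds = torch.cat([inputs_embeds, last_hidden], dim=1)
        return inputs_embeds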


they were not one bit dumber than you.


Average intelligence measures have risen substantially since the early 1900s

https://en.wikipedia.org/wiki/Flynn_effect


> If you put 1000 dumb people together, they don't magically become smart?

Do they not become smart*er* though?


"Smarter" is too vague. A group can compensate for individual weaknesses or even converge on a hard-to-make prediction given sufficiently uncorrelated outputs; basically the idea behind ensemble models / wisdom of the crowds. But a group of 1000 dumb apes would never achieve categorically-above-ape intelligence, probably not even "genius" ape intelligence. Groups of unintelligent agents come with downsides as well, like the ant death spiral.


> But a group of 1000 dumb apes would never achieve categorically-above-ape intelligence

And yet, here we are.

A group of 1000 apes is large enough to have offspring and, given time, go through evolution.


they kinda do... It's how cities work.

People learn by being around others who are both successful and unsuccessful.


Wait what … how does democracy work then?


the benefit of democracy is primarily that it prevents governments from doing bad things, less so that it empowers more effective governance


It can do either, and can fail to do either. It’s the people having power that enables the outcomes, not the system itself. Democracy just grants the power to a broader set of people.


Democracy is not about being smart or dumb.

It's about everybody having a say in the decisions of government that affect them.

The failure of democracy as a system is not when people make dumb decisions (experts and high-IQ people have made some of the most stupid and catastrophic decisions in history), but when people's collective decisions are not being respected.


It doesn't.


That came out a few weeks ago from Meta: Large Concept Models

https://ai.meta.com/research/publications/large-concept-mode...


How does one impart textual knowledge discovered by humans without language?


Couldn't we use an AI model trained on historical text data (up to today) to predict likely events for tomorrow? Taking this further, a sufficiently advanced AI system could potentially analyze human-generated text up to any given point in history to understand patterns of human thought and behavior, then project those patterns forward. This speaks to your point about human language - while we need text data for initial training, the AI's internal representations and predictions could potentially transcend human language constraints.


The training of the LLM itself would still use human language. But you could add an extra channel that's never given any text or direct dataset training. Keep it purely a connection between the hidden layers of different LLM instances, and train it using the usual perplexity loss or a similar metric.

The interesting thing then would be: does it converge to an embedding space similar to the input's, or can LLMs create a more efficient "language"?
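
As a minimal sketch of what that extra channel might look like (nothing from the paper; `LatentBridge`, `model_a`, `model_b` are hypothetical names for two HuggingFace-style causal LM instances, possibly two copies of the same LLM):

    import torch
    import torch.nn as nn

    class LatentBridge(nn.Module):
        """Projects A's hidden states into B's embedding space; the only new parameters."""
        def __init__(self, hidden_a: int, hidden_b: int):
            super().__init__()
            self.proj = nn.Linear(hidden_a, hidden_b)

        def forward(self, model_a, model_b, ids_a, ids_b, labels_b):
            # A reads its prompt; keep only the un-collapsed final hidden states.
            h_a = model_a(ids_a, output_hidden_states=True).hidden_states[-1]
            msg = self.proj(h_a)                                  # the A -> B "message"
            emb_b = model_b.get_input_embeddings()(ids_b)
            inputs = torch.cat([msg, emb_b], dim=1)               # prepend the message
            pad = torch.full(msg.shape[:2], -100, dtype=torch.long, device=ids_b.device)
            # ordinary next-token loss on B's text only; the channel itself never gets
            # text supervision and is shaped purely by whatever helps B predict.
            return model_b(inputs_embeds=inputs,
                           labels=torch.cat([pad, labels_b], dim=1)).loss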


I thought about it too (layman). When I learned about embeddings it almost immediately clicked as a sort of an ascended language, not sure why no one seems to talk about it. Exchanging embeddings must be a much "wider" communication channel than speaking a real language. And in contrast to a language, embeddings are (iiuc) continuous, i.e. you can rotate a vector continuously and it will smoothly trace the changes between A and B. I can picture communicating in something like https://www.google.com/search?q=charlie+conspiracy+meme&udm=... - embedding difference vectors, but it's all crystal clear and is a natural language for an LLM, because any vector combination points to a correct "inner screen" image/concept/you name it.
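
As a toy illustration of that continuity, a sketch (the embedding matrix is a placeholder for whatever model you'd probe, and a straight line is just the simplest path; slerp would work too):

    import torch

    def interpolate(emb_a: torch.Tensor, emb_b: torch.Tensor, steps: int = 5):
        """Points on the straight line between two embeddings A and B."""
        return [torch.lerp(emb_a, emb_b, t / (steps - 1)) for t in range(steps)]

    def nearest_tokens(point: torch.Tensor, embedding_matrix: torch.Tensor, k: int = 3):
        """Token ids whose embeddings are closest (by cosine) to an arbitrary point."""
        sims = torch.nn.functional.cosine_similarity(embedding_matrix, point, dim=-1)
        return sims.topk(k).indices.tolist()

And the "wider" part is easy to put a rough number on: a token id from a ~100k vocabulary carries on the order of 17 bits, while a single embedding vector is thousands of floats.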

Or maybe this is my own ignorant confabulation, so nvm.



