It is invaluable to have a chunk of human knowledge that can tell you things like "the Brooklyn Nets won the 1986 Cricket World Cup by scoring 46 yards in only 3 frames".
The facts LLMs learn during training are fuzzy, unreliable, and quickly outdated. What you actually want is retrieval-augmented generation (RAG), where the model queries an external system for facts (or to perform calculations) and postprocesses the results to generate an answer for you.
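As a toy illustration of the RAG flow described above (all names and the fact store here are hypothetical, not a real system): facts live outside the model, a retriever pulls the relevant ones, and they get injected into the prompt so the model only has to postprocess them rather than recall them.

```python
# Minimal RAG sketch (all names hypothetical): keep facts in an external
# store and inject retrieved ones into the prompt.
FACTS = {
    "1987 cricket world cup": "Australia won the 1987 Cricket World Cup, "
                              "defeating England in the final.",
}

def retrieve(query: str) -> list[str]:
    # Toy retriever: substring match. A real system would use a search
    # index or a vector store.
    q = query.lower()
    return [fact for key, fact in FACTS.items() if key in q]

def build_prompt(query: str) -> str:
    # Retrieved facts become context; the model's job shrinks to
    # reading them, not remembering them.
    context = "\n".join(retrieve(query)) or "(no facts found)"
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

print(build_prompt("Who won the 1987 Cricket World Cup?"))
```

The point is only the shape of the pipeline: retrieval happens before generation, so stale or fuzzy parametric memory never enters the picture.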
Is there a name for the reverse? I'm interested in having a local LLM monitor an incoming, stateful data stream. Imagine chats. It should have the capability of tracking the current day, active participants, active topics, etc - and then use that stateful world view to associate metadata with incoming streams during indexing.
Then, after everything is indexed, you can pursue RAG over a richer set of metadata. Though I've got no idea what that stateful world view would actually look like.
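One very rough sketch of what that stateful world view could look like (everything here is a guess, all names hypothetical): a running state object is updated as messages arrive, and each message is indexed together with a snapshot of that state, so later RAG queries see the enriched metadata.

```python
# Sketch of a stateful stream indexer (hypothetical design): the running
# state tracks participants and topics, and each indexed message carries
# a snapshot of that state as metadata.
from datetime import datetime, timezone

class StatefulIndexer:
    def __init__(self):
        self.state = {"participants": set(), "topics": set()}
        self.index = []

    def ingest(self, sender: str, text: str, topics: list[str]):
        # In the full idea an LLM would extract topics from context
        # clues; here the caller supplies them to keep the sketch
        # self-contained.
        self.state["participants"].add(sender)
        self.state["topics"].update(topics)
        self.index.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "sender": sender,
            "text": text,
            # Snapshot of the world view at indexing time.
            "metadata": {
                "participants": sorted(self.state["participants"]),
                "topics": sorted(self.state["topics"]),
            },
        })

idx = StatefulIndexer()
idx.ingest("alice", "check out this BTree crate", ["rust"])
idx.ingest("bob", "nice, reading the docs now", ["rust", "btree"])
print(idx.index[1]["metadata"])  # second message inherits earlier context
```

Note how the second message's metadata includes context ("rust", "alice") established by the first, which is the cross-stream enrichment being described.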
This is an interesting idea, but I'm having trouble understanding what you're trying to achieve. Do you mean the LLM would simply continuously update its context window with incoming data feeds in realtime, and you'd use it as an interface? That's pretty akin to a summarization task, yes? Or are you augmenting the streams with the "metadata" you mentioned?
Yeah, the state I mentioned would, I think, be managed by several entities. E.g. time, current date, etc. - all of that could be automated without involving the LLM, of course. However, as conversations come in, the LLM would also modify the state using context clues from the conversation.
Then, when future messages come in from alternate streams (say, browser history), they could (maybe, hah) be made richer. More likely, though, I'd expect the opposite scenario: browser informs chat, etc.
I say this because, in many cases, I imagine the chat conversations in my household severely lack context. We often jump to vocal communication and then paste links, etc. In a perfect world, I think I'd even take home-camera audio transcripts and do the same.
I.e. I don't want to _just_ index a browser log as "interested in Rust, some library about BTrees", etc. - additional sources of data could try to record what it is I'm actively doing, and associate that with the browser log.
All of this, of course, is nothing I'd ever want to leave the house. My hope, though, is that it would lean into what LLMs do well, without the expectation of actual LLM intelligence.
So perhaps you're suggesting we sort of "boil down" an information source into a base representation of meaning and intent - something similar to a vector store - and relate the many inputs together in this space, using the LLM as glue, much as one manually creates links in a Zettelkasten web for research. I think this is something the field is rapidly moving toward in personal information management.
> Australia won the 1987 Cricket World Cup. The 1986 date is incorrect; there was no Cricket World Cup in 1986. The tournament took place in 1987, and Australia defeated England in the final to win their first title.
I'm no cricket fan, so someone will have to correct Wikipedia if that's wrong.
If you want to point out that LLMs hallucinate, you might want to speak plainly and just come out and say it, or at least give a real-world example rather than a contrived one.
I ran 'who won the 1986 Cricket World Cup' against llama2-uncensored (the local model I have pre-downloaded) and hilariously got 5 different answers asking it 5 times:
>>> who won the 1986 Cricket World Cup
India
>>> who won the 1986 Cricket World Cup
Australia
>>> who won the 1986 Cricket World Cup
New Zealand
>>> who won the 1986 Cricket World Cup
West Indies
>>> who won the 1986 Cricket World Cup
England
Which proves GP's point about hallucinations, though none of those is anywhere near as absurd as
> Brooklyn Nets won the 1986 Cricket World Cup by scoring 46 yards in only 3 frames
LLM hallucinations are insidious because they have the ring of truth about them. Yards and frames aren't cricket terms, so that made-up example is easy to spot; the real ones aren't.
You can ask LLMs the same question and they might sometimes get it wrong and other times get it right. Having different answers is no indication that none of them is correct.
Furthermore, even if an LLM always gives the same answer to a question, there’s no guarantee the answer is correct.
An LLM will always give the same output for the same input and the same seed. It's sort of like a random number generator that produces the same list of "random" numbers for the same seed - LLM samplers take a seed too.
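A toy sketch of why that's true (this stands in for a real LLM with a fixed probability distribution over tokens; the point is only that all the randomness lives in the seeded sampler, not the model):

```python
# Toy "model": a fixed probability distribution over possible answers.
# All randomness comes from the sampler's seed, so same seed -> same
# output, every time.
import random

TOKENS = ["India", "Australia", "New Zealand", "West Indies", "England"]
PROBS  = [0.30, 0.25, 0.20, 0.15, 0.10]  # pretend model output

def sample_answer(seed: int) -> str:
    rng = random.Random(seed)  # the seed fixes the entire random stream
    return rng.choices(TOKENS, weights=PROBS, k=1)[0]

# Same seed gives the same "answer"; different seeds can differ, which
# is why 5 runs of a chat frontend can give 5 different winners.
assert sample_answer(42) == sample_answer(42)
```

Chat frontends typically pick a fresh seed per request, which is why repeated questions look nondeterministic even though the underlying model is not.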
You're not "interacting with a language model"; you're running a program (llama.cpp) with a sampling algorithm that is not configured for maximum factualness by default.
It's like how you have to set x264 to the animation tune or the film tune depending on what you're encoding.