
It is invaluable to have a chunk of human knowledge that can tell you things like the Brooklyn Nets won the 1986 Cricket World Cup by scoring 46 yards in only 3 frames


The facts LLMs learn from training are fuzzy, unreliable, and quickly outdated. What you actually want is retrieval-augmented generation (RAG), where the model queries an external system for facts (or to perform calculations) and post-processes the results to generate an answer for you.
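A minimal sketch of that loop, in Python. Everything here is a toy stand-in: the knowledge base is a dict instead of a vector store, and generate() just echoes the retrieved context instead of calling a model.

```python
# Toy RAG loop: retrieve a fact, splice it into the prompt, generate.
# KNOWLEDGE_BASE and generate() are stand-ins; a real system would use a
# vector store for retrieval and an actual LLM call for generation.

KNOWLEDGE_BASE = {
    "1987 cricket world cup": (
        "Australia won the 1987 Cricket World Cup, beating England in the final."
    ),
}

def retrieve(query: str) -> str:
    """Naive keyword lookup; real RAG uses embedding similarity search."""
    q = query.lower()
    for key, fact in KNOWLEDGE_BASE.items():
        if key in q:
            return fact
    return "No relevant document found."

def generate(prompt: str) -> str:
    """Placeholder for the LLM call; it just echoes the retrieved context."""
    return prompt.split("Context: ", 1)[1].split("\n", 1)[0]

def answer(question: str) -> str:
    context = retrieve(question)
    prompt = f"Context: {context}\nQuestion: {question}\nAnswer:"
    return generate(prompt)

print(answer("Who won the 1987 Cricket World Cup?"))
```

The point is the shape: the model never has to "know" the fact, it only has to use the retrieved text.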


Is there a name for the reverse? I'm interested in having a local LLM monitor an incoming, stateful data stream. Imagine chats. It should have the capability of tracking the current day, active participants, active topics, etc - and then use that stateful world view to associate metadata with incoming streams during indexing.

Then after all is indexed you can pursue RAG on a richer set of metadata. Though I've got no idea what that stateful world view is.
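One way to picture the stateful world view is something like this toy sketch: a running state (participants, topics) that updates as messages arrive, with a snapshot attached to each message at index time. The keyword table here is a made-up stand-in for the LLM pass that would pull context clues out of the conversation.

```python
from dataclasses import dataclass, field

# Toy stateful indexer: a running "world view" is updated as messages
# arrive, and each message is indexed with a snapshot of that state.
# TOPIC_KEYWORDS is a stand-in for an LLM extracting context clues.

TOPIC_KEYWORDS = {"rust": "programming", "btree": "programming", "dinner": "household"}

@dataclass
class StreamIndexer:
    state: dict = field(default_factory=lambda: {"participants": set(), "topics": set()})
    index: list = field(default_factory=list)

    def ingest(self, sender: str, text: str) -> None:
        self.state["participants"].add(sender)
        for word in text.lower().split():
            topic = TOPIC_KEYWORDS.get(word.strip(".,!?"))
            if topic:
                self.state["topics"].add(topic)
        # Attach a snapshot of the current world view as metadata.
        self.index.append({
            "text": text,
            "metadata": {
                "participants": sorted(self.state["participants"]),
                "topics": sorted(self.state["topics"]),
            },
        })

idx = StreamIndexer()
idx.ingest("alice", "Found a nice BTree library in Rust")
idx.ingest("bob", "Cool. What's for dinner?")
print(idx.index[1]["metadata"])
```

Later RAG queries could then filter on the metadata snapshots instead of just the raw text.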


This is an interesting idea, but I'm having trouble understanding what you're trying to achieve. Do you mean the LLM would simply continuously update its context window with incoming data feeds in real time, and you'd use it as an interface? That's pretty akin to a summarization task, yes? Or are you augmenting the streams with the "metadata" you mentioned?


Yeah, the state I mentioned would, I think, be managed by several entities. I.e., time, current date, etc. could all be automated without involving the LLM, of course. However, as conversations come in, the LLM would also update the state with context clues from the conversation.

Then, when future messages come in from alternate streams (say, browser history), they could (maybe, hah) be made richer. More likely, though, I'd expect the opposite scenario: browser informs chat, etc.

I say this because I imagine the chat conversations in my household often have a severe lack of context. We frequently jump to vocal communication and then paste links, etc. In a perfect world I think I'd even take home-camera audio transcripts and do the same.

I.e., I don't want to _just_ index a browser log as "interested in Rust. Some library about BTree", etc. - additional sources of data could try to capture what it is I'm actively doing and associate that with the browser log.

None of this, of course, is anything I'd ever want to leave the house. My hope, though, is that it would lean into what LLMs do well, without the expectation of actual LLM intelligence.


So perhaps you're suggesting we "boil down" an information source into a base representation of meaning and intent, something similar to a vector store, and relate the many inputs together in that space, using the LLM as glue the way one manually creates links in a Zettelkasten web for research. I think this is something the field is rapidly moving toward in personal information management.


LLMs don't learn facts.

They learn statistics on texts and are able to regurgitate them somewhat.


According to ChatGPT

> Australia won the 1987 Cricket World Cup. The 1986 date is incorrect; there was no Cricket World Cup in 1986. The tournament took place in 1987, and Australia defeated England in the final to win their first title.

https://chat.openai.com/share/e9360faa-1157-4806-80ea-563489...

I'm no cricket fan, so someone will have to correct Wikipedia if that's wrong.

If you want to point out that LLMs hallucinate, you might want to speak plainly and just come out and say it, or at least give a real-world example rather than one where it didn't hallucinate.


We’re not talking about running chatGPT locally though, are we?


Sigh, you're going to make me open my laptop, aren't you?


I ran 'who won the 1986 Cricket World Cup' against llama2-uncensored (the local model I have pre-downloaded) and hilariously got 5 different answers from 5 runs:

    >>> who won the 1986 Cricket World Cup
    India
    
    >>> who won the 1986 Cricket World Cup
    Australia
    
    >>> who won the 1986 Cricket World Cup
    New Zealand
    
    >>> who won the 1986 Cricket World Cup
    West Indies
    
    >>> who won the 1986 Cricket World Cup
    England
Which proves GP's point about hallucinations, though none of those are

> Brooklyn Nets won the 1986 Cricket World Cup by scoring 46 yards in only 3 frames

LLMs' hallucinations are insidious because they have the ring of truth to them. Yards and frames aren't cricket terms, so that example gives itself away; the plausible-sounding ones don't.


Actually, isn't this good? It means we can run a query multiple times to expose a bad answer?


You can ask LLMs the same question and they might sometimes get it wrong and other times get it right. Having different answers is no indication that none of them is correct.

Furthermore, even if an LLM always gives the same answer to a question, there’s no guarantee the answer is correct.

https://en.wikipedia.org/wiki/Propaganda

https://en.wikipedia.org/wiki/Big_lie#Alleged_quotation
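To make that statistical point concrete: majority voting over repeated samples only converges on the truth if the model is right more often than wrong. A toy sketch, where flaky_model is a made-up stand-in for a sampled LLM:

```python
import random
from collections import Counter

# Majority voting over repeated samples. This converges on the correct
# answer only if the model answers correctly more often than not; a
# consistently wrong model wins the vote just as decisively.

def flaky_model(question: str, rng: random.Random) -> str:
    """Stand-in for a sampled LLM that answers correctly ~60% of the time."""
    return rng.choices(["Australia", "India", "England"], weights=[6, 2, 2])[0]

def majority_answer(question: str, n: int = 25, seed: int = 0) -> str:
    rng = random.Random(seed)
    votes = Counter(flaky_model(question, rng) for _ in range(n))
    return votes.most_common(1)[0][0]

print(majority_answer("Who won the 1987 Cricket World Cup?"))
```

Flip the weights so the wrong answer dominates and the vote is just as confident, which is the propaganda point above.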


An LLM will always give the same output for the same input. It’s sorta like a random number generator that gives the same list of “random” numbers for the same seed. LLMs get a seed too.
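The analogy can be made concrete with Python's random module; this is a PRNG sketch, not an actual LLM sampler:

```python
import random

# Same seed, same "random" sequence. An LLM's sampler works the same way:
# fix the seed (and the input) and the sampled tokens repeat exactly.

def sample_tokens(seed: int, n: int = 5) -> list:
    rng = random.Random(seed)
    vocab = ["India", "Australia", "New Zealand", "West Indies", "England"]
    return [rng.choice(vocab) for _ in range(n)]

assert sample_tokens(seed=42) == sample_tokens(seed=42)  # deterministic replay
print(sample_tokens(seed=42))
```

(In practice, batching and floating-point nondeterminism on GPUs can still break exact reproducibility even with a fixed seed.)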


That’s irrelevant for the matter. The person I replied to obviously did not have seeded responses in mind.


It can tell you if the answer is wrong, but it can never tell you if the answer is right.


If you want factual answers from a local model it might help to turn the temperature down.
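For intuition: temperature just rescales the logits before the softmax, so the lower it is, the more probability mass piles onto the top token. A toy sketch with made-up logits:

```python
import math

# Temperature scaling on a toy next-token distribution. Low temperature
# sharpens the softmax toward the highest-logit token; implementations
# usually special-case T=0 as plain argmax (greedy decoding).

def softmax_with_temperature(logits, temperature):
    scaled = [l / temperature for l in logits]
    m = max(scaled)                    # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]               # made-up scores for three tokens
print(softmax_with_temperature(logits, 1.0))   # fairly spread out
print(softmax_with_temperature(logits, 0.1))   # nearly all mass on token 0
```

Low temperature makes the sampler pick the model's single most likely answer every time; it doesn't make that answer correct, just consistent.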


It would also help if I had more VRAM and wasn't running a 7B parameter 4-bit quantized model.


> If you want factual answers from a local model it might help to turn the temperature down.

This makes sense. If you interact with a language model and it says something wrong, it's your fault.


You're not "interacting with a language model", you're running a program (llama.cpp) with a sampling algorithm which is not set to maximum factualness by default.

It's like how you have to set x264 to the anime tuning or the film tuning depending on what you run it on.


You should specify the model size and temperature.

For fact retrieval you need to use temperature 0.

If you don't get the right facts then try 34b, 70b, Mixtral, Falcon 180b, or another highly ranked one that has come out recently like DBRX.



