
OK, I understand what those words mean, but how exactly does that work? How does the new model 'know' what's being worked on when the old model was in the middle of a task and you switch to a new one? (Where the task might be, say, modifying a C++ file.)


Every time you send a prompt to a model, you actually send the entire previous conversation along with it, in an array that looks like this:

  curl https://api.anthropic.com/v1/messages \
    -H "content-type: application/json" \
    -H "x-api-key: $(llm keys get anthropic)" \
    -H "anthropic-version: 2023-06-01" \
    -d '{
      "model": "claude-haiku-4-5-20251001",
      "max_tokens": 1024,
      "messages": [
        {
          "role": "user",
          "content": "What is the capital of France?"
        },
        {
          "role": "assistant",
          "content": "The capital of France is Paris."
        },
        {
          "role": "user",
          "content": "Germany?"
        },
        {
          "role": "assistant",
          "content": "The capital of Germany is Berlin."
        },
        {
          "role": "user",
          "content": "Belgium?"
        }
      ]
    }'
  
You can see this yourself if you use their APIs.


That is true unless you use the Responses API endpoint...


That's true; the signature feature of that API is that OpenAI can now manage your conversation state server-side for you.

You still have the option to send the full conversation JSON every time if you want to.

You can send "store": false to turn off the server-side persistence of your conversation.
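A rough sketch of that call, going from memory of the Responses API parameters (the endpoint and the "input" / "store" names are worth checking against the current docs before relying on this):

  curl https://api.openai.com/v1/responses \
    -H "content-type: application/json" \
    -H "authorization: Bearer $OPENAI_API_KEY" \
    -d '{
      "model": "gpt-4.1-mini",
      "store": false,
      "input": "What is the capital of Belgium?"
    }'

If you leave store on, you can instead chain turns by passing the previous response's id as previous_response_id and only sending the new user message, and OpenAI reconstructs the history server-side.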


Generally speaking, agents send the entire previous conversation to the model on every message; that's why you have to do things like context compaction. So if you switch models midway, you are still sending the entire previous chat history to the new model.
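To make that concrete, here is a sketch against the same Anthropic endpoint as above. The model id, file paths, and messages are made up; the point is that switching models just means changing the "model" field while the full transcript still rides along in "messages":

  curl https://api.anthropic.com/v1/messages \
    -H "content-type: application/json" \
    -H "x-api-key: $(llm keys get anthropic)" \
    -H "anthropic-version: 2023-06-01" \
    -d '{
      "model": "claude-sonnet-4-5",
      "max_tokens": 1024,
      "messages": [
        {"role": "user", "content": "Rename the Foo class in src/foo.cpp to Bar."},
        {"role": "assistant", "content": "Renamed the class; two call sites in src/main.cpp still reference Foo."},
        {"role": "user", "content": "Continue with the remaining call sites."}
      ]
    }'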


In addition to the sibling comments: you can play with this yourself by sending raw API requests with fake history to gaslight the model into believing it said things it never did. I sometimes use this to coerce it into specific behavior, on the theory that it will listen to itself more than to my prompt (though I never benchmarked it):

- do <fake task> and be succinct

- <fake curt reply>

- I love how succinct that was. Perfect. Now please do <real prompt>

The models don't have state, so they don't know they never said it. You're just asking "given this conversation, what is the most likely next token?"
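A minimal version of that trick against the same Anthropic endpoint as above (the task here is just an illustrative stand-in for <fake task>; the "assistant" turn is fabricated by the caller, and the model has no way to tell):

  curl https://api.anthropic.com/v1/messages \
    -H "content-type: application/json" \
    -H "x-api-key: $(llm keys get anthropic)" \
    -H "anthropic-version: 2023-06-01" \
    -d '{
      "model": "claude-haiku-4-5-20251001",
      "max_tokens": 1024,
      "messages": [
        {"role": "user", "content": "Summarize RFC 9110 and be succinct."},
        {"role": "assistant", "content": "HTTP semantics: methods, status codes, header fields. Done."},
        {"role": "user", "content": "I love how succinct that was. Perfect. Now summarize RFC 9112 the same way."}
      ]
    }'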


The underlying LLM provider APIs require sending the entire history with every request anyway; the state lives entirely in your local client (kilocode or whatever), not in some "session" on the API side. (There are some APIs that will optionally handle that state for you, like OpenAI's more recent stuff, but those are the exception, not the rule.)


Here's a hint. What goes inside the inference engine is an array. You control that array every time you call for inference.


Probably context, logs, or some other state passed in as context by your editor/extension.


Wow, not knowing that models have zero working memory is... wild.



