I think some of the advanced features around sampling from the calling LLM could theoretically benefit from a bidirectional stream.
In practice, nobody uses those parts of the protocol (it was overdesigned and hardly any clients support it). The key thing MCP brings right now is a standardized way to discover & invoke tools. This would’ve worked equally well as a plain HTTP-based protocol (certainly for a v1) and it’d have made it 10x easier to implement.
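To make that concrete, here's a rough sketch of what "plain HTTP tool discovery & invocation" could look like — two endpoints, `GET /tools` for discovery and `POST /tools/<name>` for invocation. The endpoint names, payload shapes, and the `handle` routing function are all invented for illustration; this is not the MCP spec, just the hypothetical v1 the comment is describing:

```python
# Hypothetical plain-HTTP tool protocol (invented shapes, not MCP):
#   GET  /tools         -> JSON map of tool name -> schema
#   POST /tools/<name>  -> run the tool with a JSON args body
import json

TOOLS = {
    "add": {
        "description": "Add two integers",
        "params": {"a": "integer", "b": "integer"},
        "fn": lambda args: args["a"] + args["b"],
    },
}

def handle(method: str, path: str, body: str = "") -> tuple[int, str]:
    """Route a request the way a minimal tool server might."""
    if method == "GET" and path == "/tools":
        # Strip the callable before serializing the schemas.
        listing = {name: {k: v for k, v in tool.items() if k != "fn"}
                   for name, tool in TOOLS.items()}
        return 200, json.dumps(listing)
    if method == "POST" and path.startswith("/tools/"):
        name = path.removeprefix("/tools/")
        if name not in TOOLS:
            return 404, json.dumps({"error": "unknown tool"})
        result = TOOLS[name]["fn"](json.loads(body))
        return 200, json.dumps({"result": result})
    return 405, json.dumps({"error": "unsupported"})
```

Any HTTP client could then discover tools with one GET and invoke them with one POST — no session, no bidirectional channel.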
Sampling is, to my eyes, a very promising aspect of the protocol. Maybe its implementation is lagging because it's too far from the familiar mental model of tool use. I'm also fine with the burden falling on the client side if it enables a good DX on the server side — in practice, there will be far more servers than clients.
> This would’ve worked equally well as a plain HTTP-based protocol
With plain HTTP you can quite easily "stream" both the request's and the response's body: that's an HTTP/1.1 feature called "chunked transfer encoding" (the message body isn't one byte array; it's split into "chunks" that are received in sequence). I really don't get why people think you need WS (or ffs SSE) for "streaming". I've implemented a chat using just good old HTTP/1.1 with chunking — it's a natural fit, so it suits LLM output quite well.
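For anyone who hasn't looked at the wire format: chunked transfer encoding (HTTP/1.1) frames each chunk as a hex length line, the bytes, and a CRLF, terminated by a zero-length chunk. A minimal encoder/decoder sketch (hand-rolled for illustration; real clients and servers handle this for you):

```python
# Chunked transfer encoding wire format:
#   <hex length>\r\n<bytes>\r\n ... 0\r\n\r\n

def encode_chunked(chunks) -> bytes:
    """Frame an iterable of byte strings as a chunked HTTP body."""
    out = b"".join(f"{len(c):x}\r\n".encode() + c + b"\r\n" for c in chunks)
    return out + b"0\r\n\r\n"  # zero-length chunk terminates the body

def decode_chunked(body: bytes):
    """Yield chunks in order, as a receiver would see them arrive."""
    i = 0
    while True:
        j = body.index(b"\r\n", i)        # end of the hex size line
        size = int(body[i:j], 16)
        if size == 0:
            return                        # terminating chunk
        start = j + 2
        yield body[start:start + size]
        i = start + size + 2              # skip the chunk's trailing CRLF

wire = encode_chunked([b"Hello, ", b"world", b"!"])
```

Each chunk can be flushed to the socket as soon as it's produced, which is exactly the token-by-token delivery pattern LLM responses need.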