
> Hi all, I am trying to build a simple LLM bot and want to add guard rails so that the LLM responses are constrained.

Give examples of how the LLM should respond. Always give it a default response as well (e.g. "If the user response does not fall into any of these categories, say x").
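As a minimal sketch of this advice, a system prompt might enumerate the allowed categories with examples and end with an explicit fallback. The category names and fallback wording here are made-up placeholders; adapt them to your bot:

```python
# Hypothetical system prompt illustrating the advice above:
# give examples of valid responses, plus a default for everything else.
SYSTEM_PROMPT = """\
You are a support bot. Classify the user's message and respond as follows:

- Billing question -> answer using the billing FAQ.
- Shipping question -> answer using the shipping FAQ.

Examples:
User: "Where is my package?" -> shipping
User: "Why was I charged twice?" -> billing

If the user's message does not fall into any of these categories, say:
"Sorry, I can only help with billing and shipping questions."
"""

print(SYSTEM_PROMPT)
```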

> I can manually add validation on the response but then it breaks streaming and hence is visibly slower in response.

I've had this exact issue (streaming + JSON). Here's how I approached it:

1. Instruct the LLM to return its answer under a known key, e.g. "test", in its JSON response.
2. Make the streaming call.
3. Accumulate the chunks from the stream into a string buffer.
4. Once you detect the key "test" (and the opening quote of its value) in that buffer, start forwarding all subsequent chunks wherever you need.
5. Once you reach the closing quotation mark, end the stream.
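The steps above can be sketched roughly like this. The chunk stream is simulated here (in practice the chunks come from your LLM client's streaming API), and the key name "test" is the one assumed in the steps; note the emitted text is the raw JSON string, so escape sequences pass through as-is:

```python
def stream_value(chunks, key="test"):
    """Yield only the characters inside the JSON string value for `key`,
    as soon as they arrive from the chunk stream (steps 3-5 above)."""
    buffer = ""
    escaped = False
    opener = f'"{key}"'
    it = iter(chunks)

    # Step 3/4: accumulate chunks until we've seen `"test"`, the colon,
    # and the opening quote of the value.
    found = False
    for chunk in it:
        buffer += chunk
        idx = buffer.find(opener)
        if idx == -1:
            continue
        rest = buffer[idx + len(opener):]
        q = rest.find('"', rest.find(":") + 1) if ":" in rest else -1
        if q == -1:
            continue
        buffer = rest[q + 1:]  # everything after the opening quote
        found = True
        break
    if not found:
        return

    # Step 4/5: forward characters until the closing (unescaped) quote.
    def emit(text):
        nonlocal escaped
        out = []
        for ch in text:
            if escaped:
                out.append(ch)
                escaped = False
            elif ch == "\\":
                escaped = True
                out.append(ch)
            elif ch == '"':
                return "".join(out), True   # closing quote reached
            else:
                out.append(ch)
        return "".join(out), False

    text, done = emit(buffer)
    if text:
        yield text
    if done:
        return
    for chunk in it:
        text, done = emit(chunk)
        if text:
            yield text
        if done:
            return


# Simulated stream of an LLM response like {"test": "Hello world"}.
chunks = ['{"te', 'st": "Hel', 'lo wo', 'rld"}']
print("".join(stream_value(chunks)))
```

The point of buffering only until the opening quote is found is that everything after it can be forwarded immediately, so the user still sees tokens as they arrive instead of waiting for the full JSON to validate.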


