The issue I have with SSE, and with what is being proposed in this article (which is very similar), is the very long-lived connection.
OpenAI uses SSE for callbacks. That works fine for chat and other "medium"-duration interactions, but when it comes to fine-tuning (which can take a very long time), SSE always breaks and requires client-side retries to get it to work.
So, why not instead use something like long polling + HTTP streaming (a slight tweak on SSE)? Here is the idea:
1) Make a standard GET call to /api/v1/events (using standard auth, etc.)
2) If anything is in the buffer/queue, return it immediately
3) Stream any new events for up to 60s. Each event has a sequence id (similar to the article). Include keep-alive messages at 10s intervals if there are no messages.
4) After 60s, close the connection, gracefully ending the interaction on the client
5) The client makes another GET request using the last received sequence id
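The client side of the loop above can be sketched roughly as follows. This is a minimal sketch, not the article's implementation; the `/api/v1/events` path, the `after` query parameter, the `seq` field, and the keep-alive marker are all assumptions about how such an endpoint might be shaped, and the streamed body is stood in for by a plain list of lines so the logic is testable without a network:

```python
import json

KEEP_ALIVE = ":keep-alive"  # assumed heartbeat line; carries no data


def next_request_path(last_seq):
    # Step 5: resume from the last received sequence id (hypothetical
    # "after" parameter - the real API could name this differently).
    return f"/api/v1/events?after={last_seq}" if last_seq is not None else "/api/v1/events"


def consume_stream(lines, last_seq=None):
    """Process one <=60s streamed response; return (events, cursor).

    `lines` stands in for the chunked HTTP body: each element is one
    newline-delimited JSON event, or a keep-alive marker (step 3).
    """
    events = []
    for line in lines:
        if line == KEEP_ALIVE:       # 10s heartbeats are just discarded
            continue
        event = json.loads(line)
        events.append(event)
        last_seq = event["seq"]      # advance the cursor as events arrive
    return events, last_seq


# After the server closes the connection (step 4), the client simply
# issues the next GET from whatever cursor it holds:
events, cursor = consume_stream(
    ['{"seq": 41, "type": "progress"}', ":keep-alive", '{"seq": 42, "type": "progress"}']
)
print(next_request_path(cursor))  # /api/v1/events?after=42
```

Because the cursor lives on the client and every request is an ordinary authenticated GET, a dropped connection costs nothing beyond re-issuing the request from the last sequence id.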
What I like about this is that it is very simple to understand (like SSE - it basically is SSE), has low latency, is just a standard GET with standard auth, and works regardless of how load balancers, etc., are configured. Of course, there will be errors from time to time, but dealing with timeouts/errors will not be the norm.
My issue with EventSource is that it doesn't use standard auth. Including the JWT in a query string is an odd step out that requires alternate middleware, and it feels like there is a high chance of leaking the token in logs, etc.
I'm curious though, what is your solution to this?
Secondly, not every client is a browser (my OpenAI / fine tune example is non-browser based).
Finally, I just don't like the idea of things failing all time with something working behind the scenes to resolve issues. I'd like errors / warnings in logs to mean something, personally.
>> I don't understand the advantages of recreating SSE yourself like this vs just using SSE
This is more of a strawman, and I don't plan to implement it. It is based on my experiences both consuming SSE endpoints and creating them.
> I'm curious though, what is your solution to this?
Cookies work fine, and are the usual way auth is handled in browsers.
> Secondly, not every client is a browser (my OpenAI / fine tune example is non-browser based).
That's fair. It still seems easier, to me, to save any browser-based clients some work (and avoid writing your own spec) by using existing technologies. In fact, what you described isn't even incompatible with SSE: all you have to do is have the server close the connection every 60 seconds on an otherwise normal SSE connection, and all of your points are covered except the auth one. (To be fair, I've never actually seen bearer tokens used in a browser context - you'd have to allow cookies like every other web app.)
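The "close every 60 seconds" variant of SSE can be sketched as a server-side generator. This is only an illustration of the idea, not anyone's production code: the event source and the injected clock are hypothetical, and the key point is that the `id:` field in each frame is what the browser's `EventSource` echoes back as the `Last-Event-ID` request header on reconnect, so resumption comes for free:

```python
import time


def sse_frame(seq, data):
    # Standard SSE wire format: an "id:" line followed by "data:" lines,
    # terminated by a blank line. The id feeds Last-Event-ID on reconnect.
    return f"id: {seq}\ndata: {data}\n\n"


def sse_stream(events, max_duration=60, clock=time.monotonic):
    """Yield SSE frames, then stop cleanly after max_duration seconds.

    `events` is any iterable of (seq, data) pairs; `clock` is injectable
    so the 60s cutoff can be tested without waiting.
    """
    deadline = clock() + max_duration
    for seq, data in events:
        if clock() >= deadline:
            break  # graceful close; the client reconnects with Last-Event-ID
        yield sse_frame(seq, data)
```

On the wire this is indistinguishable from ordinary SSE, which is the point: browser clients keep using `EventSource` unmodified, while the periodic close keeps any intermediaries from ever seeing a connection older than a minute.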