Yes, it is definitely worse. I submitted feedback a few days ago saying exactly what is being said here: the model's responses look like GPT-3.5's.
There are also very telling response patterns that indicate a pre-GPT-4 model.
1: All previous models suffered terribly if the chat got too long. After 20 or so responses they would suddenly become less attentive to the conversation and output superficial or incorrect responses.
2: If you stopped a chat midway and came back later to continue (after a refresh or a different chat interaction), they would often respond with code or suggestions that had nothing whatsoever to do with your prompt.
Both of these patterns are sometimes evident in the current model, which suggests its capabilities are being clamped down on somehow.
My suspicion is that this relates to computing resources. The 25-message cap must mean that its performance is difficult to scale, and one way to cope is to simplify the model's activations with heuristics, perhaps by analyzing and preprocessing the input to decide how much of the model needs to be used (partial model use can be architected; see the sketch below).
This seems to be the simplest explanation of the observed behaviour.
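To illustrate the kind of preprocessing I mean, here is a minimal sketch of complexity-based routing. Everything in it is hypothetical: the `estimate_complexity` heuristic, the backend names, and the thresholds are my own assumptions for illustration, not anything OpenAI has confirmed about its architecture.

```python
# Hypothetical sketch: cheap preprocessing decides how much "model"
# a request gets. Names and thresholds are made up for illustration.

def estimate_complexity(prompt: str, history: list[str]) -> float:
    """Crude stand-in for a learned classifier: longer prompts and
    longer chat histories are assumed to need more capacity."""
    tokens = len(prompt.split()) + sum(len(m.split()) for m in history)
    return min(tokens / 2000.0, 1.0)  # normalize to [0, 1]

def route(prompt: str, history: list[str]) -> str:
    """Pick a (hypothetical) backend based on estimated complexity."""
    score = estimate_complexity(prompt, history)
    if score < 0.3:
        return "small-model"   # cheap path for simple requests
    if score < 0.7:
        return "medium-model"  # mid-size path
    return "full-model"        # full capacity only when judged necessary

if __name__ == "__main__":
    history = ["a previous message"] * 25  # a long chat, as in pattern 1
    print(route("Why does my loop never terminate?", history))
```

If something like this were in place, a misjudged routing decision would look exactly like the degradation described above: superficial answers late in a long chat, or off-topic replies after a session resumes.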
Can confirm both points 1 and 2. I sometimes burn through my quota limit multiple times a day. I also have API access to GPT-4, where a question+answer exchange costs me about $0.30. Weighing the GPT-4 API price against the monthly ChatGPT Plus fee, that works out to roughly 66 requests per month. I can burn through that in a day.
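For reference, the arithmetic behind that figure, assuming the $0.30 per exchange observed above and the $20/month ChatGPT Plus fee:

```python
# Back-of-the-envelope check on the ~66-requests figure. The $0.30
# per-exchange cost is the poster's own observed API usage; $20/month
# is the ChatGPT Plus subscription fee.
plus_fee_per_month = 20.00  # USD
cost_per_exchange = 0.30    # USD, one question + answer via the GPT-4 API

requests_per_month = plus_fee_per_month / cost_per_exchange
print(f"{requests_per_month:.1f} requests/month")  # -> 66.7
```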