In a brief test, I found that the bigger context window only meant that I could stuff a whole schema into the input. It still hallucinated a value. When I plugged in a vector-embedding lookup to include only the top-k most "relevant" fields, it did exactly what I wanted: https://twitter.com/_cartermp/status/1657037648400117760
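The top-k field selection described above can be sketched roughly like this. This is a minimal, hypothetical sketch: the `embed` function here is a toy bag-of-words stand-in for a real embedding API call, and the field names are made up for illustration.

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding" (token counts). A real system would
    # call an embedding model here and get back a dense vector.
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def top_k_fields(schema_fields, query, k=3):
    """Rank schema fields by similarity to the query and keep only the
    top k, so the prompt carries just the most relevant part of the
    schema instead of the whole thing."""
    q = embed(query)
    ranked = sorted(schema_fields,
                    key=lambda f: cosine(embed(f), q),
                    reverse=True)
    return ranked[:k]

# Hypothetical schema fields, purely for illustration.
fields = [
    "order_id: unique identifier for the order",
    "customer_email: contact email of the buyer",
    "shipping_address: the delivery address for the order",
    "created_at: time the order was placed",
]
print(top_k_fields(fields, "what is the delivery address", k=2))
```

The point is just that the model only ever sees the k fields most similar to the user's question, which (in my test at least) cut down on hallucinated values.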
The fundamental problem seems to be that it's still slightly sub-GPT-3.5-quality, and even a long context window can't fix that. It will remember things from many, many tokens ago, but it still doesn't reliably produce passable work.
The combination of a GPT-4-quality model and a long context window will unlock a lot of applications that currently rely on somewhat lossy window-prying hacks (e.g. summarizing chunks). But any model quality below that won't move the needle much in terms of what useful work is possible, with the exception of fairly simple summarization and text analysis tasks.
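The "summarizing chunks" hack mentioned above is usually a map-reduce pattern: split the document to fit the window, summarize each piece, then summarize the summaries. A minimal sketch, where `summarize` stands in for an LLM call (everything here is hypothetical, not any particular product's code):

```python
def split_into_chunks(text, max_chars=2000):
    # Naive fixed-size chunking; real systems usually split on
    # paragraph or token boundaries instead of raw character counts.
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

def summarize_long_text(text, summarize, max_chars=2000):
    """Map-reduce summarization: summarize each chunk, then recurse on
    the concatenated chunk summaries until everything fits in one
    window. `summarize` is a placeholder for a short-context LLM call."""
    chunks = split_into_chunks(text, max_chars)
    if len(chunks) == 1:
        return summarize(chunks[0])
    partials = [summarize(c) for c in chunks]
    return summarize_long_text(" ".join(partials), summarize, max_chars)
```

The lossiness is structural: each intermediate summary discards detail before the final pass ever sees it, which is exactly what a genuinely long context window would avoid.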
Maybe! I certainly look forward to that. Although in my testing GPT-4 also hallucinates a bit (less than GPT-3.5), and the latency is so poor that it's unworkable for our product.
> The fundamental problem seems to be that it's still slightly sub-GPT-3.5-quality
It really depends on what you use it for.
I've found Claude better than GPT4 and even Claude+ at creative writing.
It also tends to give more comprehensive explanations without additional prompting. So I prefer to have it, rather than GPT-3.5 or 4, explain things to me.
It's also free, which is another big win over GPT4.
I've found for my use case that both claude-instant-* and claude-* are roughly on par with each other and gpt-3.5. claude-* seems to be the least inaccurate, but we haven't put it into production the way we have gpt-3.5, so it's hard to say for sure.
In either case, the Claude models are very good. I think they'd do fine in a real product. But there are definitely issues that they all have (or that my prompt engineering has).
I am very impressed with the quality of GPT-4, even with the 8k model. However, I have started reaching the limit of what the 8k model can do. I am eagerly awaiting the release of the 32k model.
The Claude 100k model is nowhere near that level of quality, in my experience.
YMMV.