From the first look it looks like Claude 3 performed as well as the rest of the ...

From the first look it looks like Claude 3 performed as well as the rest of the models. Did not see any massive improvement.

I tried a simple prompt driven web devel approach using OpenRouter to test GPT4-Turbo, Claude 3 Opus and Mistral Large, along with GPT3.5 and some other models.

I prompted about 6 small content sections each with different requirements (headline, main text, motto, download links, footer).

Each model was able to provide reasonable HTML and CSS.

However, ALL models started losing context dropping elements and were unable to finish the page with full content. I had to prompt the missing content again.

Surely there would be enough context window for a few lines of text?

Disclaimer: I've been using Copilot for almost 3 years now but mostly for Python.