Same. I've been needing to update an userscript (JS) that takes stuff like "3 for the price of 1", "5 + 1 free", "35% discount!" from a particular site and then converts the price to a % discount and the price per item / 250 grams.
Its an old userscript so it is glitchy and halfway works. I already pre-chewed the work by telling Gemini 3 exactly which new HTML elements it needs to match and which contents it needs to parse. So basically, the scaffolding is already there, the sources are already there, it just needs to put everything in place.
It fails miserably and produces very convincing looking but failing code. Even letting it iterate multiple times does nothing, nor does nudging it in the correct direction. Mind you that Javascript is probably the most trained-on language together with Python, and parsing HTML is one of the most common usecases.
Another hilarious example is MPV, which has very well-documented settings. I used to think that LLMs would mean you can just tell people to ask Gemini how to configure it, but 9 out of 10 times it will hallucinate a bunch of parameters that never existed.
It gives me an extremely weird feeling when other people are cheering that it is solving problems at superhuman speeds or that it coded a way to ingest their custom XML format in record time, with relatively little prompting. It seems almost impossible that LLMs can both be so bad and so good at the same time, so what gives?
1. Coding with LLMs seems to be all about context management. Getting the LLM to deal with the minimum amount of code needed to fix the problem or build the feature, carefully managing token limits and artificially resetting the session when needed so the context handover is managed, all that. Just pointing an LLM at a large code base and expecting good things doesn't work.
2. I've found the same with Gemini; I can rarely get it to actually do useful things. I have tried many times, but it just underperforms compared to the other mainstream LLMs. Other people have different experiences, though, so I suspect I'm holding it wrong.
The problem is by that point it's much less useful in projects. I still like them but when I get to the point of telling it exactly what to do I'm mostly just being lazy. It's useful in that it might give me some ideas I didn't consider but I'm not sure it's saving time.
Of course, for short one-off scripts, it's amazing. It's also really good at preliminary code reviews. Although if you have some awkward bits due to things outside of your power it'll always complain about them and insist they are wrong and that it can be so much easier if you just do it the naive way.
Amazon's Kiro IDE seems to have a really good flow, trying to split large projects into bite sized chunks. I, sadly, couldn't even get it to implement solitaire correctly, but the idea sounds good. Agents also seem to help a lot since it can just do things from trial and error, but company policy understandably gets complicated quick if you want to provide the entire repo to an LLM agent and run 'user approved' commands it suggests.
From my experience vibe coding, you spend a lot of time preparing documentation and baseline context for the LLM.
On one of my projects, I downloaded a library’s source code locally, and asked Claude to write up a markdown file explaining documenting how to use it with examples, etc.
Like, taking your example for solitaire, I’d ask a LLM to write the rules into a markdown file and tell the coding one to refer to those rules.
I understand it to be a bit like mise en place for cooking.
You tell it what you want and it gives you a list of requirements, which are in that case mostly the rules for Solitaire.
You adjust those until you're happy, then you let it generate tasks, which are essentially epics with smaller tickets in order of dependency.
You approve those and then it starts developing task by task where you can intervene at any time if it starts going off track.
The requirements and tasks, it does really well, but the connection of the epics/larger tasks is where it crumbles mostly. I could have made it work with some more messing around but I've noticed over a couple projects that, at least in my tries, it always crumbles either at the connection of the epics/large tasks or when you ask it to do a small modification later down the line and it causes a lot of smaller, subtle changes all over the place. (could say skill issue since I oversaw something in the requirements, but that's kind of how real projects go, so..)
It also eats tokens like crazy for private usage but that's more so a 'playing around' problem. As it stands I'll probably blow 100$ a day if I connect it to an actual commercial repo and start experimenting. Still viable with my salary, but still..
Its an old userscript so it is glitchy and halfway works. I already pre-chewed the work by telling Gemini 3 exactly which new HTML elements it needs to match and which contents it needs to parse. So basically, the scaffolding is already there, the sources are already there, it just needs to put everything in place.
It fails miserably and produces very convincing looking but failing code. Even letting it iterate multiple times does nothing, nor does nudging it in the correct direction. Mind you that Javascript is probably the most trained-on language together with Python, and parsing HTML is one of the most common usecases.
Another hilarious example is MPV, which has very well-documented settings. I used to think that LLMs would mean you can just tell people to ask Gemini how to configure it, but 9 out of 10 times it will hallucinate a bunch of parameters that never existed.
It gives me an extremely weird feeling when other people are cheering that it is solving problems at superhuman speeds or that it coded a way to ingest their custom XML format in record time, with relatively little prompting. It seems almost impossible that LLMs can both be so bad and so good at the same time, so what gives?