
Agreed with the points in that article, but IMHO the No. 1 issue is that agents only see a fraction of the code repository. They don't know whether there is a helper function they could use, so they re-implement it. When contributing to UIs, they can't check the whole UI to identify common design patterns, so they re-invent them.

The most important task for the human using the agent is to provide the right context. "Look at this file for helper functions", "do it like that implementation", "read this doc to understand how to do it"... you can get very far with agents when you provide them with the right context.

(BTW, another issue is that they have problems navigating the directory structure in a large monorepo. When the agents need to run commands like 'npm test' in a sub-directory, they almost never get it right the first time.)



This is what I keep running into. Earlier this week I did a code review of a large chunk of new code, written using Cursor to implement a feature from scratch, and I'd say maybe 200 lines of it were really necessary.

But, y'know what? I approved it. Because hunting down the existing functions it should have used in our utility library would have taken me all day. 5 years ago I would have taken the time because a PR like that would have been submitted by a new team member who didn't know the codebase well, and helping to onboard new team members is an important part of the job. But when it's a staff engineer using Cursor to fill our codebase with bloat because that's how management decided we should work, there's no point. The LLM won't learn anything and will just do the same thing over again next week, and the staff engineer already knows better but is being paid to pretend they don't.


> because that's how management decided we should work, there's no point

If you are personally invested, there would be a point. At least if you plan to maintain that code for a few more years.

Let's say you have a common CSS file where you define .warning {color: red}. If you want the LLM to display a warning and you just tell it to make it red, without pointing out that the .warning class exists, it will likely create a new CSS definition for that element (or even inline it - the latest Claude Code has a tendency to do that). That's fine and will make management happy for now.

But if later management decides that it wants all warning messages to be pink, it may be quite a challenge to catch every place without missing one.
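To make it concrete, a minimal sketch (file and selector names made up for illustration):

    /* common.css - the existing shared definition */
    .warning { color: red; }

    /* what the agent tends to produce without that context: a one-off duplicate */
    .order-error-message { color: red; }

    /* or an inline style: <span style="color: red">Order failed</span> */

Changing .warning to pink later only fixes the first one; the duplicates have to be hunted down one by one.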


There really wouldn't be; it would just be spitting into the wind. What am I going to do, convince every member of my team to ignore a direct instruction from the people who sign our paychecks?


I really, really hate code review now. My colleagues will have their LLMs generate thousands of lines of boilerplate with every pattern and abstraction under the sun. A lazy programmer used to do the bare minimum and write not enough code. That made review easy: error handling here, duplicate code there, more descriptive naming here, and so on. Now a lazy programmer generates a crapload of code cribbed from "best practice" tutorials, much of it unnecessary and irrelevant to the actual task at hand.


> When the agents need to run commands like 'npm test' in a sub-directory, they almost never get it right the first time

I was running into this constantly on one project with a repo split between a Vite/React front end and a .NET backend (with a well-documented structure). It would sometimes go into panic mode after some npm command failed repeatedly, and do all sorts of pointless troubleshooting over and over, sometimes veering into destructive attempts to rebuild whatever it thought was missing or broken.

I kept trying to rewrite the section in CLAUDE.md to instruct it to always check the current directory first and verify it was in the correct $CLIENT or $SERVER directory. But it would still sometimes forget at random, which was aggravating.

I ended up creating some aliases like "run-dev server restart" and "run-dev client npm install" for common operations on both server and client that worked from any directory. Then I added the base dotnet/npm/etc. commands to the deny list, which forced its thinking to go "Hmm, it looks like I'm not allowed to run npm, so I'll review the project instructions. I see, I can use the 'run-dev' helper to do $NPM_COMMAND…"
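A rough sketch of what such a wrapper can look like (directory names are guesses, the real thing handled more subcommands, and the deny-list half lives in the agent's permission settings rather than in the script):

    #!/usr/bin/env bash
    # run-dev: run client/server commands from anywhere in the repo
    set -euo pipefail

    if [ $# -lt 2 ]; then
      echo "usage: run-dev {client|server} <command...>" >&2
      exit 1
    fi

    REPO_ROOT="$(git rev-parse --show-toplevel)"
    target="$1"; shift

    case "$target" in
      client) cd "$REPO_ROOT/client" ;;   # Vite/React front end
      server) cd "$REPO_ROOT/server" ;;   # .NET backend
      *) echo "unknown target: $target" >&2; exit 1 ;;
    esac

    exec "$@"   # e.g. run-dev client npm test, run-dev server dotnet build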

It's been working pretty reliably now, but it definitely took a lot of time and aggravation to get to that solution.


I wonder if a large context model could be employed here via tool call. One of the great things Gemini chat can do is ingest a whole GitHub repo.

Perhaps "before implementing a new utility or helper function, ask the not-invented-here tool if it's been done already in the codebase"

Of course, now I have to check if someone has done this already.


Large context models don't do a great job of consistently attending to the entire context, so it might not work out as well in practice as continuing to improve the context engineering parts of coding agents would.

I'd bet that most of the improvement in Copilot-style tools over the past year is coming from rapid progress in context engineering techniques, and the contribution of LLMs is more modest. LLMs' native ability to independently "reason" about a large slushpile of tokens just hasn't improved enough over that same time period to account for how much better the LLM coding tools have become. It's hard to see or confirm that, though, because the only direct comparison you can make is changing your LLM selection in the current version of the tool. Plugging GPT5 into the original version of Copilot from 2021 isn't an experiment most of us are able to try.


Sure, but just because it went into the context doesn't mean the LLM "understands" it. Also, not all sections of the context are equal.


Claude can use tools to do that, and various code indexer MCPs work, but that depends on the LLM doing the coding making the right searches to find the code. If you are in a project where your helper functions or shared libs are scattered everywhere, it's a lot harder.

Just like with humans it definitely works better if you follow good naming conventions and file patterns. And even then I tend to make sure to just include the important files in the context or clue the LLM in during the prompt.

It also depends on what language you use. A LOT. During the day I use LLMs with dotnet and it's pretty rough compared to when I'm using Rails on my side projects. Dotnet requires a lot more prompting and hand-holding, partly due to its complexity but also due to how much more verbose it is.


This is what we do at Augmentcode.com.

We started by building the best code retrieval and built an agent around it.


That's what claude.md etc are for. If you want it to follow your norms then you have to document them.
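For example, a couple of entries along these lines (contents invented for illustration):

    # CLAUDE.md (excerpt)
    - Warnings/errors: use the existing .warning class in styles/common.css;
      do not add new color rules or inline styles.
    - Run npm/dotnet commands through the run-dev wrapper,
      e.g. run-dev client npm test, never the raw commands from the repo root.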


Well, sure, but from what I know, humans are way better at following 'implicit' instructions than LLMs. A human programmer can 'infer' most of the important basic rules from looking at the existing code, whereas all this agents.md/claude.md/whatever stuff seems necessary to even get basic performance in this regard.

Also, the agents.md website seems to mostly list README.md-style 'how do I run this' instructions in its examples, not stylistic guidelines.

Furthermore, it would be nice if the agents added these notes themselves. With a human, you tell them "this is wrong, do it that way" and they'll remember it. (Although this functionality seems to be in the works?)


That's fine for norms, but I don't think you can use it to describe every single piece of your code. Every function, every type, every CSS class...


To be fair, this is a daily life story for any senior engineer working with other engineers.



