It's great. I'd guess 80-90% of my code is produced in Copilot CLI sessions since the beginning of the year. Copilot CLI is worse than Claude Code, but not by a huge amount. This is mostly working in established 100k+ LOC codebases in C# and TypeScript, plus a couple of greenfield projects. I have to write more code by hand in the greenfield projects at their formative stage; LLMs do better following conventions in an existing codebase than staying consistent in a new one.
Important things I've figured out along the way:
1. Enable the agent to debug and iterate. Whatever you'd do to test and verify after you write your first pass at an implementation, figure out a way for an agent to do it too. For example: every API call is instrumented with OpenTelemetry, and the agent has a local collector to query.
2. Make scripts or skills to increase the reliability of fallible multi-step processes that need to be repeated often. For example: getting an OAuth token to call some API with the appropriate user scopes for the task.
3. Continually revise your AGENTS.md. I'll often end a coding session by asking the agent whether there's anything from this session that should be captured there. That adds more than it removes, so every few days I'll compact it by having an agent reword the important stuff for conciseness and get rid of anything obvious from the implementation.
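As a sketch of what point 2 can look like in practice (every name here is hypothetical; the real flow depends on your identity provider), the reusable part is usually wrapping the fallible multi-step dance behind a single call that caches its result, so the agent invokes one command instead of re-running the whole sequence:

```typescript
// Hypothetical sketch: hide a fallible multi-step token flow behind one call.
// `fetchToken` stands in for whatever the real OAuth dance is (device code,
// client credentials, etc.); the caching and early-refresh logic is the part
// worth scripting so the agent gets a reliable single entry point.

type Token = { value: string; expiresAt: number };

function makeTokenProvider(
  fetchToken: () => Promise<Token>,
  skewMs = 60_000, // refresh a minute early to avoid handing out a near-expired token
) {
  let cached: Token | null = null;
  return async function getToken(): Promise<string> {
    const now = Date.now();
    if (cached && cached.expiresAt - skewMs > now) {
      return cached.value; // still valid: skip the multi-step flow entirely
    }
    cached = await fetchToken(); // the fallible flow runs here, at most once per expiry
    return cached.value;
  };
}
```

Exposed as a small CLI script, the agent can then do `TOKEN=$(./get-token.sh)` in any session without knowing or repeating the underlying steps.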
I'm sure Alex Karp and Palantir are already charging into the breach, promising to deliver things they don't have the capability to deliver! (Otherwise known as just another day for them)
I would think it pretty clear that they aren't shy about making their displeasure loudly and concretely known when denied. I can imagine an executive order in the research or draft state that'll make it so any entity that continues to deal with Anthropic is automatically put at a disadvantage. Something similar in spirit and effect to the sanctions on Cuba.
Testing the "whole system" for a mature enterprise product is quite difficult. The combinatorial explosion of account configurations and feature usage becomes intractable on two levels: engineers can't anticipate every scenario their tests need to cover (because the product is too big to understand as a whole), and even if comprehensive testing were possible, it would be impractical on some combination of time, flakiness, and cost.
I like this thought. Scaling review is definitely a bottleneck (for those of us who are still reading the code), and spending some tokens to make it easier seems worthwhile.
Yesterday I had it using an internal library without documentation or source code. LSP integration wasn't working. It didn't have decompilation tools or the ability to download them.
I came back to my terminal to find it had written its own tool to decompile the assembly, and successfully completed the task using that info.
I was prepared to disagree with the thesis that estimation is impossible. I've had a decent record of predicting project timelines that actually tracked with development. I agree that most of the work is unknown, but it's bounded uncertainty: you can still assert "this blank space on the map is big enough to hold a wyvern, but not an adult dragon" and plan accordingly.
But the author's assessment of the role that estimates play in an organization also rings true. I've seen teams compare their estimates against their capacity and report that they can't do all the work; priorities and expected timelines don't change. Teams find a way to deliver through some combination of cutting scope or cutting corners.
The results are consistent with the author's estimation process - what's delivered is sized to fit the deadline. A better thesis might have been "estimates are useless"?