It's great. I'd guess 80-90% of my code is produced in Copilot CLI sessions since the beginning of the year. Copilot CLI is worse than Claude Code, but not by a huge amount. This is mostly working in established 100k+ LOC codebases in C# and TypeScript, plus a couple of greenfield projects. I have to write more code by hand in the greenfield projects at their formative stage; LLMs do better following conventions in an existing codebase than staying consistent in a new one.
Important things I've figured out along the way:
1. Enable the agent to debug and iterate. Whatever you'd do to test and verify after you write your first pass at an implementation, figure out a way for an agent to do it too. For example: every API call is instrumented with OpenTelemetry, and the agent has a local collector to query.
2. Make scripts or skills to increase the reliability of fallible multi-step processes that need to be repeated often. For example: getting an OAuth token to call some API with the appropriate user scopes for the task.
3. Continually revise your AGENTS.md. I'll often end a coding session by asking the agent whether there's anything from this session that should be captured there. That adds more than it removes, so every few days I'll compact it by having an agent reword the important stuff for conciseness and get rid of anything obvious from the implementation.
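As a sketch of what point 2 can look like in practice (every name here is hypothetical; the real flow depends on your identity provider), the reusable part is usually wrapping the fallible multi-step dance behind a single call that caches its result, so the agent invokes one command instead of re-running the whole sequence:

```typescript
// Hypothetical sketch: hide a fallible multi-step token flow behind one call.
// `fetchToken` stands in for whatever the real OAuth dance is (device code,
// client credentials, etc.); the caching and early-refresh logic is the part
// worth scripting so the agent gets a reliable single entry point.

type Token = { value: string; expiresAt: number };

function makeTokenProvider(
  fetchToken: () => Promise<Token>,
  skewMs = 60_000, // refresh a minute early to avoid handing out a near-expired token
) {
  let cached: Token | null = null;
  return async function getToken(): Promise<string> {
    const now = Date.now();
    if (cached && cached.expiresAt - skewMs > now) {
      return cached.value; // still valid: skip the multi-step flow entirely
    }
    cached = await fetchToken(); // the fallible flow runs here, at most once per expiry
    return cached.value;
  };
}
```

Exposed as a small CLI script, the agent can then do `TOKEN=$(./get-token.sh)` in any session without knowing or repeating the underlying steps.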
I'm sure Alex Karp and Palantir are already charging into the breach, promising to deliver things they don't have the capability to deliver! (Otherwise known as just another day for them)
I would think it pretty clear that they aren't shy about making their displeasure loudly and concretely known when denied. I can imagine an executive order in the research or draft state that'll make it so any entity that continues to deal with Anthropic is automatically put at a disadvantage. Something similar in spirit and effect to the sanctions on Cuba.
Testing the "whole system" for a mature enterprise product is quite difficult. The combinatorial explosion of account configurations and feature usage becomes intractable on two levels: engineers can't anticipate every scenario their tests need to cover (because the product is too big to understand as a whole), and even if comprehensive testing were possible, it would be impractical on some combination of time, flakiness, and cost.
I like this thought. Scaling review is definitely a bottleneck (for those of us who are still reading the code), and spending some tokens to make it easier seems worthwhile.
Yesterday I had it using an internal library without documentation or source code. LSP integration wasn't working. It didn't have decompilation tools or the ability to download them.
I came back to my terminal to find it had written its own tool to decompile the assembly, and successfully completed the task using that info.
I was prepared to disagree with the thesis that estimation is impossible. I've had a decent record of predicting project timelines that actually tracked with development. I agree that most of the work is unknown, but it's bounded uncertainty: you can still assert "this blank space on the map is big enough to hold a wyvern, but not an adult dragon" and plan accordingly.
But the author's assessment of the role that estimates play in an organization also rings true. I've seen teams compare their estimates against their capacity and report that they can't do all the work; priorities and expected timelines don't change. Teams find a way to deliver through some combination of cutting scope or cutting corners.
The results are consistent with the author's estimation process - what's delivered is sized to fit the deadline. A better thesis might have been "estimates are useless"?