
This is wild. I gave it some legacy XML describing a formula-driven calculator app, and it produced a working web app in under a minute:

https://aistudio.google.com/app/prompts?state=%7B%22ids%22:%...

I spent years building a compiler that takes our custom XML format and generates an app for Android or Java Swing. Gemini pulled off the same feat in under a minute, with no explanation of the format. The XML is fairly self-explanatory, but still.

I tried doing the same with Lovable, but the resulting app wouldn't work properly, and I burned through my credits fast while trying to nudge it into a usable state. This was on another level.


This is exactly the kind of task that LLMs are good at.

They are good at transforming one format to another. They are good at boilerplate.

They are bad at deciding requirements by themselves. They are bad at original research, for example developing a new algorithm.


> They are good at transforming one format to another. They are good at boilerplate.

You just described 90% of coding


Thing is, an LLM doesn't need motivation or self-discipline to start writing, which at this point I'm confident is the main factor slowing down software development, after requirements etc.


These also have larger memory, in a way, or deeper stacks of facts. They seem to be able to explore way more sources rapidly and thus emit a solution with more knowledge. As a human, I will explore less before trying to solve a problem, and only if that fails will I dig deeper.


But they fail at global context, consistency, and deep understanding, which constantly trips them up in the real world.

You basically have to tell them all the patterns they need to follow and give them lots of hints to do anything decent; otherwise they reinvent helpers that already exist in the codebase, ignore existing patterns, and put code in inconsistent places.

They are great at quickly researching a lot, but they start from zero each time. They also constantly "cheat" when they can't solve a problem immediately, doing things like casting to "any", skipping tests, or deciding "it's OK if this doesn't work".

A few things that would make them much better:

- an ongoing "codebase-specific model" that significantly improved their ability to remember things about the current codebase: its patterns, where things live, and why

- a lot more RL to teach them how to investigate things more deeply and use browsers/debuggers/one-off scripts to actually figure things out before assuming some path is right or OK

- much better dynamic recall of past conversations for future work

- much cheaper operating costs; it's clear a big part of why they "cheat" so often is that they are told to minimize token costs. If their internal prompts said "don't be afraid to spin off sub-tasks and dig extremely deep / spend lots of tokens to validate assumptions", they would do a lot better


90% of writing code, sure. But most professional programmers write code maybe 20% of the time. A lot of the time is spent clarifying requirements and similar stuff.


The more I hear about other developers' work, the more varied it seems. I've had a few different roles, from one programmer in a huge org to lead programmer in a small team, with a few stints as a technical expert in between. In each role, the kind of work I did most varied a lot, but it was never mostly about "clarifying requirements". As a grunt worker, I mostly just wrote and tested code. As a lead, I spent most of my time mentoring, reviewing code, or in meetings. These days I spend most of my time debugging issues and staring at graphics debugger captures.


> As a lead I spent most time

> mentoring

Clarifying either business or technical requirements for newer or junior hires.

> reviewing code

See mentoring.

> or in meetings

So clarifying requirements from/for other teams, including scope, purely financial or technical concerns, etc.

Rephrase "clarifying requirements" to "human oriented aspects of software engineering".

Plus, based on the graphics debugger part of your comment, you're a game developer (or at least adjacent). That's a different world. Most software developers are line-of-business developers (pharmaceutical, healthcare, automotive, etc.) or generalists in big tech companies who have to navigate very complex social environments. In both places, developers who are just heads-down in code tend not to do well long term.


> human oriented aspects

The irony is of course that humans in general and software professionals in particular (myself definitely included) notoriously struggle with communication, whereas RLHF is literally optimizing LLMs for clear communication. Why wouldn't you expect an AI that's both a superhuman coder and a superhuman communicator to be decent at translating between human requirements and code?


> Why wouldn't you expect an AI that's both a superhuman coder and a superhuman communicator to be decent at translating between human requirements and code?

At this point LLMs are a superhuman nothing, except in terms of volume, which is a standard computer thing ("To err is human, but to really foul things up you need a computer" - a quote from 60 years ago).

LLMs are fast, reasonably flexible, but at the moment they don't really raise the ceiling in terms of quality, which is what I would define as "superhuman".

They are comparatively cheaper than humans and volume matters ("quantity has a quality all its own" - speaking of quotes). But I'm fairly sure that superhuman to most people means "Superman", not 1 trillion ants :-)


I wrote that based on my experience comparing my prose writing and code to what I can get from ChatGPT or Claude Code, which I feel is on average significantly higher quality than what I can do in a single pass. The quality improves further when I critique its output and iterate, but from what I've tried, having it do the work while I critique the result is better (and definitely faster) than doing the work myself and having it critique my approach.

But maybe it's just because I personally am not as good as others, so let me try to offer some examples of tasks where the quality of AI output is empirically better than the human baseline:

1. Chess (and other games) - Stockfish has an Elo rating of 3644 [0], compared to Magnus Carlsen at 2882.

2. Natural language understanding - AIs surpassed the human expert baseline on SuperGLUE a while ago [1].

3. General image classification - On ImageNet top-5, Facebook's ConvNeXt is at 98.55% [2], while humans are at about 94.9% [3]. Humans are still better under poor lighting conditions, but with additional training data, AIs are catching up quickly.

4. Cancer diagnosis - On lymph-node whole-slide images, the best human pathologist in the study achieved an AUC of 0.884, while the best AI classifier reached 0.994 [4].

5. Competition math - AI is at the level of the best competitors, achieving gold level at the IMO this year [5]. It's not clearly superhuman yet, but I expect it will be very soon.

6. Competition coding - Here too AI is head to head with the best competitors, successfully solving all problems at this year's ICPC [6]. Similarly, at the AtCoder World Tour Finals 2025 Heuristic contest, only one human managed to beat the OpenAI submission [7].

So summing this up, I'll say that even if AI isn't better at all of these tasks than the best-prepared humans, it's extremely unlikely that I'll get one of those humans to do tasks for me. So while AI is still very flawed, I already quite often prefer to rely on it rather than delegate to another human, and this is as bad as it will ever be.

P.S. While not a benchmark, there's a small study from last year that looked at the quality of AI-generated code documentation in comparison to the actual human-written documentation in a variety of code bases and found "results indicate that all LLMs (except StarChat) consistently outperform the original documentation generated by humans." [8]

[0] https://computerchess.org.uk/ccrl/4040/

[1] https://super.gluebenchmark.com/

[2] https://huggingface.co/spaces/Bekhouche/ImageNet-1k_leaderbo...

[3] https://cs.stanford.edu/people/karpathy/ilsvrc/

[4] https://jamanetwork.com/journals/jama/fullarticle/2665774

[5] https://deepmind.google/blog/advanced-version-of-gemini-with...

[6] https://worldfinals.icpc.global/2025/openai.html

[7] https://arstechnica.com/ai/2025/07/exhausted-man-defeats-ai-...

[8] https://arxiv.org/pdf/2312.10349


Brother, you are not going to convince people who dedicated their lives to learning a language, knowledge that bankrolls a pretty cushy life, that that language is likely to soon be readily accessible to everyone with access to a machine translator.


Indeed, or in the words of Upton Sinclair:

> It is difficult to get a man to understand something, when his salary depends on his not understanding it.


Any chance the business/product folks will be using LLMs on their side to help with "clarifying requirements" before they turn them over to the developers?

They view this task as tedious minutiae, which is the sort of thing LLMs like to churn out.


They’re bad at 90% of coding, but for other reasons. That said, if you babysit them incessantly, they can help you move a bit faster through some of it.


Maybe 90% of the actual typing part of coding, but not 90% of the JOB of coding.


+/-

> They are bad at deciding requirements by themselves.

What do you mean by requirements here? In my experience the frontier models today are pretty good at figuring out requirements, even when you don't explicitly state them.

> They are bad at original research

Sure, I don't have any experience with that, so I'll trust you on that.

> for example developing a new algorithm.

This is just not correct. I used to think so too, but a couple of months ago I was trying to come up with a pretty complicated pattern-matching, multi-dimensional algorithm (I can't go into the details). It was something I could have figured out on my own, and I was halfway through it, but I decided to write up a description of it and feed it to Gemini 2.5 Pro, and I was stunned.

It came up with a really clever approach, something I had previously been convinced the models weren't very good at.

In hindsight, since they are getting so good at math in general, there's probably some overlap, but you should revisit your views on this.

--

Your 'bad at' list is missing a few things though:

- Calculations (they can come up with how to calculate, or write a program to calculate from given data, but they are not good at doing the arithmetic themselves in their responses)

- Even though the frontier models are multi-modal, they are still bad at visualizing HTML/CSS, or interpreting what it would look like

- The same goes for visualizing or figuring out visual errors in graphics programming, such as game programming or 3D modeling (z-index issues, orientation, etc.)


> I was trying to come up with a pretty complicated pattern matching, multi-dimensional algorithm (I can't go into the details)

The downside is that if you used Gemini to create the algorithm, your company won't be able to patent it.

Or maybe that's a good thing, for the rest of us.


Figuring out detailed requirements requires a lot of contact with reality. Specific details about not only the technical surface area but also the organizational and financial constraints. An AI model with the appropriate context would probably do well. It seems one of the things humans do much better at the moment is distill the big picture across a long period of time.


I like to use Claude Code to write deterministic computer programs for me, which then perform the actual work. It saves a lot of time.

I had a big backlog of "nice to have scripts" I wanted to write for years, but couldn't find the time and energy for. A couple of months after I started using Claude Code, most of them exist.


That’s great and the only legitimate use case here. I suspect Microsoft will not try to limit customers to just writing scripts and will instead allow and perhaps even encourage them to let the AI go ham on a bunch of raw data with no intermediary code that could be reviewed.

Just a suspicion.


I'm a co-founder of Calcapp, an app builder for formula-driven apps using Excel-like formulas. I spent a couple of days using Claude Code to build 20 new templates for us, and I was blown away. It was able to one-shot most of them, generating competent, intricate apps after looking at a sample JSON file I put together. I briefly told it about extensions we had made to Excel functions (including lambdas for FILTER, named sort-type enums for XMATCH, etc.), and it picked those up immediately.

At one point, it generated a verbose formula and mentioned, off-handedly, that it would have been prettier had Calcapp supported LET. "It does!", I replied, "and as an extension, you can use := instead of , to separate names and values!" It promptly rewrote the formula using our extended syntax, producing a sleek result.
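
For a sense of what that looks like (an invented example, not the formula it actually produced), a LET formula using our := extension might read:

    LET(
      subtotal := SUM(LineTotals),
      tax := subtotal * TaxRate,
      subtotal + tax
    )

In standard Excel, names and values alike are separated by commas, which gets hard to scan once a formula binds more than a couple of names.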

These templates were for various verticals, like real estate, financial planning and retail, and I would have been hard-pressed to produce them without Claude's domain knowledge. And I did it in a weekend! Well, "we" did it in a weekend.

So this development doesn't really surprise me. I'm sure that Claude will be right at home in Excel, and I have already thought about how great it would be if Claude Code found a permanent home in our app designer. I'm concerned about the cost, though, so I'm holding off for now. But it does seem unfair that I get to use Claude to write apps with Calcapp, while our customers don't get that privilege.

(I wrote more about integrating Claude Code here: https://news.ycombinator.com/item?id=45662229)


I've been using Claude Code a lot lately, and I've been thinking of integrating it into our SaaS tool (a formula-driven app designer). I've been holding off primarily because I've been afraid of the cost (we're not making much money off our $9/mo. customers as it is, and this definitely wouldn't help that).

However, it's becoming clear to me that individual apps and websites won't have their own integrated chatbots for long. They'll be siloed, meaning that they can't talk to one another -- and they sure can't access my file system. So we'll have a chatbot first as part of the web browser, and ultimately as part of the operating system, able to access all your stuff and knowing everything about you. (Scary!)

So the future is to make your software scriptable -- not necessarily for human-written scripts, but for LLM integration (using MCP?). Maybe OLE from the nineties was prescient?
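
To make that concrete, here is a minimal sketch of what an MCP integration could look like, using the official TypeScript SDK. The tool name and the evaluateFormula() stub are hypothetical stand-ins for whatever your product actually exposes:

    import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
    import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
    import { z } from "zod";

    // Stand-in for the real product's formula evaluator.
    function evaluateFormula(formula: string): unknown {
      throw new Error("wire up the real formula engine here");
    }

    const server = new McpServer({ name: "app-designer", version: "0.1.0" });

    // Expose one scriptable capability as an MCP tool; an LLM client
    // (in the browser or the OS) can then call it like any other tool.
    server.tool(
      "evaluate_formula",
      { formula: z.string() },
      async ({ formula }) => ({
        content: [{ type: "text", text: String(evaluateFormula(formula)) }],
      })
    );

    await server.connect(new StdioServerTransport());

The appeal is that the integration surface is a handful of declarative tool definitions, rather than a bespoke chatbot UI that has to be maintained.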

Short-term, though, integrating an LLM would probably be good for business, but given that I'm our only engineer and that our bespoke chatbot would likely become obsolete within two years, I don't think it would be worth the investment.


If your strategy is to be a data source for an LLM, sure. But if you aspire to bring your own unique AI, despite its flaws, that is another matter altogether, and I don't think it is a completely worthless endeavour. Remember how OpenAI killed GPT-4o, and it turned out it was actually beloved by many, although newer versions allegedly perform better?


Your read is correct.

2-3 chatbots (prolly OAI, Gemini, Claude) will own the whole context, everywhere.


I'm working on an engine for Excel-like formulas, which will be available both as a library and as a service (which I've mentioned on HN a few times before). I originally started work on the engine back in 2008, when our app builder needed it.

This is a wheel I see people reinventing all the time, often for use in SaaS applications. The implementations are often underwhelming: function support is limited, documentation is sparse to non-existent, and errors are typically only communicated at runtime -- if at all. Formula editors usually lack autocomplete, making them frustrating to use.

I've spent years solving all these problems (with a statically-typed language), and I'd love for others to benefit from the work. I have extracted the formula engine from our app compiler, so the library is nearly complete. The runtime part (evaluating formulas) has been rewritten in TypeScript. Next, I'll build a service around it to validate, compile and evaluate formulas -- which should be fun.
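
In case it helps to picture the library, this is roughly the shape of API I have in mind (all names and signatures here are illustrative, not final):

    // Hypothetical API sketch; the real engine may differ.
    import { compileFormula, evaluate } from "@calcapp/formula-engine";

    // Static typing: errors surface when the formula is compiled,
    // not when it runs.
    const compiled = compileFormula("SUM(Price * Quantity) * (1 + TaxRate)", {
      fields: { Price: "number[]", Quantity: "number[]", TaxRate: "number" },
    });

    if (!compiled.ok) {
      console.error(compiled.errors); // type mismatches, unknown names, ...
    } else {
      const result = evaluate(compiled, {
        Price: [10, 20],
        Quantity: [1, 3],
        TaxRate: 0.25,
      });
      console.log(result); // 10*1 + 20*3 = 70, times 1.25 = 87.5
    }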

I'm planning to do a Show HN once I have a preview up and running.


Is this like gorules?


No, not really. GoRules appears to be a decision engine that allows non-technical users to define rules visually through a graphical interface. Engineers can then interpret and evaluate these rules using provided libraries.

What I'm building is a formula engine that validates, compiles, and evaluates Excel-like formulas. Compared to GoRules, it’s more akin to the ZEN expression language component than to the broader GoRules system.


Other apt comparisons to what I'm building include HyperFormula and Microsoft Power Fx.


As a HyperFormula founder, thank you for the mention. I wonder what your opinion is of our formula library.


very cool


Thanks, I'll point their support staff to that document.

A support technician told me over the phone that Firefox is no longer supported. A different support technician wrote this to me in an email:

"Regarding the printing option on Firefox, I'm afraid this is no longer within Stripe's support scope."

(I complained that it is not possible to print certain pages of the Dashboard through Firefox, while Chrome works acceptably well.)


I know this doesn't add much value to the discussion, but I was really proud of my UIN when I was a teenager. And this may be my last chance to flaunt it, so here it is:

1779900

So back in the day, these were known as Universal Internet Numbers, or UINs. You have to admire the sheer audacity of using that name for the user identifiers of a service you're building. I believe they were renamed to "ICQ#" later.


Strictly speaking, sequential numbers can scale infinitely, so not the worst way to handle it.


That’s a very impressive UIN!


UINs! I couldn't remember what we called ICQ numbers! Thank you!


Same, nice repeating digits. 2288665


bro, nice dubs!


I have been self-employed since 2008, when I quit my job in software engineering to go all-in on my software business (which dated from 2003). That failed spectacularly, because I focused only on the technology and not on the value I was creating, and with few customers, I had to do on-site contracting for more than a year before going on full-time parental leave.

I then rebooted my software project, launched a landing site and started talking to prospects (hundreds of them), before I set out to pivot my existing product to something that might gain traction. (I wound up throwing away 95 percent of the code.) I spent 2014 through 2019 with the product in beta, barely making a living off of a few enterprise support contracts and doing freelance photography (and depleting my savings), but spending at least 80 percent of my time on building the product and getting it to a finished state.

(Some people seem to be able to build a product in a weekend that gets eager customers. I'm not one of those people, choosing to build something that was, in retrospect, much too big of a project for one person. I probably also spent too much time polishing the product before commercializing it, likely due to a fear of failure.)

In 2019, the product was finally commercialized as a SaaS service. I remember thinking that I either wanted it to be a spectacular success, or a spectacular failure (so that I could focus on other things, after close to 20 years).

It was neither, but has been growing steadily ever since. I would have made much more money working for someone else, but the freedom is unparalleled. I get to set my own hours and focus on things I consider important. I enjoy doing everything from support calls and UX work to building a compiler and a type system (that I have mentioned before on HN).

I also have no one I need to answer to, other than our customers. That has been important over the past couple of years, when a series of health emergencies in my family has diverted my attention elsewhere. I have been very fortunate to be able to do so, focusing on what's important, without having to ask permission to cut down on work temporarily.

Overall, I wouldn't trade this for anything. This year, my product will gain a sister product in a more lucrative field (I'm hoping), and I have plans to commercialize my compiler, both as a service and as a traditionally-licensed library. So I'm excited to stay solo and keep working on building the business.


That's some very impressive stamina and it must have been really hard at times.


Thanks, it certainly has been. After only a couple of years in the software industry proper, though, I felt I had seen all I needed to see. Crunch time. Arbitrary, ill-informed management decisions. Management who didn't believe in the product the team was passionate about. Products canceled through no fault of anyone working on them. Office politics and bickering.

With my own business, I gain agency. If the product fails, it's because I failed to market it properly, or the product vision was bad and did not resonate with enough customers, or because I failed to execute on that vision. When all the decisions are out of your hands, and you can't even see what prompted them, they can feel capricious and arbitrary.

With my own business, I am in control. I don't have one boss, I have hundreds of them. And as long as I continue to provide them with value, I get to continue doing what I'm doing. I like those terms.


Yes, to all of that. But: the amount of control is directly proportional to how fat your wallet is. As it gets leaner, you lose some of that agency, so make sure you never even get close to depleting your reserves. That's a lesson I learned the hard way at some point, and it causes me to be pretty cautious from a financial perspective. So far so good ;)


Could someone provide insights into the implications of a hypothetical Schrems III for EU-based SaaS companies that host their servers in the US, particularly those containing Personally Identifiable Information (PII) like email addresses? Essentially, would Schrems III mean that we'd need to immediately move our servers to EU soil, or risk fines?


Whether your servers are in the US or not, if you do business in the EU, EU rules apply. It might be that you will legally not be able to offer your services in the EU if you have servers in the US, because those can be accessed by US authorities at any time, without you even learning about it. It is probably safer to have servers in the EU if you want to do business in the EU. And those servers should not be provided by any US hoster, since that hoster is vulnerable to being ordered in the US to transfer data from the EU to the US.


Everyone -- thanks for your constructive comments. I think that we'll start off by releasing the libraries under permissive licenses, and see how that goes. Releasing the code of the user-facing products publicly would probably provide little value for others (as the code would not be open source), and there are indeed risks.

