GPT-4 generates simple app from Whiteboard photo (twitter.com/mckaywrigley)
98 points by highwaylights on Sept 27, 2023 | 74 comments


My big hope for all this AI stuff is that normal people start to use computers to solve problems more often. I work as a sysadmin; if I want, say, a specialized script to organize my music in a particular way, I can write that myself. It probably wouldn't be the best code, but it doesn't need to be.

The average person could be using computing to solve all kinds of problems, but it requires expertise and a time investment that is just not a good value proposition for a lot of people. If GPT or some other thing comes around and makes it so my uncle could say to his computer "hey, build me a site so I can show off all my fishing trophies" and it just does it? The world would have a lot more niche software, more weird stuff, and overall everyone would be more productive and happy. (Or so I hope.)


Once the context windows for these models grow by one or two orders of magnitude, stuff is going to get very interesting.


SAP and IBM will be able to boondoggle government projects at scales never seen before!


We saw interesting developments around vector databases, but then people stopped hyping them, since you could just store the vectors in normal databases without any real difference. I wonder what will happen once the models can freely access them, though.


Curious, does that mean saving into Postgres with pgvector, or some other technique?


I really don't understand how people figure out the vectors to actually store in the databases, regardless of the underlying storage model.

Isn't that itself the province of an LLM? Say I have a bunch of text. How do I store it so I can search it "by similarity"? Sphinx and semantic search were hard, I remember. Facebook had Faiss. And here we are supposed to just save vectors on commodity hardware BEFORE using an LLM?


> Isn't that itself the province of an LLM?

It is! The steps are:

1. Take a bunch of text, run it through an LLM in embedding mode. The LLM turns the text into a vector. If the text is longer than the LLM context window, chunk it.

2. Store the vector in a vector DB.

3. Use the LLM to generate a vector of your question.

4. Query the vector DB for the most similar vectors (as many as fit in the context window).

5. Get the text from all those vectors. Concatenate the text with the question from step 3.

6. Step 5 is your prompt. The LLM can now answer your question with a collection of similar/relevant text already provided to the LLM in the context window along with your question.
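
Roughly, in Python (a minimal sketch: embed() and complete() below are stand-ins for whatever embedding model and LLM you actually call, and a brute-force cosine-similarity search stands in for the vector DB):

    import numpy as np

    # Stand-in for a real embedding model (OpenAI embeddings, sentence-transformers, ...)
    def embed(text):
        rng = np.random.default_rng(abs(hash(text)) % (2**32))
        return rng.standard_normal(8)

    # Stand-in for a real LLM completion call
    def complete(prompt):
        return "answer based on a %d-char prompt" % len(prompt)

    # Steps 1-2: embed each chunk and store (vector, text) pairs -- the "vector DB"
    chunks = ["chunk one ...", "chunk two ...", "chunk three ..."]
    store = [(embed(c), c) for c in chunks]

    def answer(question, k=2):
        q = embed(question)                                   # step 3
        sims = sorted(                                        # step 4: cosine similarity
            ((np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v)), t) for v, t in store),
            reverse=True)
        context = "\n---\n".join(t for _, t in sims[:k])      # step 5
        prompt = "Context:\n" + context + "\n\nQuestion: " + question
        return complete(prompt)                               # step 6

    print(answer("what is in chunk two?"))

A real setup swaps the list for pgvector/Faiss/etc. and chunks long documents so each piece fits in the embedding model's context window.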


You don't even need an LLM. You can use Word2Vec, or even yank the embeddings matrix from the bottom layer of an LLM. And you can use CLIP and BLIP for images and audio, respectively.
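
For instance, a crude non-LLM sentence embedding is just the average of pretrained word vectors; a toy sketch, with a made-up lookup table standing in for real Word2Vec/GloVe vectors:

    import numpy as np

    # Made-up 3-dimensional "pretrained" word vectors (real ones are 100-300 dims)
    word_vectors = {
        "cat": np.array([0.9, 0.1, 0.0]),
        "dog": np.array([0.8, 0.2, 0.1]),
        "car": np.array([0.0, 0.1, 0.9]),
    }

    def sentence_embedding(sentence):
        vecs = [word_vectors[w] for w in sentence.lower().split() if w in word_vectors]
        return np.mean(vecs, axis=0) if vecs else np.zeros(3)

    print(sentence_embedding("cat dog"))   # averaged vector, no LLM involved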


I had scaling issues with pgvector, so I would also like to know which regular DB can scale.


On a side note, the largest context for GPT-4 is 32K. That will cost around $3 per full prompt (rough estimate). If GPT-5 is again 20 times the cost of its predecessor, that would be around $60 per generation, and GPT-6 would then be $1,200.
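
In other words (treating the $3 figure and the 20x factor as the rough assumptions they are):

    # Back-of-envelope, using the rough numbers above
    cost_gpt4 = 3                 # ~$3 to fill a 32K-token GPT-4 context (rough estimate)
    cost_gpt5 = cost_gpt4 * 20    # ~$60, if each generation costs ~20x its predecessor
    cost_gpt6 = cost_gpt5 * 20    # ~$1200
    print(cost_gpt4, cost_gpt5, cost_gpt6)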


But Nvidia making future GPUs on ever smaller TSMC nodes will be a counter-force, keeping the cost of using those future models in a zone similar to today's. Ultimately the market will bear what it will, and if the costs are outside that zone, those products won't launch.


The new register size!


> stuff is going to be very interesting

So no change from today?



It takes only a one-token error to produce code that doesn't compile, doesn't execute, throws exceptions, or is wrong in some serious way. These use cases are amazing nonetheless.

Can the same be said for facts? Hard to generalize. Let's say I asked GPT-V the year a painting was painted. Way back when, I figured I would never need to know the exact year of a painting (whatever that means) after I finished my modern art history exam, and I was right. Does GPT-V need to? No. I am happy with ranges in my answers about general knowledge, because at the end of the day there isn't some error-intolerant process using those outputs anyway. Even order-of-magnitude errors may be fine; getting a zero wrong in a token sometimes just doesn't matter. It always matters when programming, though.

Writers increasingly have the same opinion. As more people use GPT-4 and learn how stilted and uncreative it is, and how a one-token error can make a plot stop making sense, it isn't clear how it's going to deliver on bulk writing. Spam and content farming, sure, but even short prose isn't in the realm of reality for actual, real, breathing human consumption.

How do you categorize this? I don't know, I'm not in the field. There's a huge difference between "one token error means 0% correct" and "one token error means 99% correct." Maybe transformers can solve this. Maybe they never will.


Just ask about the error and it suggests ways to fix it. Alternatively, in this development paradigm you only need one senior software dev to work through the non-compiling and non-functioning code to make the whole thing work. Either way, we're screwed.


You can just ask it about the error, and it will suggest fixes. I don't know why it has to be 100% the first time. Humans almost never are.


You can also ask it about "errors" that aren't errors and it will suggest "fixes".


You can also do that with humans


Humans tend to put up a bit of a fight if you accuse them of producing incorrect program code; you know you're in good company if they pull out Z3, slam out a few lines on their terminal, and show you a rigorous mathematical proof that their code is correct. LLMs don't do that.

...yet.


You can skip the comment above. GPT-4 will absolutely double down when it knows it's right. The parent is clueless.


LLMs can do that with chain of thought, in a way that generalizes to multiple tasks.


Only in a vague way. Even with chain-of-thought, feedback loops, and other neat tricks, I've never seen an LLM produce valid theorems for Z3 (beyond trivial examples).


You've got to pay humans, and they have opinions and health issues and stuff.


Not really. It's actually a fairly good interview practice to see if somebody will defend their solution if probed on it.


I've attempted to use this iterative method with GPT4 to build an application, and things just get clumsier and more error-prone as the program grows in complexity. Eventually I get to the point where asking it to make revisions becomes a dice roll with respect to keeping the code behaving as expected or having it arbitrarily omit random portions of the application logic in the rewrite. It's certainly a great way to brainstorm or to quickly produce snippets of logic but it fails for anything beyond toy apps.


This looks promising, once Microsoft releases the code for it.

https://arxiv.org/abs/2309.12499


That works for a straightforward compiler error, but not for plot or logic errors in prose, which cannot be detected automatically.


It works for all kinds of errors, to varying degrees.

Even "logic" errors can be generated in one go and detected by another instance. Something like "this is what I wanted but you did this" works sometimes, too.


Most enterprise software just follows basic logic like the workflow above. So in the worst-case scenario, ChatGPT can build 90% of the code we need.


Because why are we considering supplanting humans for this labor if it provides no additional value (apart from sacrificing humans at the altar of capitalism)?


Cheaper always wins.

But besides that, the LLM is less likely to demand unscheduled time off - especially as a fraction of the hours it can put in. If I have a family emergency once per year, and I need my eight hour day off, I've just removed 1/200 of my yearly output potential. The LLM would need to be down for over 400 hours per year to get to that type of output reduction. Realistically, that is unlikely to happen.


The curious thing about this, though, is that of course you can start by replacing software engineers with it, but how far away are you going to be from being able to replace a CEO with it? I would say not that far away.


"It takes only a one token error to have code that doesn't compile, execute, throws exceptions, or to be wrong in some serious way."

So does my code, though. You can take the generated code, put it into Xcode, and then ask GPT about any errors. Iterate and repeat. I did the same while building some Chrome plugins.


And you have no real idea if that worked. You aren’t taking any responsibility. You are just acting as a spokesmodel for the tool.


It's ok, I asked it to write some unit tests too.

Look, seriously, "it hasn't crashed lately" is the standard by which 99.9% of human-written software is judged.


Of course it worked. It did what you wanted it to do.


The trope in software is to Google a problem, find an SO thread, and copy the code from there. How is that any different?


You could have a feedback loop, feeding the compilation errors back to the model until it converges to no errors.

Potentially you could do the same for fact-related questions, by fact-checking the result against the (properly indexed) training set and feeding the results back.

I wonder if it would work.
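
Something like this, sketched with a hypothetical llm() call and Python's own syntax check standing in for the compiler:

    # Hypothetical llm() -- swap in a real model call
    def llm(prompt):
        return "print('hello')"   # placeholder completion

    def generate_until_it_compiles(task, max_iters=5):
        code = llm("Write a Python script that does the following:\n" + task)
        for _ in range(max_iters):
            try:
                compile(code, "<generated>", "exec")   # the "compiler" step
                return code                            # converged: no syntax errors
            except SyntaxError as err:
                # feed the error back and ask for a corrected version
                code = llm("This code fails to compile:\n" + code +
                           "\n\nError: " + str(err) + "\nReturn a fixed version.")
        return code

The fact-checking variant would be the same loop, with a retrieval/lookup step in place of the compile() check.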


Yeah, I've been exploring this generate->error->fix feedback loop along with some test cases on my app builder, and it's quite good but not perfect. It just goes in loops on some errors still.


> only a one-token error to produce code that doesn't compile, doesn't execute, throws exceptions, or is wrong in some serious way

This was also true in the early days of cars, planes and—for that matter—computers. Most coding, even today, doesn’t require a full-fledged engineer.


Soon, your compiler will interact with LLMs to fix the mistake automatically. In fact, maybe the LLM becomes the compiler.


This is actually a genius idea. The thing is: it probably already exists, just not packaged nicely yet.


You can always just take the result and polish it up.


Not sure this is that interesting, since it was already revealed during the GPT-4 livestream back in March [0].

It’s my understanding that this feature is still rolling out to everyone, so if there are use cases beyond “simple”, I am sure we’ll hear about them in a more detailed article / blog post.

Not taking anything away, just pointing out the facts.

[0]: https://www.youtube.com/live/outcGtbnMuQ


You could have written that demo off as using text as a crutch: just running OCR and pasting the text can get you something similar-looking. This, as well as these:

https://twitter.com/skirano/status/1706823089487491469

https://twitter.com/GabGarrett/status/1706872805214593173

are much more impressive demonstrations. It shows the model understanding UI and Figma elements to a non-trivial degree.


Nice! And that’s what I mean: I’d much prefer a thorough review of what it can do over individual Twitter links. A lot of people are probably getting excited now, but keeping track of the examples individually is so hard.

There is a submission on the front page,

https://news.ycombinator.com/item?id=37673409

But it isn’t about code, so I am sure something like that is coming.


This is what you're looking for. https://arxiv.org/abs/2309.17421


ChatGPT is basically a buggy app generator. Like the matter replicators on Star Trek, if they occasionally poisoned you or put dirt in your burger.

ChatGPT will be next year’s biggest excuse for bugs in products.


Humans are also buggy app generators. In both cases testing is key to high quality software, not perfection on the first go.


You can fix bugs.

And it undoubtedly reduces the number of developers needed.

It doesn't replace all human coders yet. But a senior, super skilled engineer can now effectively do the task of a team of 10. This will be true for many of the teams I have personally seen.

There will be exceptions, too. Not all teams work like that.

I can confidently say that LLMs will cause a reduction in the number of developers required, at least in the short term.

Factors that increase demand might propagate simultaneously at a greater rate and undo this, but that is unlikely in my opinion.


> But a senior, super skilled engineer can now effectively do the task of a team of 10.

Everyone loves to say this and yet I’ve never seen a single concrete example aside from toy demos.

Here’s the thing: output is pretty easy to see. If one developer can 10x their output, that means they’re doing in a month what would take them nearly a year without AI.

So, show me one example of an individual or team who has done a year’s worth of work in a month.


Humans are incredible bug generators and they take 1000x longer to do it.


Now we can lose billions in blockchain even faster than before


I mean, not really surprising. The future programming paradigm is writing high-level descriptions of stuff, letting generative AI write the code for you, and then fine-tuning it.


This sidesteps the importance of a coherent data model that someone can be held responsible for. I find it hard to imagine corporations being okay with GPT being the responsible party when the question of “who owns the database” comes up. Then again, maybe I’m just old…


Do you ask "who owns the machine code" when your compiler, written by someone else, decides that your code can be simplified?


If it can help with coding, it can also help with modeling.


Disagree for now, but I've been surprised quite a bit this year.


User: Make me a secure Amazon store clone, improve the UI/UX in a way that is similar, but not blatantly ripped off. Write unit tests, and iterate until it works perfectly. Show me a slideshow of the result tomorrow at 10 am.

Computer: I understand. OpenAI requires you to provide a $10,000 advance on this task, and $10,000 more when it's ready. Do you agree?

User: Compared to the money I will make, this is peanuts. I agree.

Computer: Alright, your account was charged. I am also legally required to notify Amazon Hive of every group of users that try to clone their store each day. Don't worry, their legal subAI only sues 0.3% of the time. Have a good afternoon!


That’s not the future; that’s just now. The future is ChatGPT suggesting a solution to a problem people haven’t foreseen, then building and deploying the project and making money for itself/its host.


I used to be on the "AGI is around the corner" train, but then I learned about the concept of irreducible computation, and now I can safely say that we are far away from the scenario that you describe, and AGI is most certainly not going to happen.


The future is the AI also writes the high level descriptions of stuff.


Should AI really be writing code that was made for writing by humans?

Seems inefficient. Shouldn't the AI just be generating machine code for the computer to run?


The issue, though, is that while LLMs are good at translation, they aren't good at correctness, because by definition what they generate is somewhat stochastic (they have several candidate answers to a problem, each with a different likelihood of being correct). For generating machine code, there needs to be one output that is the correct answer.

It's much easier in the short term to generate larger snippets of Python code based on prompts and let the programmer fine-tune it.
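
A toy illustration of that stochasticity, with a made-up next-token distribution: sampling with temperature means the same prompt can legitimately produce different outputs, which is tolerable for prose but not when exactly one output is correct.

    import numpy as np

    # Made-up scores the model assigns to candidate next tokens
    tokens = ["return", "yield", "raise"]
    logits = np.array([2.0, 1.5, 0.3])

    def sample(temperature=1.0):
        p = np.exp(logits / temperature)
        p /= p.sum()                               # softmax over the candidates
        return np.random.choice(tokens, p=p)       # stochastic pick

    print([sample() for _ in range(5)])            # varies from run to run
    print(tokens[int(np.argmax(logits))])          # greedy pick is deterministic,
                                                   # but not guaranteed to be "correct"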


Bleak.


That outlook depends on what you think programmers do...

Is it the act of writing code, or is it creating things? I often wonder this when threads about raw typing speed come up, or about the "extra typing" involved in optional curly braces. Even today, without LLMs, only 20% or less of my work is writing actual code. Requirements/research, design, optimization, testing, etc. make up the lion's share.

So take out that 20%: I'll be more productive, but my work doesn't change.


My favorite part of my job is solving the problems around implementing an idea. Ideas are easy; problem solving is not easy, and as such I find the challenge fun.

So yeah, bleak.


Not really. If you code in Python or Node, you are already relying on a bunch of computer algorithms to optimize your code.


It's fine. That's how C compilers looked in 1972 too. You can't get out of having to think about things.


From a jobs perspective. Awesome and amazing otherwise.


This looks like what BPMN and other business process notations really set out to be originally, but never achieved.


Hoping to add this to my app (https://picoapps.xyz/) once it launches via API. With Pico, you can iterate on hosted apps via LLM text prompts. Being able to circle a button and ask the AI to add a gradient background, or to copy the style of a screenshotted button, should be really cool! Excited for what's to come.


Love the example.

Tons of indications of how AI will be, or already is, able to learn flows, architecture, code, etc.




