My big hope for all this AI stuff is that normal people start using computers to solve problems more often. I work as a sysadmin, so if I want, say, a specialized script to organize my music in a particular way, I can write it myself. It probably wouldn't be the best code, but it doesn't need to be.
The average person could be using computing to solve all kinds of problems, but it requires expertise and a time investment that just isn't a good value proposition for a lot of people. If GPT or some other tool comes along and makes it so my uncle can say to his computer, "hey, build me a site so I can show off all my fishing trophies," and it just does it? The world would have a lot more niche software, more weird stuff, and overall everyone would be more productive and happy (or so I hope).
We saw interesting developments around vector databases, but then people stopped hyping them once it turned out you could just store the vectors in normal databases without much real difference. I wonder what will happen when the models can freely access them, though.
I really don't understand how people figure out the vectors to actually store in the databases, regardless of the underlying storage model.
Isn't that itself the province of an LLM? Say I have a bunch of text. How do I store it so it can be searched "by similarity"? Semantic search with Sphinx was hard, as I remember. Facebook had Faiss. And now we're supposed to just save vectors on commodity hardware BEFORE using an LLM?
1. Take a bunch of text, run it through an LLM in embedding mode. The LLM turns the text into a vector. If the text is longer than the LLM context window, chunk it.
2. Store the vector in a vector DB.
3. Use the LLM to generate a vector of your question.
4. Query the vector DB for the most similar vectors (as many as will fit in the context window).
5. Get the text from all those vectors. Concatenate the text with the question from step 3.
6. Step 5 is your prompt. The LLM can now answer your question, with a collection of similar/relevant text already in its context window alongside the question.
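Concretely, here's a minimal sketch of steps 1-6 in Python, using the sentence-transformers package and a plain NumPy matrix in place of a real vector DB (model name and chunks are just placeholders):

    # Minimal RAG sketch: embed chunks, retrieve by cosine similarity, build a prompt.
    from sentence_transformers import SentenceTransformer
    import numpy as np

    model = SentenceTransformer("all-MiniLM-L6-v2")  # any embedding model works

    # Steps 1-2: chunk the text and store the embeddings (here just a NumPy matrix).
    chunks = ["First chunk of your documents...", "Second chunk...", "Third chunk..."]
    chunk_vecs = model.encode(chunks, normalize_embeddings=True)

    # Step 3: embed the question.
    question = "What does the second chunk say?"
    q_vec = model.encode([question], normalize_embeddings=True)[0]

    # Step 4: find the most similar chunks (dot product == cosine on normalized vectors).
    top_k = np.argsort(chunk_vecs @ q_vec)[::-1][:2]

    # Steps 5-6: concatenate the retrieved text with the question to form the prompt.
    context = "\n".join(chunks[i] for i in top_k)
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer using only the context above."
    print(prompt)  # this is what you send to the LLM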
You don't even need an LLM. You can use Word2Vec, or even yank the embeddings matrix from the bottom layer of an LLM. And models like CLIP and BLIP can give you embeddings for images (with analogous encoders for audio).
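For example, a rough sketch of yanking the token-embedding matrix out of a Hugging Face model and mean-pooling it into a crude sentence vector (model choice is arbitrary, and this is much weaker than a purpose-built embedding model):

    # Grab the input embedding table from a transformer and mean-pool token vectors.
    from transformers import AutoModel, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("distilbert-base-uncased")
    model = AutoModel.from_pretrained("distilbert-base-uncased")

    emb = model.get_input_embeddings()                 # nn.Embedding: vocab_size x hidden_dim
    ids = tok("fishing trophies", return_tensors="pt")["input_ids"]
    vec = emb(ids).mean(dim=1).squeeze(0)              # crude sentence vector
    print(vec.shape)                                   # torch.Size([768]) for this model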
On a side note, the largest context for GPT4 is 32K tokens, and filling it costs around $3 (rough estimate). If GPT5 is again 20 times the cost of its predecessor, that would be around $60 per generation, and GPT6 would then be $1200.
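A back-of-the-envelope version of that scaling, using my rough numbers rather than real pricing:

    base_cost = 3.0  # ~$3 for one full 32K-context GPT4 call, rough estimate
    for gen, name in enumerate(["GPT4", "GPT5", "GPT6"]):
        print(f"{name}: ~${base_cost * 20**gen:,.0f} per full-context generation")
    # GPT4: ~$3, GPT5: ~$60, GPT6: ~$1,200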
But Nvidia building future GPUs on ever smaller TSMC nodes will be a counter-force, pulling future model-use costs back toward today's levels. Ultimately the market will bear what it will, and if the costs land outside that zone, those products won't launch.
It takes only a one-token error to produce code that doesn't compile, doesn't execute, throws exceptions, or is wrong in some serious way. These use cases are amazing nonetheless.
Can the same be said for facts? Hard to generalize. Let's say I asked GPTV the year a painting was painted. Way back when, I was right that I would never need to know the exact year of a painting, whatever that even means, after I finished my modern art history exam. Does GPTV need to? No. I am happy with ranges in my answers about general knowledge, because at the end of the day there isn't some error-intolerant process consuming those outputs anyway. Even order-of-magnitude errors may be fine; getting a zero wrong in the token sometimes just doesn't matter. It always matters when programming, though.
Writers increasingly have the same opinion. As more people use GPT4 and learn how stilted and uncreative it is, and that a one-token error can make a plot stop making sense, it isn't clear how it's going to deliver on bulk writing. Spam and content farming, sure, but even short prose is not in the realm of reality for actual, real, breathing human consumption.
How do you categorize this? I don't know; I'm not in the field. There's a huge difference between "one token error means 0% correct" and "one token error means 99% correct." Maybe transformers can solve this. Maybe they never will.
Just ask it about the error and it suggests ways to fix it. Alternatively, in this development paradigm you only need one senior software dev to work through the non-compiling, non-functioning code to make the whole thing work. Either way, we're screwed.
Humans tend to put up a bit of a fight if you accuse them of producing incorrect program code; you know you're in good company if they pull out Z3, slam out a few lines at their terminal, and show you a rigorous mathematical proof that their code is correct. LLMs don't do that.
Only in a vague way. Even with chain-of-thought, feedback loops, and other neat tricks, I've never seen an LLM produce valid theorems for Z3 (beyond trivial examples).
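For context, even a "trivial example" of the kind of proof the parent describes looks something like this (a sketch with the z3-solver Python bindings; the clamping property is made up for illustration):

    # Prove that clamping x to [0, 10] always lands in range, by showing no counterexample exists.
    from z3 import Int, If, Solver, Or, unsat

    x = Int("x")
    clamped = If(x < 0, 0, If(x > 10, 10, x))

    s = Solver()
    s.add(Or(clamped < 0, clamped > 10))   # ask Z3 for a counterexample
    assert s.check() == unsat              # unsat: none exists, so the clamp is correct
    print("proved: clamp(x) always stays within [0, 10]")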
I've attempted to use this iterative method with GPT4 to build an application, and things just get clumsier and more error-prone as the program grows in complexity. Eventually I get to the point where asking it to make revisions becomes a dice roll: either the code keeps behaving as expected, or it arbitrarily omits random portions of the application logic in the rewrite. It's certainly a great way to brainstorm or to quickly produce snippets of logic, but it falls apart for anything beyond toy apps.
Because why are we considering supplanting humans for this labor if it provides no additional value (apart from sacrificing humans at the altar of capitalism)?
But besides that, the LLM is less likely to demand unscheduled time off - especially as a fraction of the hours it can put in. If I have a family emergency once per year, and I need my eight hour day off, I've just removed 1/200 of my yearly output potential. The LLM would need to be down for over 400 hours per year to get to that type of output reduction. Realistically, that is unlikely to happen.
The curious thing about this, though, is that of course you can start by replacing software engineers with it, but how far away are you going to be from being able to replace a CEO with it? I would say not that far away.
"It takes only a one token error to have code that doesn't compile, execute, throws exceptions, or to be wrong in some serious way."
So does my code, though. You can take the code, put it into Xcode, and then ask GPT about any errors. Iterate and repeat. I did the same while building some Chrome plugins.
You could have a feedback loop, feeding compilation errors back to the model until it converges on code with no errors.
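Roughly like this, where ask_llm() is a hypothetical wrapper around whatever model API you're using, and Python's built-in compile() stands in for the compiler:

    # Sketch of a compile-and-retry loop; nothing here is a real OpenAI API call.
    def ask_llm(prompt: str) -> str:
        raise NotImplementedError("plug in your model API here")

    def generate_until_it_compiles(task: str, max_rounds: int = 5) -> str:
        code = ask_llm(f"Write a Python module that does the following:\n{task}")
        for _ in range(max_rounds):
            try:
                compile(code, "<generated>", "exec")   # syntax check only, nothing executes
                return code
            except SyntaxError as err:
                # Feed the error back and ask for a corrected version.
                code = ask_llm(f"This code fails with: {err}\n\nCode:\n{code}\n\nReturn a fixed version.")
        raise RuntimeError("model never converged to compiling code")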
Potentially you could do the same for fact-related questions, by fact-checking the result against the (properly indexed) training set and feeding the results back.
Yeah, I've been exploring this generate->error->fix feedback loop, along with some test cases, in my app builder, and it's quite good but not perfect. It still gets stuck going in circles on some errors.
Not sure this is that interesting, since it was already revealed during the GPT-4 livestream back in March [0].
It's my understanding that this feature is still rolling out to everyone, so if there are use cases beyond "simple," I'm sure we'll hear about them in a more detailed article or blog post.
Not taking anything away, just pointing out the facts.
You could have written off that demo as using text as a crutch; just using OCR and pasting in the text can get you something similar-looking. This as well as these:
Nice! And that's what I mean too: I'd much prefer a thorough review of what it can do over individual Twitter links. A lot of people are probably getting excited now, but keeping track of the examples individually is so hard.
And it undoubtedly reduces the number of developers needed.
It doesn't replace all human coders yet. But a senior, super skilled engineer can now effectively do the task of a team of 10. This will be true for many of the teams I have personally seen.
There will be exceptions, too. Not all teams work like that.
I can confidently say that LLMs will cause a reduction in the number of developers required, at least in the short term.
Factors that increase demand might propagate simultaneously at a greater rate and undo this, but that is unlikely in my opinion.
> But a senior, super skilled engineer can now effectively do the task of a team of 10.
Everyone loves to say this and yet I’ve never seen a single concrete example aside from toy demos.
Here’s the thing: output is pretty easy to see. If one developer can 10x their output, that means they’re doing in a month what would take them nearly a year without AI.
So, show me one example of an individual or team who has done a year’s worth of work in a month.
I mean, it's not really surprising. The future programming paradigm is writing high-level descriptions of stuff, letting the generative AI write the code for you, and then fine-tuning it.
This sidesteps the importance of a coherent data model that someone can be held responsible for. I find it hard to imagine corporations being okay with GPT being the responsible party when the question of “who owns the database” comes up. Then again, maybe I’m just old…
User: Make me a secure Amazon store clone, improve the UI/UX in a way that is similar, but not blatantly ripped off. Write unit tests, and iterate until it works perfectly. Show me a slideshow of the result tomorrow at 10 am.
Computer: I understand. OpenAI requires you to provide a $10,000 advance on this task, and $10,000 more when it's ready. Do you agree?
User: Compared to the money I will make, this is peanuts. I agree.
Computer: Alright, your account was charged. I am also legally required to notify Amazon Hive of every group of users that try to clone their store each day. Don't worry, their legal subAI only sues 0.3% of the time. Have a good afternoon!
That's not the future; that's just now. The future is ChatGPT suggesting a solution to a problem people haven't foreseen, then building and deploying the project and making money for itself or its host.
I used to be on the "AGI is around the corner" train, but then I learned about the concept of irreducible computation, and now I can safely say that we are far away from the scenario that you describe, and AGI is most certainly not going to happen.
The issue, though, is that while LLMs are good at translation, they aren't good at correctness, because by definition what they generate is somewhat stochastic (they produce several candidate answers to a problem, each with a different likelihood of being correct). For generating machine code, there needs to be one output that is the correct answer.
It's much easier in the short term to generate larger snippets of Python code from prompts and let the programmer fine-tune them.
That outlook depends on what you think programmers do...
Is it the act of writing code or is it creating things? I often wonder this when the threads discussing raw typing speed come up, or the "extra typing" involved in optional curly braces. Even today, without LLMs, only 20% or less of my work is writing actual code. Requirements/research, designing, optimizing, testing, etc make up the lion's share.
So take out that 20%: I'll be more productive, but my work doesn't change.
My favorite part of my job is solving the problems around implementing an idea. Ideas are easy; problem solving is not easy, and as such I find the challenge fun.
Hoping to add this to my app (https://picoapps.xyz/) once it launches via API. With Pico, you can iterate on hosted apps via LLM text prompts. Being able to circle a button and ask the AI to add a gradient background, or to copy the style of a screenshotted button, should be really cool! Excited for what's to come.