Hacker News

I've used GPT-4 for programming since it came out and it's massively improved my productivity, despite being frustrating at times. It quickly forgets details and starts hallucinating, so I have to constantly remind it by pasting in code. After a few hours it gets so confused I have to start a new chat to reset things.

I've been using Claude pretty intensively over the last week and it's so much better than GPT. The larger context window (200k tokens vs ~16k) means that it can hold almost the entire codebase in memory and is much less likely to forget things.

The low request quota even for paid users is a pain though.



> The larger context window (200k tokens vs ~16k)

Just to add some clarification: the newer GPT-4 models from OpenAI have 128k context windows[1]. I regularly load in the entirety of my React/Django project via Aider.

1. https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turb...


GPT-4 has much worse recall compared to Claude-3 though, compare these two haystack tests:

https://github.com/gkamradt/LLMTest_NeedleInAHaystack/raw/ma...

https://cdn.arstechnica.net/wp-content/uploads/2024/03/claud...


Be aware that the haystack test is not good at all (in its current form). It's a single piece of information inserted into the same text each time, which is a very poor measurement of how well an LLM can retrieve info.


Seems like a very good test for recall.


Even with the most restrictive definition of recall, as in "retrieve a short contiguous piece of information from an unrelated context", it's not that good. It's always the exact same needle inserted into the exact same context, with not the slightest variation apart from the location of the needle.

And if you want to test recall of sparse or multi-hop information, it's useless.
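To make the criticism concrete, here's a minimal sketch of how a needle-in-a-haystack test works; the function names are hypothetical, and `query_model` stands in for an actual LLM API call. The test varies only the insertion depth, which is exactly the limitation described above:

```python
def build_haystack(filler: str, needle: str, depth: float) -> str:
    """Insert `needle` into `filler` at a fractional depth (0.0 = start, 1.0 = end)."""
    pos = int(len(filler) * depth)
    return filler[:pos] + " " + needle + " " + filler[pos:]

def run_depth_sweep(query_model, filler: str, needle: str,
                    question: str, depths: list) -> dict:
    """Ask the model to retrieve the needle at each depth; record whether it succeeded.
    Note: the needle and filler never change, only the position does."""
    results = {}
    for depth in depths:
        context = build_haystack(filler, needle, depth)
        answer = query_model(context + "\n\n" + question)
        results[depth] = needle.lower() in answer.lower()
    return results
```

A more informative variant would randomize the needle and the surrounding text per trial, and include multi-hop questions whose answers are spread across several locations.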


For my education, how do you use the 200k context? Normal chat interfaces like Poe or ChatGPT don't accept more than 4k maximum. Do you use a specific Playground or somewhere else?


The calls with long context are done through the APIs, which you can call from, for instance, Python or JavaScript.

Here's a quick start guide with OpenAI: https://platform.openai.com/docs/quickstart?context=python
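As a rough illustration (not taken from the guide above), here's what building such a long-context request can look like using only the standard library; the endpoint and payload shape follow OpenAI's public chat-completions docs, and the file name is hypothetical:

```python
import json
import urllib.request

API_URL = "https://api.openai.com/v1/chat/completions"

def build_openai_request(codebase: str, question: str, api_key: str,
                         model: str = "gpt-4-turbo") -> urllib.request.Request:
    """Build (but don't yet send) a chat-completions request that stuffs
    an entire codebase plus a question into one long-context prompt."""
    payload = {
        "model": model,  # a 128k-context model at time of writing
        "messages": [
            {"role": "user", "content": codebase + "\n\n" + question},
        ],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

# To actually send it (needs a real API key):
# with urllib.request.urlopen(build_openai_request(code, "Summarize this.", key)) as r:
#     print(json.load(r)["choices"][0]["message"]["content"])
```

In practice most people use the official `openai` SDK instead of raw HTTP, but the request is the same underneath.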


I see so many people say that LLMs have improved their programming, and I am truly baffled as to how. They are useful on occasion, and I do use ChatGPT and Copilot daily, but if they disappeared today it would be at worst a minor annoyance.


I’m a designer who likes to code things for fun and would make a terrible programmer.

For me, LLMs are fantastic, as they enable me to build things I never could have otherwise.

I imagine they wouldn’t be as useful for a skilled programmer as they can do it all already.

Can’t remember the source, but I read a paper a while back that looked at how much ChatGPT actually helped people in their work. For above average workers it didn’t make much difference, but it made a big improvement for below average workers, bringing them up to the level of their more experienced peers.


It’s a tool you learn to use and improve with. How many hours of practice have you given the tool?

Personally, I've been practicing for almost 3 years now, going back to Copilot, so I would guess I have at the very minimum a few hundred hours of thinking about how to use the tool (what to prompt for, what to check the output for) and, probably more importantly, when to use it (how to decompose the task I am doing into parts the LLM can reliably execute).


I don't think it's a failure of the prompting, but of the tool itself. It just isn't good for any higher-level tasks, and those I know how to do anyway. It's good for menial API work, as a sibling comment says, and using it I can avoid reading API docs, but that's about it.

I tried to have it rewrite in Go a small Python script (a hundred or so lines) that I wrote, and it basically refused, saying "here's the start, you write the rest now". As others said, if it went away it'd be a minor inconvenience of me having to read a bit more documentation.


Yes, to me that's obviously not a good use, and I have the experience to know not to attempt hundred-line rewrites of files.

Reflecting on my mental model here: I more often break the problem down into subroutines of 30 or so lines. These are nailed about 95% of the time, and when there's something wrong it's obvious.


I also wonder if it’s a language effect and the models are just way better at python


Same here. The most use I made of it were

1) Making it write some basic code for an API I don't know, some Windows API calls in particular.

2) Abusing it as a search engine, since Google barely qualifies as one anymore.

For actual refactoring, the usual tools still blow it out of the water IMO. Same for quality code. It just doesn't compile half the time.


Yeah, I'm curious how others are using it too. For boilerplate code and configs they're great, in the sense that I don't have to open docs for reference, but other than that I feel like maybe I'm not using it to its full potential the way others describe.


I mean, I'm not so much of a programmer as a person that can hack together code. Granted, my job is not 'programmer', it's 'R&D'.

For example, I had to patch together some excel files the other day. Data in one file referenced data in ~60 other files in a pretty complicated way, but a standardized one. About 80,000 rows needed to be processed across 60,000 columns. Not terrible, not fun though.

Now, I'm good at excel, but not VBA good. And I'm a 'D-' in python. Passing, but barely. Writing a python program to do the job would take me about a full work day, likely two.

But, when I fired up GPT3.5 (I'm a schlub, I know!) and typed in the request, I got a pretty good working answer in about, ohhh, 15 seconds. Sure, it didn't work the first time, but I was able to hack at it, iterate with GPT3.5 only twice, and I got my work done just fine. Total time? About 15 minutes. Likely a 64X increase in productivity.
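The kind of script described above (one main table referencing rows scattered across dozens of other files) can be sketched roughly like this. This is purely illustrative, not the commenter's actual code: the file names, column names, and join logic are hypothetical, and it works on CSV exports since the standard library can't read .xlsx directly:

```python
import csv
import glob

def load_lookup(pattern: str) -> dict:
    """Build an id -> row mapping from every referenced file matching `pattern`."""
    lookup = {}
    for path in glob.glob(pattern):
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                lookup[row["id"]] = row
    return lookup

def join_main(main_path: str, lookup: dict, out_path: str) -> None:
    """Resolve each row's ref_id against the lookup and write the joined result."""
    with open(main_path, newline="") as f:
        rows = list(csv.DictReader(f))
    for row in rows:
        ref = lookup.get(row["ref_id"], {})
        row["ref_value"] = ref.get("value", "")
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=rows[0].keys())
        writer.writeheader()
        writer.writerows(rows)
```

For real .xlsx workbooks you'd swap the `csv` calls for a library like openpyxl or pandas, which is presumably what the generated script did.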

So, maybe when we say that our programming is improved, what we actually mean is that our productivity is improved, not our innate abilities to program.


> The low request quota even for paid users is a pain though.

Why not use the API directly? Their Workbench interface is pretty nifty if you don't feel like hooking the API up to your tools of choice.


The workbench UI sucks, what's nifty about it? It's cumbersome and slow. I would recommend using a ChatUI (huggingface ChatUI, or https://github.com/lobehub/lobe-chat) and use the API that way.


Right, but it's certainly easier for people who might not even know what "API" stands for, and that's quite nifty. As far as self-hosted frontends go, I can personally recommend SillyTavern[1] in the browser, ChatterUI[2] on mobile, and ShellGPT[3] for CLI. LobeChat looks pretty cool, though! I'll definitely check it out.

[1] https://github.com/SillyTavern/SillyTavern

[2] https://github.com/Vali-98/ChatterUI

[3] https://github.com/TheR1D/shell_gpt


Which languages are you primarily writing in?

Do you mean you are throttled by it more often? (~“You must wait X mins to make more queries”)


It is still remarkably better than working with junior developers. I can ask it to code something based on a specification, and most of the time it will do a better job than if I sent the same task to a human.

It often makes some small mistakes, but a little nudge here and there corrects them, whereas with a human you have to spend a lot of time explaining why something is wrong and why this or that approach would rectify it.

The difference is probably because GPT has access to a vast array of knowledge a human can't reasonably compete with.


> It is still remarkably better than working with junior developers

It really bugs me when people imply AI will replace only certain devs based on their title.

Seniority does not equal skill. There are plenty of extremely talented "junior" developers who don't have the senior title because of a slow promo process and minimum time in role requirements. They can and do own entire projects and take on senior-level responsibilities.

I've also worked with a "senior" dev who struggled for over a month to make a logging change.


> It is still remarkably better than working with junior developers.

Definitely not all junior developers. I have yet to see it do well at handling code migrations, updating APIs, writing end-to-end tests, or making front-end code changes against existing UX specifications, to name a few things.


Wow, you have good junior developers.


Work at a good company and you'll find that the juniors are pretty capable.


That's more or less how I use GPT-4: "how to <describe some algorithm in some language> <maybe some useful context with a short code example>". Most of the time it outputs something useful that I can work with. But if a junior's job is to be my GPT-4, that's a waste of a human resource.


The lack of API credits for Claude Pro subscribers is also a bit of an oversight. I’d like to be able to consume my daily quota via the API as well as the chatbot.


Just use the API; you can make as many requests as you want.
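For anyone coming from the chatbot, here's roughly what a direct call looks like, again built with only the standard library so nothing needs installing. The endpoint, header names, and version string follow Anthropic's public Messages API docs at time of writing; the model name may change:

```python
import json
import urllib.request

def build_claude_request(prompt: str, api_key: str,
                         model: str = "claude-3-opus-20240229") -> urllib.request.Request:
    """Build (but don't yet send) a request to the Anthropic Messages API."""
    payload = {
        "model": model,
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        "https://api.anthropic.com/v1/messages",
        data=json.dumps(payload).encode(),
        headers={
            "x-api-key": api_key,
            "anthropic-version": "2023-06-01",
            "content-type": "application/json",
        },
    )

# To actually send it (needs a real API key, billed per token):
# with urllib.request.urlopen(build_claude_request("Hello!", key)) as r:
#     print(json.load(r)["content"][0]["text"])
```

Note that API usage is pay-per-token and billed separately from a Claude Pro subscription, which is exactly the complaint upthread.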


How do both compare to copilot?



