Hacker News

I've used GPT-4 for programming since it came out and it's massively improved my productivity, despite being frustrating at times. It quickly forgets details and starts hallucinating, so I have to constantly remind it by pasting in code. After a few hours it gets so confused I have to start a new chat to reset things.

I've been using Claude pretty intensively over the last week and it's so much better than GPT. The larger context window (200k tokens vs ~16k) means that it can hold almost the entire codebase in memory and is much less likely to forget things.

The low request quota even for paid users is a pain though.



> The larger context window (200k tokens vs ~16k)

Just to add some clarification: the newer GPT-4 models from OpenAI have 128k context windows[1]. I regularly load in the entirety of my React/Django project via Aider.

1. https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turb...


GPT-4 has much worse recall compared to Claude-3 though, compare these two haystack tests:

https://github.com/gkamradt/LLMTest_NeedleInAHaystack/raw/ma...

https://cdn.arstechnica.net/wp-content/uploads/2024/03/claud...


Be aware that the haystack test is not good at all (in its current form). It's a single piece of information inserted into the same text each time, which is a very poor measurement of how well an LLM can retrieve info.


Seems like a very good test for recall.


Even with the most restrictive definition of recall, as in "retrieve a short contiguous piece of information from an unrelated context", it's not that good. It's always the exact same needle inserted into the exact same context, with not the slightest variation apart from the location of the needle.

And if you want to test recall of sparse or multi-hop information, it's useless.
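To make the criticism concrete, here's a minimal sketch of how a needle-in-a-haystack test works; the function names are hypothetical, and `query_model` stands in for an actual LLM API call. The test varies only the insertion depth, which is exactly the limitation described above:

```python
def build_haystack(filler: str, needle: str, depth: float) -> str:
    """Insert `needle` into `filler` at a fractional depth (0.0 = start, 1.0 = end)."""
    pos = int(len(filler) * depth)
    return filler[:pos] + " " + needle + " " + filler[pos:]

def run_depth_sweep(query_model, filler: str, needle: str,
                    question: str, depths: list) -> dict:
    """Ask the model to retrieve the needle at each depth; record whether it succeeded.
    Note: the needle and filler never change, only the position does."""
    results = {}
    for depth in depths:
        context = build_haystack(filler, needle, depth)
        answer = query_model(context + "\n\n" + question)
        results[depth] = needle.lower() in answer.lower()
    return results
```

A more informative variant would randomize the needle and the surrounding text per trial, and include multi-hop questions whose answers are spread across several locations.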


For my education, how do you use the 200k context? Normal chat interfaces like Poe or ChatGPT don't accept more than 4k maximum. Do you use a specific Playground or somewhere else?


The calls with long context are done through the APIs, which you can call from, for instance, Python or JavaScript.

Here's a quick start guide with OpenAI: https://platform.openai.com/docs/quickstart?context=python
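As a rough illustration (not taken from the guide above), here's what building such a long-context request can look like using only the standard library; the endpoint and payload shape follow OpenAI's public chat-completions docs, and the file name is hypothetical:

```python
import json
import urllib.request

API_URL = "https://api.openai.com/v1/chat/completions"

def build_openai_request(codebase: str, question: str, api_key: str,
                         model: str = "gpt-4-turbo") -> urllib.request.Request:
    """Build (but don't yet send) a chat-completions request that stuffs
    an entire codebase plus a question into one long-context prompt."""
    payload = {
        "model": model,  # a 128k-context model at time of writing
        "messages": [
            {"role": "user", "content": codebase + "\n\n" + question},
        ],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

# To actually send it (needs a real API key):
# with urllib.request.urlopen(build_openai_request(code, "Summarize this.", key)) as r:
#     print(json.load(r)["choices"][0]["message"]["content"])
```

In practice most people use the official `openai` SDK instead of raw HTTP, but the request is the same underneath.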


I see so many people say that LLMs have improved their programming, and I am truly baffled as to how. They are useful on occasion, and I do use ChatGPT and Copilot daily, but if they disappeared today it would be at worst a minor annoyance.


I’m a designer who likes to code things for fun and would make a terrible programmer.

For me, LLMs are fantastic, as they enable me to build things I never could have otherwise.

I imagine they wouldn’t be as useful for a skilled programmer as they can do it all already.

Can’t remember the source, but I read a paper a while back that looked at how much ChatGPT actually helped people in their work. For above average workers it didn’t make much difference, but it made a big improvement for below average workers, bringing them up to the level of their more experienced peers.


It’s a tool you learn to use and improve with. How many hours of practice have you given the tool?

Personally, I've been practicing for almost 3 years now, going back to Copilot, so I would guess I have at the very minimum a few hundred hours of thinking about how to use the tool (what to prompt for, what to check the output for) and, probably more importantly, when to use it (how to decompose the task I am doing into parts the LLM can reliably execute).


I don't think it's a failure of the prompting, but of the tool itself. It just isn't good for any higher-level tasks, and those I know how to do anyway. It's good for menial API work, as a sibling comment says, and using it I can avoid reading API docs, but that's about it.

I tried to have it rewrite in Go a small Python script (a hundred or so lines) that I wrote, and it basically refused, saying "here's the start, you write the rest now". As others said, if it went away it'd be a minor inconvenience of me having to read a bit more documentation.


Yes, to me that's obviously not a good use, and I have the experience to know not to attempt hundred-line rewrites of files.

Reflecting on my mental model here: I more often break the problem down into subroutines of 30 or so lines. These are nailed about 95% of the time, and when there's something wrong it's obvious.


I also wonder if it’s a language effect and the models are just way better at python


Same here. The most use I made of it were

1) Making it write some basic code for an API I don't know, some Windows API calls in particular.

2) Abusing it as a search engine, since Google barely qualifies as one anymore.

For actual refactoring, the usual tools still blow it out of the water IMO. Same for quality code. It just doesn't compile half the time.


Yeah, I'm curious how others are using it too. For boilerplate code and configs they're great, in the sense that I don't have to open docs for reference, but other than that I feel like maybe I'm not using it to its full potential the way others describe.


I mean, I'm not so much of a programmer as a person that can hack together code. Granted, my job is not 'programmer', it's 'R&D'.

For example, I had to patch together some excel files the other day. Data in one file referenced data in ~60 other files in a pretty complicated way, but a standardized one. About 80,000 rows needed to be processed across 60,000 columns. Not terrible, not fun though.

Now, I'm good at excel, but not VBA good. And I'm a 'D-' in python. Passing, but barely. Writing a python program to do the job would take me about a full work day, likely two.

But, when I fired up GPT3.5 (I'm a schlub, I know!) and typed in the request, I got a pretty good working answer in about, ohhh, 15 seconds. Sure, it didn't work the first time, but I was able to hack at it, iterate with GPT3.5 only twice, and I got my work done just fine. Total time? About 15 minutes. Likely a 64X increase in productivity.
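The kind of script described above (one main table referencing rows scattered across dozens of other files) can be sketched roughly like this. This is purely illustrative, not the commenter's actual code: the file names, column names, and join logic are hypothetical, and it works on CSV exports since the standard library can't read .xlsx directly:

```python
import csv
import glob

def load_lookup(pattern: str) -> dict:
    """Build an id -> row mapping from every referenced file matching `pattern`."""
    lookup = {}
    for path in glob.glob(pattern):
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                lookup[row["id"]] = row
    return lookup

def join_main(main_path: str, lookup: dict, out_path: str) -> None:
    """Resolve each row's ref_id against the lookup and write the joined result."""
    with open(main_path, newline="") as f:
        rows = list(csv.DictReader(f))
    for row in rows:
        ref = lookup.get(row["ref_id"], {})
        row["ref_value"] = ref.get("value", "")
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=rows[0].keys())
        writer.writeheader()
        writer.writerows(rows)
```

For real .xlsx workbooks you'd swap the `csv` calls for a library like openpyxl or pandas, which is presumably what the generated script did.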

So, maybe when we say that our programming is improved, what we actually mean is that our productivity is improved, not our innate abilities to program.


> The low request quota even for paid users is a pain though.

Why not use the API directly? Their Workbench interface is pretty nifty if you don't feel like hooking the API up to your tools of choice.


The workbench UI sucks, what's nifty about it? It's cumbersome and slow. I would recommend using a ChatUI (huggingface ChatUI, or https://github.com/lobehub/lobe-chat) and use the API that way.


Right, but it's certainly easier for people who might not even know what "API" stands for, and that's quite nifty. As far as self-hosted frontends go, I can personally recommend SillyTavern[1] in the browser, ChatterUI[2] on mobile, and ShellGPT[3] for CLI. LobeChat looks pretty cool, though! I'll definitely check it out.

[1] https://github.com/SillyTavern/SillyTavern

[2] https://github.com/Vali-98/ChatterUI

[3] https://github.com/TheR1D/shell_gpt


Which languages are you primarily writing in?

Do you mean you are throttled by it more often? (~“You must wait X mins to make more queries”)


It is still remarkably better than working with junior developers. I can ask it to code something based on a specification, and most of the time it will do a better job than if I sent the same task to a human.

It often makes some small mistakes, but a little nudge here and there corrects them, whereas with a human you have to spend a lot of time explaining why something is wrong and why this or that approach would rectify it.

The difference is probably because GPT has access to a vast array of knowledge a human can't reasonably compete with.


> It is still remarkably better than working with junior developers

It really bugs me when people imply AI will replace only certain devs based on their title.

Seniority does not equal skill. There are plenty of extremely talented "junior" developers who don't have the senior title because of a slow promo process and minimum time in role requirements. They can and do own entire projects and take on senior-level responsibilities.

I've also worked with a "senior" dev who struggled for over a month to make a logging change.


> It is still remarkably better than working with junior developers.

Definitely not all junior developers. I have yet to see it do well at handling code migrations, updating APIs, writing end-to-end tests, or making front-end code changes against existing UX specifications, to name a few things.


Wow, you have good junior developers.


Work at a good company and you'll find that the juniors are pretty capable.


That's more or less how I use GPT-4: "how to <describe some algorithm in some language> <maybe some useful context with a short code example>". Most of the time it outputs something useful that I can work with. But if a junior's job is to be my GPT-4, that's a waste of a human resource.


The lack of API credits for Claude Pro subscribers is also a bit of an oversight. I’d like to be able to consume my daily quota via the API as well as the chatbot.


Just use the API; you can make as many requests as you want.
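For anyone coming from the chatbot, here's roughly what a direct call looks like, again built with only the standard library so nothing needs installing. The endpoint, header names, and version string follow Anthropic's public Messages API docs at time of writing; the model name may change:

```python
import json
import urllib.request

def build_claude_request(prompt: str, api_key: str,
                         model: str = "claude-3-opus-20240229") -> urllib.request.Request:
    """Build (but don't yet send) a request to the Anthropic Messages API."""
    payload = {
        "model": model,
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        "https://api.anthropic.com/v1/messages",
        data=json.dumps(payload).encode(),
        headers={
            "x-api-key": api_key,
            "anthropic-version": "2023-06-01",
            "content-type": "application/json",
        },
    )

# To actually send it (needs a real API key, billed per token):
# with urllib.request.urlopen(build_claude_request("Hello!", key)) as r:
#     print(json.load(r)["content"][0]["text"])
```

Note that API usage is pay-per-token and billed separately from a Claude Pro subscription, which is exactly the complaint upthread.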


How do both compare to copilot?



