My takeaway from the article - the success of their language model illustrates just how large a fraction of our code is boilerplate.
Yes, it's helpful that the system shows bugs, but it does this, not through careful analysis of the control flow or subtle type analysis, but by "probability of each token appearing given the previous tokens".
If such a large proportion of our code follows common patterns, are we not wasting huge amounts of time writing and testing the same functions across thousands or millions of pieces of code? If we (almost) always follow a certain pattern, should not that pattern be embedded in a library or language, so vastly reducing the opportunity for errors or bugs?
It really is the exact opposite approach to static analysis, which tries to see what the code really does and how that leads to bugs. I have had (quite expensive) static analysis tools detect genuine bugs, e.g. a somewhat subtle overflow.
What it can never detect though is correct code that misses the intention of the programmer. E.g. whether some mathematical function is accurate.
The language models try, by statistical means, to derive what should be there. Given enough data they will start to have some (statistical) grasp on the intention.
I am not entirely sure about the boilerplate though. Often you need some minor variation of an already existing pattern. Trying to unify those slightly divergent patterns into one schema can very easily lead to very hard-to-understand code. Another thing is that boilerplate is fairly easy to write and to test, because it is familiar, reducing the actual effort which goes into it. Sometimes it is just better not to reuse code.
When you go too far reducing boilerplate you get to a point where configuration becomes the code and the actual code becomes a black box that people barely understand. And then they replicate what's already in the black box because they aren't sure it's there, and you do the same things over and over in every layer. And in some layers you do it one way and in others another way. And then requirements change and you change the code, and it still works the old way SOME of the time, but you only discover that in production, because the test case you used while developing is handled correctly on the layer where you changed it.
And then if it's buried deep enough somebody will add another layer and fix the cases that were found - there.
And that's how the disgusting legacy code happens.
KISS, please. Unnecessary abstraction is the root of almost all problems in programming.
Although, I feel like very often, the idea of reducing boilerplate should __not__ be to hide the boilerplate one layer below; it should be to __try not to write the boilerplate__ …
Take the very specific example at hand: what is the meaning of this “JoelEvent” class? Why is it there?
It appears that it wraps a list of functions, with methods to push and pop from it. Why is it necessary to write them?
“Dispatch” reimplements function application, apparently? Btw, it is beyond me why one would loop over this.listeners and then check if the element is in this.listeners.
In a reasonable language or framework, I cannot in any way see a reason why this code needs to be written. This is the idea of removing boilerplate, to me!
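For anyone reading along without the article open, the shape being described is roughly this; a sketch reconstructed from the comments above, with guessed method names, not the author's actual code:

    // Rough sketch of the wrapper being discussed; reconstructed from the
    // comments above, not copied from the article.
    class JoelEvent<T> {
      private listeners: Array<(arg: T) => void> = [];

      on(listener: (arg: T) => void): void {
        this.listeners.push(listener);
      }

      off(listener: (arg: T) => void): void {
        const i = this.listeners.indexOf(listener);
        if (i !== -1) this.listeners.splice(i, 1);
      }

      dispatch(arg: T): void {
        // Loop over a copy, then re-check membership so that listeners
        // removed during dispatch are skipped.
        for (const listener of [...this.listeners]) {
          if (!this.listeners.includes(listener)) continue;
          listener(arg);
        }
      }
    }

An array of callbacks plus a loop is exactly what a stock event emitter (e.g. Node's built-in EventEmitter) already gives you, which is the point.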
I think the biggest problem with templating/reusing instead of boilerplate is just how hard it is for a dev to answer the question(s): has somebody already done this, is their solution flexible enough to fit mine, etc.
Hell, just helper/utility functions within an organisation aren't always used; devs end up reimplementing stuff all the time simply because there's no easy way to know about them (documentation is only one part of this).
+1 on the last paragraph: Predictability means the code follows a pattern, not that it is boilerplate. Some amount of predictable code is necessary just to spell out what the code does, so that even someone unfamiliar with the pattern can simply read and understand it.
Honestly this is what I have loved about Kotlin. It seems like there is now just a certain amount of boilerplate in every Java file, and Kotlin just chose to bake all of that into the language. The other instance where we see this happening is with the Lombok library in Java, although personally I hate annotations.
We don't talk in maximally compressed strings; a bit of redundancy in grammar helps make sure people can understand each other. The fact that spellcheckers are possible doesn't mean our language is too sparse and wasteful.
A lot of programming languages have a lot of room for improvement though.
A programming language with no redundancy in it means the compiler cannot detect any errors - because every sequence of characters forms a valid program.
The skill is in selecting the optimal amount and form of redundancy.
For example, the typical ; statement terminator is redundant. People often ask, since it is redundant, why not remove it? People have tried that, and found out that the redundancy makes for far better error detection.
No, with redundancy your compiler can catch a certain class of errors (call them "avoidable") that doesn't exist in case of no redundancy. Of course, in both cases you still have the unavoidable errors.
I think the problem is not that "not all tokens are valid". Rather, it is that we often repeat the same token sequences, and we should seek to abstract those predictable sequences into more unique tokens, e.g. by turning them into a library function.
Of course, the downside is that now you have way more tokens you need to know to understand some code, similarly to how Haskell code tends to have tons of mega-abstract function combinators or whatever whereas Go code is very simple. Which one is more readable depends on the reader, because a language like Go requires the reader to sift through more details, whereas Haskell is much terser but also requires much more pre-existing knowledge to understand.
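As a toy illustration of that trade-off (made-up names, nothing from a real codebase): the same "look up a key, fall back to a default" token sequence either gets repeated at every call site, or becomes one more combinator the reader has to know.

    // The predictable token sequence, spelled out at every call site:
    const config = new Map<string, string>([["port", "3000"]]);
    const portRaw = config.get("port");
    const port = portRaw === undefined ? 8080 : Number(portRaw);
    const hostRaw = config.get("host");
    const host = hostRaw === undefined ? "localhost" : hostRaw;

    // The same sequence abstracted into one named helper, at the cost of
    // one more name the reader now needs to know:
    function getOr<T>(cfg: Map<string, string>, key: string, fallback: T, parse: (s: string) => T): T {
      const raw = cfg.get(key);
      return raw === undefined ? fallback : parse(raw);
    }
    const port2 = getOr(config, "port", 8080, Number);
    const host2 = getOr(config, "host", "localhost", String);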
I'm wondering if we can use these code-generation models to find "low entropy code" that is a prime target for turning into libraries.
If you put names to those functions it instantly feels a lot less like 'the original is definitely preferable', for example something like:
    def handleNewEmail(isInboxDisplayed, areNotificationsEnabled):
        AddToInbox()
        if (isInboxDisplayed):
            RefreshInboxView()
        if (areNotificationsEnabled):
            SendSystemNotification()
Not defending needless refactoring in all cases, it's definitely a judgement call.
I think the main problem is lack of expressiveness (what does True mean?). Since your examples seem to be in Python, I would solve it there with named parameters, maybe even giving them default values. That way the code may be abstract, but it is also informative.
I used to share the same example against obsession with DRY. But it's a bit more nuanced.
Both options can be valid. DRY is a tangential topic, the main goal should be to keep your code as close to your mental model as possible (also keeping mental models in sync between team members, which is quite hard).
ABC being a sequence could be a pattern, or it could be a coincidence. You can't know which one it is just by looking at the code. The knowledge whether it's a sequence or not is in the business domain and in your mental model of that domain.
You might think you're disagreeing on DRY with another team member, but in reality you two have different mental models, and one of you is using DRY to justify theirs.
Very well put. I run into this all the time in the name of "reducing complexity", when it is really hiding it under the rug.
My personal approach to combat this is better data structures that model the problem (and are checked by the compiler). Once this is in place I try to "flatten" the calls, so that things are mostly at the top level, or few levels deep, which usually comes up naturally once the data structure is consciously defined.
I try to have as much code as possible be "structure in - structure out" (pure functions) and to concentrate stateful code to work on the structure's fields/values. This is surprisingly easy once the data structures match the problem, instead of only growing organically.
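A minimal sketch of the shape I mean, on a made-up domain (nothing from a real codebase):

    // Model the states explicitly so the compiler checks them.
    type Order =
      | { status: "pending"; items: string[] }
      | { status: "shipped"; items: string[]; trackingId: string };

    // "Structure in - structure out": a pure function over the model.
    function ship(order: Order & { status: "pending" }, trackingId: string): Order {
      return { status: "shipped", items: order.items, trackingId };
    }

    // Stateful code stays thin and at the top, working on the structure's values.
    let current: Order = { status: "pending", items: ["book"] };
    if (current.status === "pending") {
      current = ship(current, "TRACK-123");
    }

The stateful part shrinks to a thin layer that shuffles values in and out of the structure.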
A codebase is a living thing. Inlining a function or splitting it into multiple cases should always be an option, and boolean flags are generally a code smell. I don't see this as an argument against DRY; when the facts change, your code structure needs to change too, but that doesn't mean your original structure was wrong.
You missed the point; an if or for is a single token. The problem is with predictable token sequences, and if you write the same for loop (or extremely similar ones) in multiple places then yes, it should be turned into a function.
You're probably right about the effort - but I expect that such "boilerplate" code contains (or leads to) much more than 20% of the bugs.
This 90% of code is not genuine word-for-word boilerplate (copy/pasted from a known good source). This code is typically constructed fresh each time; or worse, copied from somewhere similar and quickly tweaked for names/types! (I do it, and I see it done all the time.)
I expect that the remaining 10% non-boilerplate code, taking 80% of the effort, is much more carefully considered, and less likely to contain those clumsy/forgetful off-by-one or buffer-overflow bugs.
There's a reason many languages have large overflowing repositories of modules (or there are well known libraries) that can be downloaded and used that provide boilerplate solutions for many things.
Most people don't like writing that boilerplate once they know how to do it and have done it a few times, and would rather just call a function do_that_thing_need_done_on(input1, input2).
If it can't be factored out like that and is actual language boilerplate beyond a few lines, that's a failure of the language.
If these AI models are suggesting the code that could be called in a library/module instead of the code to actually include and call a well known and trusted library or module, I'm not sure that's progress. At least when someone notices a bug or better way to do it and updates that module or library, consumers of that module can update and benefit from it, or at a minimum see that there were bugs in the version they're running that they might want to address at some point.
> If these AI models are suggesting the code that could be called in a library/module instead of the code to actually include and call a well known and trusted library or module, I'm not sure that's progress.
I think the judgement call of when to use a library & what library to use is quite subjective, even for humans to get right.
If I'm doing JSON deserialisation it might suggest I use Gson library which would be much better than rolling your own. But the original authors are saying that you should prefer Moshi over Gson — I think it'd be hard for an AI to reach that conclusion though (though maybe not if it's doing something like tracking migrations in OS projects from Gson->Moshi).
With something a little more trivial — I don't want it to add in a dependency on left-pad, even though it has 2.5M weekly downloads so is arguably both well-known and trusted :)
You could probably set a threshold for how complex code is before it's suggested to be swapped out for a lib, but then is my code simple because I'm ignoring edge cases I should support, or because I've trimmed the fat on what I'm choosing to support (e.g. i18n, date handling, email validation etc.)
I agree it's not always cut and dried which module to use, or whether to use a module for something extremely simple (which is why I mentioned it being more than a few lines, which should weed out stuff like left-pad, I would hope), but I think knowing there is a module and suggesting it might be a good first step.
The only thing worse than using a module that has a bug/security problem for a function that's just a few lines and not used again in the codebase is when the content of that function is copied in place instead of being included and nobody has an easy way of knowing whether that's the code that was suggested and included in their project. Worst of both worlds.
Yeah, one of the interesting results in empirical studies of defect rates is that defect rate is influenced by lines of code more than other factors like “static types”. Similarly, analyses of defects have discovered that they tend to occur at the end of repetitive sequences of code, because the developer has sort of switched into autopilot mode. I think the obvious conclusion here (and my experience bears this out to some extent) is that languages and libraries that force boilerplate on you produce buggier code than languages and libraries that abstract the boilerplate away.
Yes and I raised the same concern when GitHub Copilot was released. If our code contains so little entropy that an AI can reliably predict the next sequence of tokens then that is evidence we are operating at too low a level of abstraction. Such tools can certainly be helpful in working with today's popular languages, but what would a future language that allows for abstracting away all that boilerplate look like?
Since this is HN I'm sure someone will say that the answer is Lisp. But can we do better?
But now you're already heading into much vaguer territory. Readability is also important. Very important, I would say. That requires easily identifiable markers for loops, conditions, functions, etc., something Lisp lacks. This might be a place where keyword coloring could be useful, but then we're relying on external help.
Another issue is consistency. Take C, Javascript, or Go. Many loops are of the form
for (var i = 0; i < n; i++) { ... }
You could argue that "for i < n" provides the same information, but then you'd have to find a way to start the loop at a different offset, use a different end condition, or different "increment".
But that's exactly the issue. In most cases with that for loop we just want to apply the same operation to every item in a collection and shouldn't need to explicitly code a loop for that. So it should be possible to take advantage of higher level language constructs to express that, or define our own constructs through some form of meta programming. Is there a way to accomplish that while still retaining readable code?
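For example, in JavaScript/TypeScript terms (just an illustration; any language with map/filter works the same way):

    const items = [3, 1, 4, 1, 5];

    // Explicit index loop: offset, end condition and increment all spelled out.
    const doubled: number[] = [];
    for (let i = 0; i < items.length; i++) {
      doubled.push(items[i] * 2);
    }

    // Higher-level construct: "apply this operation to every item".
    const doubled2 = items.map(x => x * 2);

    // A different offset or end condition becomes a separate, named concern.
    const doubledTail = items.slice(1).map(x => x * 2);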
Meta programming can make things worse. It can be useful for constructing/representing rule-like objects or functions, but when you start overloading basic syntactic elements, people will lose track. It was the staple trick of the obfuscated C contest, so much so that it's been forbidden now, IIRC. It's really difficult to come up with something that is terse, readable and unambiguous.
But the situation is not that bad, is it? A few characters too many, so be it. I find reusability a much larger problem.
Sure, most code is boilerplate, except for that one thing, and that one thing can be anywhere. For example, let's say you want to write a function that returns the checksum of a bunch of data. That's a very common thing to do, there are plenty of libraries that do that, and I have seen the CRC32 lookup table in many places, sometimes I am the one who put it there.
Now, why rewrite such a function?
- Ignore some part of the message
- Use different constants
- Fetch data in a special way (i.e. not a file or memory buffer)
- Have some kind of a progress meter
- The library you may want to use is not available (can be for technical, legal or policy reasons)
- Some in-loop operation is needed (ex: byte swapping)
- Have a specific termination condition (ex: end-of-message marker)
- And many others, including combinations of the above
If you ignore all these points and only see the generic checksum function, yes, it is boilerplate and can be factorized. But these special cases are the reason why it may not be the case, and the reason why there are so many coding jobs.
It is also the reason why we don't have real (Lv5) self driving cars yet, why there are pilots in the cockpit, why MS Office and the like have so many "useless" features, why so many attempts to make software cleaner and simpler fail, etc...
That's hardly a convincing example. All of these points can be solved elegantly with a stream abstraction, which can be cheap or free given a sufficiently advanced language and compiler.
As for legal or policy reasons, those still aren't reasons to write boilerplate code. Your reimplementation can be tight and reuse other abstractions or include their own.
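For what it's worth, here is a rough sketch of what that looks like in TypeScript: a plain bitwise CRC-32 over an iterable, not tuned for speed and not claiming to cover every item on the list above.

    // CRC-32 written against an Iterable<number> "stream". Skipping parts of
    // the message, custom data sources, or a progress meter can then be
    // wrapping generators instead of variations of the core loop.
    function crc32(bytes: Iterable<number>): number {
      let crc = 0xffffffff;
      for (const byte of bytes) {
        crc ^= byte;
        for (let i = 0; i < 8; i++) {
          crc = (crc >>> 1) ^ (0xedb88320 & -(crc & 1));
        }
      }
      return (crc ^ 0xffffffff) >>> 0;
    }

    // A "fetch data in a special way" source or a progress meter is just
    // another generator wrapped around the byte source:
    function* withProgress(bytes: Iterable<number>, every: number): IterableIterator<number> {
      let n = 0;
      for (const b of bytes) {
        if (++n % every === 0) console.log(`${n} bytes hashed`);
        yield b;
      }
    }

    crc32(withProgress([0x68, 0x69], 1)); // logs progress, returns the CRC-32 of "hi"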
A stream abstraction is a solution to some (not all) of these problems, and indeed some libraries use them, but a stream abstraction that is powerful enough to solve most of these problems may result in more complex code than just rewriting the checksum algorithm from scratch. And there is a limit on how much compilers can optimize, especially considering that checksum calculation may be critical to performance.
In reality, few people need to write their own checksumming function, but sometimes, it is the best thing to do. And it is just an example; there are many other instances where an off-the-shelf solution is not appropriate because of some detail: string manipulation, parsing, data structures (especially the "intrusive" kind), etc... And since you are probably going to have several of these in your project, it will result in a lot of boilerplate. If it were so generic as to not require boilerplate, it would probably have been developed already and you would be working on something else.
Abstractions are almost invariably more complex, slower, more error-prone and generally worse than the direct equivalent. They are, however, reusable, that's the entire point. So one person goes through the pain of writing a nice library, and it makes life a little easier for the thousands of people who use it, generally, that's a win. But if you write an abstraction for a single use case, it is generally worse than boilerplate.
This is exactly my view, especially with the web apps. If you take a distributed system, the majority of components/microservices will have more than 50% commonality in behaviour. Therefore you do mostly the same things when you start a new one. Even if code itself might be harder to generate, as even a CRUD app might have specific behaviour, testing it is definitely the same, especially when doing negative scenarios, boundary testing, CRUD operations, etc. I wrote a tool specifically for this purpose, targeted at REST APIs, aiming to automate this repetitive work and let you focus on the tests which are specific to the context.
We are not compression algorithms! If we were, we could replace the most common block of boilerplate code with token 'A', the second-most block of boilerplate code with token 'B,' and so on, writing programs in very few bytes. God have mercy on anyone trying to debug such a program, though.
Any language with no boilerplate at all is a black box of incomprehensibility. Java has, I think, more boilerplate than average, while some other languages have less boilerplate than average.
IDEs can help with some of this, which is why I finally stopped writing all code in vim.
Allowing code to be compressed this much is the goal of golfing languages (such as 05AB1E (or osabie) or Pyth (not Python)). The code golf Stack Exchange forum contains a lot of programming challenges where the goal is to write the shortest program (in bytes) that does what the challenge asks, and some answers are truly impressive, with somewhat non-trivial algorithms being implemented in as few as 4 bytes (in extreme cases). Granted, these are programming challenges and not production code to be deployed, and some golfing languages are designed for a specific kind of task or algorithm, which may make us think that the algorithm was actually pre-implemented in the language (and sometimes it is kinda true), but still, it is worth taking a look at.
Yes, this is true as long as the method or function doesn't contain ifs that change the behaviour depending on the data. In other words, if the problem is so well defined that you can create a method that solves it without having to take into account x variations of the problem, then it is fine. This is the copy-paste versus creating-a-function discussion. The problem is the x variations: you need the code to do different things depending on the variation, and we usually break the modularity of the function instead of separating the generic and non-generic parts. Hence the ifs in the function. From what I have seen, people are unable to do this properly in their own codebase, so I don't think it will happen globally. But on the other hand, libraries are kind of the answer to the problem, and as problems get well defined, one starts using libraries. Raising the level of abstraction is a continuous process.
Written language has this too. A basic lookup table of frequencies can tell you that jkkj is a typo in an English word. "Nobody else is really writing code that contains the fragment you just wrote" can find syntax errors. Better language models can find more subtle relationships.
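A toy version of that lookup-table idea, using character bigrams and a deliberately tiny made-up corpus:

    // Flag character pairs that never occur in a (tiny, made-up) reference corpus.
    const corpus = "nobody else is really writing code that has the fragment you just wrote";
    const seen = new Set<string>();
    for (let i = 0; i + 1 < corpus.length; i++) {
      seen.add(corpus.slice(i, i + 2));
    }

    function looksSuspicious(word: string): boolean {
      for (let i = 0; i + 1 < word.length; i++) {
        if (!seen.has(word.slice(i, i + 2))) return true;
      }
      return false;
    }

    looksSuspicious("jkkj");  // true: "jk" never appears in the corpus
    looksSuspicious("write"); // false: all its bigrams do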
At some point the variations mean that more abstractions don't really help.
I think you misunderstand machine learning. "Probability of each token appearing given the previous tokens" is how humans write code too: we write code based on what we want to do and what we have written before. "What I want to do" was captured in the comments that were added.
> probability of each token appearing given the previous tokens
Sounds like tokenizer -> Markov chain? Surely something trained on a TPU is more sophisticated than something we could have done in the middle of the 20th century?
Perl is a wonderful, innovative language which failed because it tried to remove intratextual redundancy in the way you are suggesting.
A string of length N is vastly more likely to be a valid Perl program than a valid Python program. Ultimately this meant that Perl programs, while easier to type, were much harder to read, and extremely easy to misinterpret.
There is also something to be said about nudging developers toward the "right" way to do stuff.
Perl is not only hard to read because there are many shortcuts that might look like line noise to the inexperienced (hell, Rust has a bunch of those too); it's also because there are a bunch of ways to do anything.
Usually you can check the commit history for the unexpected line to figure out what's up, though, to let you figure out if the bug is in the code or the comment.
Comments that have the same content as the code but written in English will have code drift problems. But most comments aren't like this; they can provide context or explain what's happening at a higher level, ideally.
A good rule of thumb I heard a few years ago: assume you're explaining this chunk of a system to a team member standing next to you. Write that down. And don't worry about the tone feeling casual.
And once or twice at the top of a weird class/task file/section/..., don't be afraid of being a bit verbose: explain it until it's obvious, and then one more level. Stuff tends to be obvious while you have all the context uploaded in your mental caches, but a year down the line it'll be rather confusing. Still, having too many such long comments in straight-line code tends to make it harder to read.
Curiously this seems like exactly the sort of thing that code ML should be able to identify and flag: 'Hey, the comment says this, but the code is doing something totally different.'
However, it might be trained on code with erroneous comments, either because they were mangled by a merge or because they're outdated. The more often this happens, the more confused the AI model will be.
Which makes me think that the AI models should be trained on the code evolution of commit chains and not just on isolated snippets of code. That way, the AI could analyze your own commits to detect when a comment becomes outdated.
But ideally the comments should be executable, as unit tests, making you read them if and only if you break them.
For this to be a tolerable development experience, test as much as you can while keeping your tests away from slow dependencies like networking, DB, disk I/O..., and try to keep tests relevant to what you're modifying executable locally in a few seconds.
Maybe even refactor your app to have dependencies at the top, so that most code doesn't have access to them.
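Roughly this shape, as a sketch rather than a prescription (made-up names):

    // The pure core never sees the network or the database, so it is easy to
    // cover with fast unit tests.
    type User = { id: string; email: string };

    function welcomeSubject(user: User): string {
      return `Welcome, ${user.email}!`;
    }

    // Dependencies live only in the thin top layer; in tests, pass a fake
    // sendEmail, in production the real client.
    async function sendWelcome(
      user: User,
      sendEmail: (to: string, subject: string) => Promise<void>,
    ): Promise<void> {
      await sendEmail(user.email, welcomeSubject(user));
    }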
> But ideally the comments should be executable, as unit tests
For the kinds of comments that answer the "why", if we could do that, we wouldn't need the actual code in the first place.
> making you read them if and only if you break them.
A good "why" comment is supposed to inform you beforehand, so you can make changes effectively and without introducing extra bugs in the process. Unit tests are more of a safety net.
I’m imagining the poster is thinking about things like rust doctests where code in the comments will be executed as tests when you run cargo test on the project. It’s a nice way of being able to ensure that (at least part of) the documentation will correspond to the behavior of the code.
Unit tests suck at being documentation, and are not a good substitute for the "why" comment. They can catch some mistakes you make with the code under test, but they can't tell you why it is the way it is. At best, they can help you guess, at the cost of having to study the test code itself (which is usually much bigger than the code it tests, and often more complicated). But the thing is, the knowledge of "why" is most valuable to have before you start making changes and break some tests.
This is true; test coverage, especially in code that has to interact with other systems in particular ways, will often have ten lines of setup that only matter in the test for every line of actual verification.
// in case this is malformed, fix formatting so it will still parse
input = fixInputFormatting(input);
and had a code reviewer ask, “why are you calling fixInputFormatting”?
Nothing to raise the blood pressure like a code review question that is literally answered by a comment on the line immediately preceding where they left the question.
I'm with the reviewer on this one. Why is it malformed? Why are you fixing it here and not earlier? Why are you fixing it and not rejecting it? The comment tells me nothing.
I don’t remember the exact comment/code pair anymore, but it was something where the answer they wanted was exactly what was on the preceding line. Coming up with a simple example to demonstrate that is a surprisingly hard thing to do.
Agreed. Comments that explain in English exactly what the next line does drive me crazy. Even if the line is complicated. I pretty much only ever comment things these days when I change from one approach to another. I.e.:
// This code is weird. I tried doing it the obvious way, but that doesn't work because .. reasons ..
Sometimes, if the code is short, I'll even leave the old/obvious code there for future reference when I look at the weird code and say to myself:
"This is weird! Obviously it should work in this much simpler way.."
I mostly use comments to explain business rules, and why are they implemented there.
    # High value orders need to be approved before refund.
    # Similar logic is also applied elsewhere; this is here
    # as a failsafe.
    if ticketValue > 500:
        emailCustomerSupport(ticket)
    else:
        refund(ticket)
I know that by ".. reasons .." you mean "<and I add the reasons here>", but I've seen too many comments worded exactly like that.
Some programmers use comments (correctly) to explain reasoning and context, some use them to redundantly say what the code already says and some, apparently, use them to apologise.
As a code reviewer, I love seeing comments like that, because it immediately flags "someone made a mistake here".
Sure, sometimes the code is right and the comment is wrong -- but sometimes the comment is right and the code is wrong, in which case the comment just saved me a lot of time.
I think it was Dijkstra who said that software actually lives in the mind of the programmer, and the code is just a distorted, lossy representation of that. Anything that gives us light on their thoughts is likely to improve our understanding of the software. Bugs happen when there's a mismatch between what's going on in the mind of the programmer and what the code actually does, so when we read code a primary task is to understand what the programmer thought it should do.
Comments inform us about the stuff that the programmer cares enough to write down. When we see a seemingly trivial comment, we may ask: why did they take the time to write that down? Did they think there was some subtlety we aren't aware of? Or perhaps they were inexperienced with the language, to the point of having a hard time reading the code they themselves wrote? (If I put this comment into Google, will I find they copy-pasted it from Stack Overflow? -- in this case, the comment may be very helpful, if only to track that down [0])
[0] but even better would be an IDE that highlighted code copy-pasted from Stack Overflow, Github repositories, etc
It's also possible that both the comment and the code are right, and there's some non-obvious reason why +=2 has the effect of adding 1 here, and is the only way to do it. (Not literally, as in this example, but something analogous.)
A bunch too. So I don't think comments are solely at fault. Self documenting code is only as good as the person who wrote it, and the people who approved it. Sometimes a comment is warranted, sometimes it's not.
Yesterday I learned that in emacs lisp, "defvar" is a definition that is set one time only and from that point on can never be changed (i.e. can not be VARied) and "defconst" is a definition that can be changed (i.e. is not CONSTant). Naming things is hard.
I've completely given up on comments that aren't "here is the problem this code is solving". The comments end up being actually useful that way, and longer lived in their validity. Anything more granular than that, the code itself should make obvious.
A tool like this that tells me how surprising my code is, and takes into account comments around it, might change my mind on this front. If it always works as well as in the OP it would be super useful to be able to know how surprising the code I'm writing is (and I can then judge whether that's OK with me if it's surprising code), but this would also make it hard for the code and the comments to diverge greatly.
I mean, I still won't want "add 1 to x" comments of course.
Really a good code AI would note that the comments are misleading (which is sort of what this is doing).
It's actually completely achievable with today's models to look at the comment and the code immediately after it, see how surprising it is, and then note that the comment could be incorrect.
Some people like granular comments, and some people only want high-level comments. I’ve long suspected that both camps are correct, because they’re using different languages.
A single line of Python data analysis code is often worth 20 lines of C++. If you would be willing to add one comment per 20 lines in C++, then nearly every line of your pandas gobbledygook is worth commenting.
Terse code is good, but that doesn’t necessarily mean the comments should be terse (or absent).
But if you're using comments to explain the code, 9 times out of 10 you just wrote it in too unreadable a way.
Sure, some algorithms are complex enough that some comments are needed to explain the how (that's the 1/10) but in most cases the comments should explain why, not how. So instead it should be
// Add the calibrated skew to compensate for latency
x+=2;
That risk also exists for identifiers, which can mislead to exactly the same degree as they can inform. To avoid this hazard, run the code through an obfuscator that substitutes meaningless identifiers, before looking at it.
That's why you keep the old comments and write the changes within.
"If loop does something" #rev1
"If loop did do something, it now does something twice" #rev2
"If loop doesn't do something, it does something three times" #rev3
"we loop three times because we processing supervariables" #rev4
And you have a false illusion of documentation. You don't need ASCII diagrams. Why not a scribble in a sketchbook? Whiteboards and photography exist. And the method above doesn't require you to redraw. You've already got the first and last revision. Besides, the documentation cycle of your project's life-cycle is where you update all documentation.
> you'll either get less refactors, or outdated comments
If so, you're not disciplined enough. If your project is to be handed over down the line, more documentation is better than some, and any documentation is better than none.
It's like the premature optimisation quote. The advice is technically sound, but it's still not really good advice, because 90% of the time people quote it they're just using it as an excuse not to care about performance at all.
Or in your case not to write comments at all. That's obviously a terrible idea.
Code that I wrote, that I subsequently had to ask: Why aren't you processing the first/last element of this sequence? Why are you casting this to a list when it's already a tuple? Why are you filtering out Decimal("NaN") from tests but not float ones?
I once wrote some code so convoluted I put a mickey mouse ascii art in comments and a note that said "This is some Mickey Mouse BS. I hope you never have to maintain it."
That's probably the only useful comment I've ever written.
I agree with that, but it gets to the point where people police all the comments in a codebase, deeming them not useful. I think that, especially in a huge codebase, explaining why a certain block of code is there is very helpful for transferring knowledge.
I'll go further. Comments are a code smell. They mean you've probably not modularized your code enough, or named your functions and variables descriptively enough. As I've approached 20 years of coding, I began to be able to count on one hand the times when I truly need to add a comment.
(I don't mean the DocBlock type comments for describing functions and class interfaces, that get compiled into docs.)
To all you downvoters: please do respond with examples of comments that are counterexamples to what I said!
In non-performance-critical code, comments usually aren't necessary, but they are godsends when you need to do something tricky for performance. Standard examples are things like bit hacks.
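E.g. the classic "clear the lowest set bit" trick; without the comment the loop body is just line noise:

    // Kernighan's popcount: n & (n - 1) clears the lowest set bit, so the
    // loop runs once per 1-bit instead of once per bit position.
    // (Treats n as a 32-bit integer, per JS bitwise semantics.)
    function popcount(n: number): number {
      let count = 0;
      while (n !== 0) {
        n &= n - 1;
        count++;
      }
      return count;
    }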
I recently had a similar experience with GitHub Copilot. As I was writing a function, it correctly suggested a case I would have missed and which would have only shown up a few hours later after a lengthy CI run. I recommend people use Copilot if possible wrt. licensing concerns.
Oh my god this is awesome. Alright, here's my late-to-the-party, easy-to-make prediction: I bet that in a few years we'll have AI-based tools to find bugs in our code.
That is a possible positive outlook. Let me add some "spice" to it: AI ads. Companies making these tools and having the resources to train the models inject ads into the outputs, so that your generated code subtly contains ads or produces ads on user's screen.
Fair. AI covers the universe of decision making/recommendation algorithms (but not all algorithms). When it includes pattern recognition, we call it machine learning. When it includes uncertainty, we call it statistics (or, less common, statistical learning).
Can you explain why a neural net is not an algorithm? I know the definition of algorithm, but I don't know much about neural nets, so I'd like an ELI5 explanation.
Well, people here seem to disagree, so take this for what it's worth, but executing a neural net doesn't have a series of logical steps like an algorithm (add X to Y); instead, knowledge is implicitly stored in the link strengths of the neural network that lead to a certain output.
Since there isn't a plain sequence of steps that can be followed to explain the output, I'd say a different term is justified. Whether you call that "intelligence" is debatable.
If those sequences of steps are intentionally designed, I lean more towards it being an algorithm. It gets a little confusing when thinking about writing a path-finding algorithm that takes you from A to B using randomness to get there (trying different spatial directions).
You wrote the code that tries random directions, but you are not choosing which directions it takes when executed.
It is an algorithm, but it's not what was traditionally called an algorithm, because it relies on randomization and training data. Every step of the process is algorithmic according to the book definition:
> a precise rule (or set of rules) specifying how to solve some problem
The only difference is the rules are adjusted (trained) over time, rather than being written down once by a programmer.
It's a general trend in AI that:
1. Some problem in the AI-domain is solved with a new method (say, barcode scanning or handwriting recognition as historical examples).
2. This new technique is referred to as AI and not algorithmic.
3. Over time the ability of AI is pushed further.
4. At some point the method shifts from being understood as AI to being considered algorithmic.
The difference for me is that the meat of an algorithm is intentionally designed whereas the meat of a neural network is not. When training a neural network there can be some intention, but it's not the meat of the resulting "algorithm".
When thinking about this, it's a bit like if I submit a request on Fiverr to write code for me that takes me from A to Z. I get the code back, I don't understand it and I didn't write it; is it still an algorithm? All of the same can be applied to a neural network.
Is natural selection an algorithm?
Is how the universe works an algorithm?
I think it's useful in every day life to distinguish what humans do and what something out of human control does. It can also be useful to be a bit philosophical and lump definitions together, but this only works if everyone agrees that they're doing this in a discussion.
Right, there are algorithms to train and run neural nets. But the calculation that the neural net itself does can't really be described as an algorithm (unless you distort the term to lose all meaning).
Your method of event dispatch is very common. NodeJS does it this way, as do many other event dispatchers. It is also how I used to do it, until I encountered a very nasty bug in Chrome DevTools.
Object A emits events.
Object B subscribes to A, and manages the lifetime of Object C
Object C subscribes to A in its constructor, and unsubscribes in its dispose function.
With the common event listener model, Object C could have its methods called after dispose was called, even though it appeared to clean up its event listener!
You can check out the DOM spec for the correct behavior, which both clones the event listener array and sets a removed flag on the listeners in case they were removed after cloning.
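A rough sketch of the idea (not the actual DOM algorithm or the DevTools code): iterate over a clone, and let a removed flag veto listeners that were unsubscribed mid-dispatch.

    // Safer dispatch: the clone keeps iteration stable, the flag makes sure a
    // listener removed during dispatch (even by another listener) never fires.
    type Entry<T> = { listener: (arg: T) => void; removed: boolean };

    class Emitter<T> {
      private entries: Entry<T>[] = [];

      on(listener: (arg: T) => void): void {
        this.entries.push({ listener, removed: false });
      }

      off(listener: (arg: T) => void): void {
        const entry = this.entries.find(e => e.listener === listener);
        if (!entry) return;
        entry.removed = true; // the flag survives in any in-flight clone
        this.entries.splice(this.entries.indexOf(entry), 1);
      }

      dispatch(arg: T): void {
        for (const entry of [...this.entries]) {
          if (!entry.removed) entry.listener(arg);
        }
      }
    }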
It's quite clever, and it's fun, but "did my listeners all get called when I dispatch after removing a listener" is a really obvious thing to unit test. The fact that the AI caught this bug highlights how good AI is getting, but it also highlights the need for very basic practices like actually testing your code.
I'm always in favour of tools that help us find bugs earlier and with less effort, because in practice that's how we get fewer bugs, rather than just hoping everyone everywhere will be disciplined. And this seems like it could certainly sit in between type checking and unit tests in that sense.
> I'm always in favour of tools that help us find bugs earlier and with less effort, because in practice that's how we get fewer bugs, rather than just hoping everyone everywhere will be disciplined.
I disagree. Most bugs in code are entirely valid code. They're things where the developer has written great code that does the wrong thing. Those bugs will be impossible to catch with AI until the AI can understand the requirements, and that can't happen if the requirements aren't clear, and unclear requirements are the source of the bug in the first place. That can't be solved with AI. AI can only ever be as good as the input data. In software development the input data is usually a pile of crap.
If you choose to defer to AI rather than think about the code you write then you will write buggy code, but the AI won't tell you it's buggy because it'll look fine.
The way to build high quality software is to build things with thought and rigour, with good processes like analysing requirements and building tests to cover what the requirements state the code should do.
I don't think I argued for foregoing unit tests in favour of AI. I said that, where AI (or whatever tool, really) is able to find a bug earlier in the process than a unit test can, that seems a win to me.
I've been using SonarLint[0] for a while for this, as it not only finds code smells, but also flags when I do things in a weird way, like writing
    for (let i = 0; i <= array.length - 1; i++) { console.log(array[i]); }
instead of just doing for (const element of array).
Your point number 2 about improvements to your code: when I start writing a line of code, I already have a good idea of what I'm gonna write next. In this case it's not about Copilot refactoring what I wrote, it's about refactoring what I was thinking about writing.
This is cool. This technique should work nicely quite often, although it will also generate many false-positive alerts.
At Codium.ai, we are trying to tackle this problem. We are developing a new code integrity product that intends to do something quite similar. Codium will mark problematic parts of the code (a.k.a. bugs) that aren't in line with the developer's intent, via auto-generated tests. The tricky part is to have high accuracy; we don't want to annoy any folks with false positives.
We would love to get your feedback about what we are working on!
We are developers with ML background, excited about exploiting ML for software development, so developers can code fast with confidence.
Cool! Small note. Red on black is really hard for some colorblind folks, like me. I got what you were doing though, and I could change it if it really mattered.
I think one really good use case of AI is to use it for static analysis of C code that we can't remove quickly. I am not sure if machine learning in any form has been explored in this field.
The driving case is in the comment: "Check that the listener is still there in case one is removed during dispatch".
The failing TDD test could use a this.listeners with a single listener, where that listener is not present. Or two listeners where the first is present and the second is not. (Or a list of any size where the last listeners are not present.)
That would drive the "if (!this.listeners.has(listener))" test, but not be enough to drive the choice of "continue", "break", or "return" - all of which would make that failing test go green.
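Concretely, something like this (hypothetical on/off/dispatch names, with the membership guard written as "continue" where "break" or "return" would equally satisfy the test):

    import assert from "node:assert";

    // Minimal emitter with the guarded dispatch loop being discussed.
    class Emitter {
      private listeners: Array<() => void> = [];
      on(l: () => void) { this.listeners.push(l); }
      off(l: () => void) {
        const i = this.listeners.indexOf(l);
        if (i !== -1) this.listeners.splice(i, 1);
      }
      dispatch() {
        for (const l of [...this.listeners]) {
          if (!this.listeners.includes(l)) continue; // or break, or return
          l();
        }
      }
    }

    // "Two listeners, second one removed during dispatch": this forces the
    // membership check into existence, but stays green for continue, break
    // and return alike, because the removed listener is the last one.
    const event = new Emitter();
    const calls: string[] = [];
    const b = () => calls.push("B");
    const a = () => { calls.push("A"); event.off(b); };
    event.on(a);
    event.on(b);
    event.dispatch();
    assert.deepStrictEqual(calls, ["A"]);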
How would you handle this situation in classic TDD? Indeed, one of my complaints about classic TDD is that it doesn't do enough testing away from the happy path of meeting the developer's preconceptions.
BTW, this is one of those cases where 100% statement coverage isn't enough - nor 100% branch coverage.
Mutation testing could detect it, by mutating the "break" to a "continue" and complaining when all the tests still pass. I have yet to use mutation testing in my projects.
While people argue (incorrectly IMO) that TDD naturally results in 100% statement coverage, I've never seen a TDD advocate argue that it naturally results in code which correctly identifies all mutations.
Indeed. The argument boils down to: since it's finite, I can turn it into a FSA. Not only is that unhelpful, it doesn't tell you how to construct it, i.e. the learning process.
From your list, it has solved simple matrix multiplication, LSD radix sort, and pointer padding, all of which appear many, many times in its training set.
I'm surprised it can fix the two prediction compressor bugs, even with a hint... That shouldn't be in the training set. But the solutions to those puzzles did appear on the front page of Hacker News a few weeks ago (https://news.ycombinator.com/item?id=33396037), so they may have been uploaded to GitHub.
Can you paste the Correct! message (as evidence of solving it) and do more than just the first 10? Just list the ones it can solve. Thanks, I appreciate it.
(It’s fair to throw down the gauntlet like you’re doing. You’re right that it’s a nice challenge, and that AI could solve or assist with at least one of those bugs. The trouble is that very few people have access to the AI, and even fewer have the skills to write custom tooling on top of it. The author is probably the only one who could even attempt your challenge. Hopefully that will change within a couple more years.)
From my understanding, your website doesn’t actually run the user code to see if it fixes the bug. Doesn’t that mean the user also has to guess how you fixed the bug?
Sure, but based on your previous comments and overall stance on the matter, don't be surprised when most people have the opinion I expressed about your question.
Please, if you have an "artificial intelligence" that can write and understand code, I'm sure it can fix some tiny bugs in a little code that wasn't in the training set.
If the context window is finite, then LLMs actually are Markov chains. It's just that they're a much more efficient way of representing transition probabilities than storing them all in a giant lookup table.
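As a toy sketch (made-up tokens and probabilities): a fixed-length context keyed into a table is exactly a Markov chain; the LLM just computes each row on demand instead of storing it.

    // Toy Markov-chain next-token model as an explicit transition table.
    // An LLM with a finite context window defines the same kind of mapping
    // from context to next-token distribution, without materialising the rows.
    const transitions = new Map<string, Map<string, number>>([
      ["for (let i = 0;", new Map([[" i", 0.9], [" j", 0.1]])],
    ]);

    function nextTokenDistribution(context: string): Map<string, number> {
      return transitions.get(context) ?? new Map(); // most rows are simply missing
    }

    nextTokenDistribution("for (let i = 0;"); // { " i" => 0.9, " j" => 0.1 }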
You've conflated (justified? I don't know, haven't thought about it enough) concerns about a specific model / company / tool with a whole area of research, study, and practice. On top of that, you write with an air of someone who isn't interested in a conversation and just wants to soapbox, so I have nothing constructive left to say to you. If you ever choose to take an inquisitive approach to AI, ML, or whatever you want to call it, you'll be the first to benefit.
Intellectual property is not private property to begin with. The fair use doctrine (and constitutional right) proves that. If society's particular specific use of a work outweighs the rightsholder's monopoly on its distribution, it can be ruled as such and whoever is making that fair use can continue and won't be penalized.
He doesn't know how to; he hides behind anonymity, preferring to insult the work of others rather than contribute something themselves. Don't feed the trolls, I say.