My takeaway from the article - the success of their language model illustrates just how large a fraction of our code is boilerplate.
Yes, it's helpful that the system shows bugs, but it does this, not through careful analysis of the control flow or subtle type analysis, but by "probability of each token appearing given the previous tokens".
If such a large proportion of our code follows common patterns, are we not wasting huge amounts of time writing and testing the same functions across thousands or millions of pieces of code? If we (almost) always follow a certain pattern, should not that pattern be embedded in a library or language, so vastly reducing the opportunity for errors or bugs?
It really is the exact opposite approach to static analysis, which tries to see what the code really does and how that leads to bugs. I have had (quite expensive) static analysis tools detect genuine bugs, e.g. a somewhat subtle overflow.
What it can never detect though is correct code that misses the intention of the programmer. E.g. whether some mathematical function is accurate.
The language models try, by statistical means, to derive what should be there. Given enough data they will start to have some (statistical) grasp on the intention.
I am not entirely sure about the boilerplate though. Often you need some minor variation of an already existing pattern. Trying to unify those slightly divergent patterns into one schema can very easily lead to very hard-to-understand code. Another thing is that boilerplate is fairly easy to write and to test, because it is familiar, reducing the actual effort which goes into it. Sometimes it is just better not to reuse code.
When you go too far reducing boilerplate you get to a point where configuration becomes the code and the actual code becomes a black box that people barely understand. And then they replicate what's already in the black box because they aren't sure it's there, and you do the same things over and over in every layer. And in some layers you do it one way and in others another way. And then requirements change and you change the code, and it still works the old way SOME of the time, but you only discover that in production, because the test case you used while developing is handled correctly on the layer where you changed it.
And then if it's buried deep enough somebody will add another layer and fix the cases that were found - there.
And that's how the disgusting legacy code happens.
KISS, please. Unnecessary abstraction is the root of almost all problems in programming.
Although, I feel like very often, the idea of reducing boilerplate should __not__ be to hide the boilerplate one layer below; it should be to __try not to write the boilerplate__ …
Take the very specific example at hand: what is the meaning of this “JoelEvent” class? Why is it there?
It appears that it wraps a list of functions, with methods to push and pop from it. Why is it necessary to write them?
“Dispatch” reimplements function application, apparently? Btw, it is beyond me why one would loop over this.listeners and then check if the element is in this.listeners.
In a reasonable language or framework, I cannot in any way see a reason why this code needs to be written. This is the idea of removing boilerplate, to me!
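For anyone reading along without the article open, the shape being described is roughly this; a sketch reconstructed from the comments above, with guessed method names, not the author's actual code:

    // Rough sketch of the wrapper being discussed; reconstructed from the
    // comments above, not copied from the article.
    class JoelEvent<T> {
      private listeners: Array<(arg: T) => void> = [];

      on(listener: (arg: T) => void): void {
        this.listeners.push(listener);
      }

      off(listener: (arg: T) => void): void {
        const i = this.listeners.indexOf(listener);
        if (i !== -1) this.listeners.splice(i, 1);
      }

      dispatch(arg: T): void {
        // Loop over a copy, then re-check membership so that listeners
        // removed during dispatch are skipped.
        for (const listener of [...this.listeners]) {
          if (!this.listeners.includes(listener)) continue;
          listener(arg);
        }
      }
    }

An array of callbacks plus a loop is exactly what a stock event emitter (e.g. Node's built-in EventEmitter) already gives you, which is the point.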
I think the biggest problem with templating/reusing instead of boilerplate is just how hard it is for a dev to answer the question(s): has somebody already done this, is their solution flexible enough to fit mine, etc.
Hell, just helper/utility functions within an organisation aren't always used; devs end up reimplementing stuff all the time simply because there's no easy way to know about them (documentation is only one part of this).
+1 on the last paragraph: Predictability means the code follows a pattern, not that it is boilerplate. Some amount of predictable code is necessary just to spell out what the code does, so that even someone unfamiliar with the pattern can simply read and understand it.
Honestly this is what I have loved about Kotlin. It seems like there is now just a certain amount of boilerplate in every Java file, and Kotlin just chose to bake all of that into the language. The other instance where we see this happening is with the Lombok library in Java, although personally I hate annotations.
We don't talk in maximally compressed strings; a bit of redundancy in grammar helps make sure people can understand each other. The fact that spellcheckers are possible doesn't mean our language is too sparse and wasteful.
A lot of programming languages have a lot of room for improvement though.
A programming language with no redundancy in it means the compiler cannot detect any errors - because every sequence of characters forms a valid program.
The skill is in selecting the optimal amount and form of redundancy.
For example, the typical ; statement terminator is redundant. People often ask, since it is redundant, why not remove it? People have tried that, and found out that the redundancy makes for far better error detection.
No, with redundancy your compiler can catch a certain class of errors (call them "avoidable") that doesn't exist in case of no redundancy. Of course, in both cases you still have the unavoidable errors.
I think the problem is not that "not all tokens are valid". Rather, it is that we often repeat the same token sequences, and we should seek to abstract those predictable sequences into more unique tokens, e.g. by turning them into a library function.
Of course, the downside is that now you have way more tokens you need to know to understand some code, similarly to how Haskell code tends to have tons of mega-abstract function combinators or whatever whereas Go code is very simple. Which one is more readable depends on the reader, because a language like Go requires the reader to sift through more details, whereas Haskell is much terser but also requires much more pre-existing knowledge to understand.
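As a toy illustration of that trade-off (made-up names, nothing from a real codebase): the same "look up a key, fall back to a default" token sequence either gets repeated at every call site, or becomes one more combinator the reader has to know.

    // The predictable token sequence, spelled out at every call site:
    const config = new Map<string, string>([["port", "3000"]]);
    const portRaw = config.get("port");
    const port = portRaw === undefined ? 8080 : Number(portRaw);
    const hostRaw = config.get("host");
    const host = hostRaw === undefined ? "localhost" : hostRaw;

    // The same sequence abstracted into one named helper, at the cost of
    // one more name the reader now needs to know:
    function getOr<T>(cfg: Map<string, string>, key: string, fallback: T, parse: (s: string) => T): T {
      const raw = cfg.get(key);
      return raw === undefined ? fallback : parse(raw);
    }
    const port2 = getOr(config, "port", 8080, Number);
    const host2 = getOr(config, "host", "localhost", String);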
I'm wondering if we can use these code-generation models to find "low entropy code" that is a prime target for turning into libraries.
If you put names to those functions it instantly feels a lot less like 'the original is definitely preferable', for example something like:
    def handleNewEmail(isInboxDisplayed, areNotificationsEnabled):
        AddToInbox()
        if (isInboxDisplayed):
            RefreshInboxView()
        if (areNotificationsEnabled):
            SendSystemNotification()
Not defending needless refactoring in all cases, it's definitely a judgement call.
I think the main problem is lack of expressiveness (what does True mean?). Since your examples seem to be in Python, I would solve it there with named parameters, maybe even giving them default values. That way the code may be abstract, but it is also informative.
I used to share the same example against obsession with DRY. But it's a bit more nuanced.
Both options can be valid. DRY is a tangential topic, the main goal should be to keep your code as close to your mental model as possible (also keeping mental models in sync between team members, which is quite hard).
ABC being a sequence could be a pattern, or it could be a coincidence. You can't know which one it is just by looking at the code. The knowledge whether it's a sequence or not is in the business domain and in your mental model of that domain.
You might think you're disagreeing on DRY with another team member, but in reality you two have different mental models, and one of you is using DRY to justify theirs.
Very well put. I run into this all the time in the name of "reducing complexity", when it is really hiding it under the rug.
My personal approach to combat this is better data structures that model the problem (and are checked by the compiler). Once this is in place I try to "flatten" the calls, so that things are mostly at the top level, or few levels deep, which usually comes up naturally once the data structure is consciously defined.
I try to have as much code as possible be "structure in - structure out" (pure functions) and to concentrate stateful code to work on the structure's fields/values. This is surprisingly easy once the data structures match the problem, instead of only growing organically.
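A minimal sketch of the shape I mean, on a made-up domain (nothing from a real codebase):

    // Model the states explicitly so the compiler checks them.
    type Order =
      | { status: "pending"; items: string[] }
      | { status: "shipped"; items: string[]; trackingId: string };

    // "Structure in - structure out": a pure function over the model.
    function ship(order: Order & { status: "pending" }, trackingId: string): Order {
      return { status: "shipped", items: order.items, trackingId };
    }

    // Stateful code stays thin and at the top, working on the structure's values.
    let current: Order = { status: "pending", items: ["book"] };
    if (current.status === "pending") {
      current = ship(current, "TRACK-123");
    }

The stateful part shrinks to a thin layer that shuffles values in and out of the structure.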
A codebase is a living thing. Inlining a function or splitting it into multiple cases should always be an option, and boolean flags are generally a code smell. I don't see this as an argument against DRY; when the facts change, your code structure needs to change too, but that doesn't mean your original structure was wrong.
You missed the point; an if or for is a single token. The problem is with predictable token sequences, and if you write the same for loop (or extremely similar ones) in multiple places then yes, it should be turned into a function.
You're probably right about the effort - but I expect that such "boilerplate" code contains (or leads to) much more than 20% of the bugs.
This 90% of code is not genuine word-for-word boilerplate (copy/pasted from a known good source). This code is typically constructed fresh each time; or worse, copied from somewhere similar and quickly tweaked for names/types! (I do it, and I see it done all the time.)
I expect that the remaining 10% non-boilerplate code, taking 80% of the effort, is much more carefully considered, and less likely to contain those clumsy/forgetful off-by-one or buffer-overflow bugs.
There's a reason many languages have large overflowing repositories of modules (or there are well known libraries) that can be downloaded and used that provide boilerplate solutions for many things.
Most people don't like writing that boilerplate once they know how to do it and have done it a few times, and would rather just call a function do_that_thing_need_done_on(input1, input2).
If it can't be factored out like that and is actual language boilerplate beyond a few lines, that's a failure of the language.
If these AI models are suggesting the code that could be called in a library/module instead of the code to actually include and call a well known and trusted library or module, I'm not sure that's progress. At least when someone notices a bug or better way to do it and updates that module or library, consumers of that module can update and benefit from it, or at a minimum see that there were bugs in the version they're running that they might want to address at some point.
> If these AI models are suggesting the code that could be called in a library/module instead of the code to actually include and call a well known and trusted library or module, I'm not sure that's progress.
I think the judgement call of when to use a library & what library to use is quite subjective, even for humans to get right.
If I'm doing JSON deserialisation it might suggest I use Gson library which would be much better than rolling your own. But the original authors are saying that you should prefer Moshi over Gson — I think it'd be hard for an AI to reach that conclusion though (though maybe not if it's doing something like tracking migrations in OS projects from Gson->Moshi).
With something a little more trivial — I don't want it to add in a dependency on left-pad, even though it has 2.5M weekly downloads so is arguably both well-known and trusted :)
You could probably set a threshold for how complex code is before it's suggested to be swapped out for a lib, but then is my code simple because I'm ignoring edge cases I should support, or because I've trimmed the fat on what I'm choosing to support (e.g. i18n, date handling, email validation etc.)
I agree it's not always cut and dried which module to use, or whether to use a module for something extremely simple (which is why I mentioned it being more than a few lines, which should weed out stuff like left-pad, I would hope), but I think knowing there is a module and suggesting it might be a good first step.
The only thing worse than using a module that has a bug/security problem for a function that's just a few lines and not used again in the codebase is when the content of that function is copied in place instead of being included and nobody has an easy way of knowing whether that's the code that was suggested and included in their project. Worst of both worlds.
Yeah, one of the interesting results in empirical studies of defect rates is that defect rate is influenced by lines of code more than other factors like “static types”. Similarly, analyses of defects have discovered that they tend to occur at the end of repetitive sequences of code, because the developer has sort of switched into autopilot mode. I think the obvious conclusion here (and my experience bears this out to some extent) is that languages and libraries that force boilerplate on you produce buggier code than languages and libraries that abstract the boilerplate away.
Yes and I raised the same concern when GitHub Copilot was released. If our code contains so little entropy that an AI can reliably predict the next sequence of tokens then that is evidence we are operating at too low a level of abstraction. Such tools can certainly be helpful in working with today's popular languages, but what would a future language that allows for abstracting away all that boilerplate look like?
Since this is HN I'm sure someone will say that the answer is Lisp. But can we do better?
But now you're already heading into much vaguer territory. Readability is also important. Very important, I would say. That requires easily identifiable markers for loops, conditions, functions, etc., something Lisp lacks. This might be a place where keyword coloring could be useful, but then we're relying on external help.
Another issue is consistency. Take C, Javascript, or Go. Many loops are of the form
for (var i = 0; i < n; i++) { ... }
You could argue that "for i < n" provides the same information, but then you'd have to find a way to start the loop at a different offset, use a different end condition, or different "increment".
But that's exactly the issue. In most cases with that for loop we just want to apply the same operation to every item in a collection and shouldn't need to explicitly code a loop for that. So it should be possible to take advantage of higher level language constructs to express that, or define our own constructs through some form of meta programming. Is there a way to accomplish that while still retaining readable code?
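For example, in JavaScript/TypeScript terms (just an illustration; any language with map/filter works the same way):

    const items = [3, 1, 4, 1, 5];

    // Explicit index loop: offset, end condition and increment all spelled out.
    const doubled: number[] = [];
    for (let i = 0; i < items.length; i++) {
      doubled.push(items[i] * 2);
    }

    // Higher-level construct: "apply this operation to every item".
    const doubled2 = items.map(x => x * 2);

    // A different offset or end condition becomes a separate, named concern.
    const doubledTail = items.slice(1).map(x => x * 2);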
Meta programming can make things worse. It can be useful for constructing/representing rule-like objects or functions, but when you start overloading basic syntactic elements, people will lose track. It was the staple trick of the obfuscated C contest, so much so that it's been forbidden now, IIRC. It's really difficult to come up with something that is terse, readable and unambiguous.
But the situation is not that bad, is it? A few characters too many, so be it. I find reusability a much larger problem.
Sure, most code is boilerplate, except for that one thing, and that one thing can be anywhere. For example, let's say you want to write a function that returns the checksum of a bunch of data. That's a very common thing to do, there are plenty of libraries that do that, and I have seen the CRC32 lookup table in many places, sometimes I am the one who put it there.
Now, why rewrite such a function?
- Ignore some part of the message
- Use different constants
- Fetch data in a special way (i.e. not a file or memory buffer)
- Have some kind of a progress meter
- The library you may want to use is not available (can be for technical, legal or policy reasons)
- Some in-loop operation is needed (ex: byte swapping)
- Have a specific termination condition (ex: end-of-message marker)
- And many others, including combinations of the above
If you ignore all these points and only see the generic checksum function, yes, it is boilerplate and can be factorized. But these special cases are the reason why it may not be the case, and the reason why there are so many coding jobs.
It is also the reason why we don't have real (Lv5) self driving cars yet, why there are pilots in the cockpit, why MS Office and the like have so many "useless" features, why so many attempts to make software cleaner and simpler fail, etc...
That's hardly a convincing example. All of these points can be solved elegantly with a stream abstraction, which can be cheap or free given a sufficiently advanced language and compiler.
As for legal or policy reasons, those still aren't reasons to write boilerplate code. Your reimplementation can be tight and reuse other abstractions or include their own.
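For what it's worth, here is a rough sketch of what that looks like in TypeScript: a plain bitwise CRC-32 over an iterable, not tuned for speed and not claiming to cover every item on the list above.

    // CRC-32 written against an Iterable<number> "stream". Skipping parts of
    // the message, custom data sources, or a progress meter can then be
    // wrapping generators instead of variations of the core loop.
    function crc32(bytes: Iterable<number>): number {
      let crc = 0xffffffff;
      for (const byte of bytes) {
        crc ^= byte;
        for (let i = 0; i < 8; i++) {
          crc = (crc >>> 1) ^ (0xedb88320 & -(crc & 1));
        }
      }
      return (crc ^ 0xffffffff) >>> 0;
    }

    // A "fetch data in a special way" source or a progress meter is just
    // another generator wrapped around the byte source:
    function* withProgress(bytes: Iterable<number>, every: number): IterableIterator<number> {
      let n = 0;
      for (const b of bytes) {
        if (++n % every === 0) console.log(`${n} bytes hashed`);
        yield b;
      }
    }

    crc32(withProgress([0x68, 0x69], 1)); // logs progress, returns the CRC-32 of "hi"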
A stream abstraction is a solution to some (not all) of these problems, and indeed some libraries use them, but a stream abstraction that is powerful enough to solve most of these problems may result in more complex code than just rewriting the checksum algorithm from scratch. And there is a limit on how much compilers can optimize, especially considering that checksum calculation may be critical to performance.
In reality, few people need to write their own checksumming function, but sometimes, it is the best thing to do. And it is just an example; there are many other instances where an off-the-shelf solution is not appropriate because of some detail: string manipulation, parsing, data structures (especially the "intrusive" kind), etc... And since you are probably going to have several of these in your project, it will result in a lot of boilerplate. If it were so generic as to not require boilerplate, it would probably have been developed already and you would be working on something else.
Abstractions are almost invariably more complex, slower, more error-prone and generally worse than the direct equivalent. They are, however, reusable, that's the entire point. So one person goes through the pain of writing a nice library, and it makes life a little easier for the thousands of people who use it, generally, that's a win. But if you write an abstraction for a single use case, it is generally worse than boilerplate.
This is exactly my view, especially with the web apps. If you take a distributed system, the majority of components/microservices will have more than 50% commonality in behaviour. Therefore you do mostly the same things when you start a new one. Even if code itself might be harder to generate, as even a CRUD app might have specific behaviour, testing it is definitely the same, especially when doing negative scenarios, boundary testing, CRUD operations, etc. I wrote a tool specifically for this purpose, targeted at REST APIs, aiming to automate this repetitive work and let you focus on the tests which are specific to the context.
We are not compression algorithms! If we were, we could replace the most common block of boilerplate code with token 'A', the second-most block of boilerplate code with token 'B,' and so on, writing programs in very few bytes. God have mercy on anyone trying to debug such a program, though.
Any language with no boilerplate at all is a black box of incomprehensibility. Java has, I think, more boilerplate than average, while some other languages have less boilerplate than average.
IDEs can help with some of this, which is why I finally stopped writing all code in vim.
Allowing code to be compressed this much is the goal of golfing languages (such as 05AB1E (or osabie) or Pyth (not Python)). The code golf Stack Exchange forum contains a lot of programming challenges where the goal is to write the shortest program (in bytes) that does what the challenge asks, and some answers are truly impressive, with somewhat non-trivial algorithms being implemented in as few as 4 bytes (in extreme cases). Granted, these are programming challenges and not production code to be deployed, and some golfing languages are designed for a specific kind of task or algorithm, which may make us think that the algorithm was actually pre-implemented in the language (and sometimes it is kinda true), but still, it is worth taking a look at.
Yes, this is true as long as the method or function doesn't contain ifs that change the behaviour depending on the data. In other words, if the problem is so well defined that you can create a method that solves it without having to take into account x variations of the problem, then it is fine. This is the copy-paste versus creating-a-function discussion. The problem is the x variations: you need the code to do different things depending on the variation, and we usually break the modularity of the function instead of separating the generic and non-generic parts. Hence the ifs in the function. From what I have seen, people are unable to do this properly in their own codebase, so I don't think it will happen globally. But on the other hand, libraries are kind of the answer to the problem, and as problems get well defined, one starts using libraries. Raising the level of abstraction is a continuous process.
Written language has this too. A basic lookup table of frequencies can tell you that jkkj is a typo in an English word. "Nobody else is really writing code that contains the fragment you just wrote" can find syntax errors. Better language models can find more subtle relationships.
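A toy version of that lookup-table idea, using character bigrams and a deliberately tiny made-up corpus:

    // Flag character pairs that never occur in a (tiny, made-up) reference corpus.
    const corpus = "nobody else is really writing code that has the fragment you just wrote";
    const seen = new Set<string>();
    for (let i = 0; i + 1 < corpus.length; i++) {
      seen.add(corpus.slice(i, i + 2));
    }

    function looksSuspicious(word: string): boolean {
      for (let i = 0; i + 1 < word.length; i++) {
        if (!seen.has(word.slice(i, i + 2))) return true;
      }
      return false;
    }

    looksSuspicious("jkkj");  // true: "jk" never appears in the corpus
    looksSuspicious("write"); // false: all its bigrams do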
At some point the variations mean that more abstractions don't really help.
I think you misunderstand machine learning. "Probability of each token appearing given the previous tokens" is how humans write code too: we write code based on what we want to do and what we have written before. "What I want to do" was captured in the comments that were added.
> probability of each token appearing given the previous tokens
Sounds like tokenizer -> Markov chain? Surely something trained on a TPU is more sophisticated than something we could have done in the middle of the 20th century?
Perl is a wonderful, innovative language which failed because it tried to remove intratextual redundancy in the way you are suggesting.
A string of length N is vastly more likely to be a valid Perl program than a valid Python program. Ultimately this meant that Perl programs, while easier to type, were much harder to read, and extremely easy to misinterpret.
There is also something to be said about nudging developers toward the "right" way to do stuff.
Perl is not only hard to read because there are many shortcuts that might look like line noise to the inexperienced (hell, Rust has a bunch of those too); it's also because there are a bunch of ways to do anything.
Usually you can check the commit history for the unexpected line to figure out what's up, though, to let you figure out if the bug is in the code or the comment.
Comments that have the same content as the code but written in English will have code drift problems. But most comments aren't like this; they can provide context or explain what's happening at a higher level, ideally.
A good rule of thumb I heard a few years ago: assume you're explaining this chunk of a system to a team member standing next to you. Write that down. And don't worry about the tone feeling casual.
And once or twice at the top of a weird class/task file/section/..., don't be afraid of being a bit verbose: explain it until it's obvious, and then one more level. Stuff tends to be obvious while you have all the context uploaded in your mental caches, but a year down the line it'll be rather confusing. Still, having too many such long comments in straight-line code tends to make it harder to read.
Curiously this seems like exactly the sort of thing that code ML should be able to identify and flag: 'Hey, the comment says this, but the code is doing something totally different.'
However, it might be trained on code with erroneous comments, either because they were mangled by a merge or because they're outdated. The more often this happens, the more confused the AI model will be.
Which makes me think that the AI models should be trained on the code evolution of commit chains and not just on isolated snippets of code. That way, the AI could analyze your own commits to detect when a comment becomes outdated.
But ideally the comments should be executable, as unit tests, making you read them if and only if you break them.
For this to be a tolerable development experience, test as much as you can while keeping your tests away from slow dependencies like networking, DB, disk I/O..., and try to keep tests relevant to what you're modifying executable locally in a few seconds.
Maybe even refactor your app to have dependencies at the top, so that most code doesn't have access to them.
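Roughly this shape, as a sketch rather than a prescription (made-up names):

    // The pure core never sees the network or the database, so it is easy to
    // cover with fast unit tests.
    type User = { id: string; email: string };

    function welcomeSubject(user: User): string {
      return `Welcome, ${user.email}!`;
    }

    // Dependencies live only in the thin top layer; in tests, pass a fake
    // sendEmail, in production the real client.
    async function sendWelcome(
      user: User,
      sendEmail: (to: string, subject: string) => Promise<void>,
    ): Promise<void> {
      await sendEmail(user.email, welcomeSubject(user));
    }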
> But ideally the comments should be executable, as unit tests
For the kinds of comments that answer the "why", if we could do that, we wouldn't need the actual code in the first place.
> making you read them if and only if you break them.
A good "why" comment is supposed to inform you beforehand, so you can make changes effectively and without introducing extra bugs in the process. Unit tests are more of a safety net.
I’m imagining the poster is thinking about things like rust doctests where code in the comments will be executed as tests when you run cargo test on the project. It’s a nice way of being able to ensure that (at least part of) the documentation will correspond to the behavior of the code.
Unit tests suck at being documentation, and are not a good substitute for the "why" comment. They can catch some mistakes you make with the code under test, but they can't tell you why it is the way it is. At best, they can help you guess, at the cost of having to study the test code itself (which is usually much bigger than the code it tests, and often more complicated). But the thing is, the knowledge of "why" is most valuable to have before you start making changes and break some tests.
This is true; test coverage, especially in code that has to interact with other systems in particular ways, will often have ten lines of setup that only matter in the test for every line of actual verification.
// in case this is malformed, fix formatting so it will still parse
input = fixInputFormatting(input);
and had a code reviewer ask, “why are you calling fixInputFormatting”?
Nothing to raise the blood pressure like a code review question that is literally answered by a comment on the line immediately preceding where they left the question.
I'm with the reviewer on this one. Why is it malformed? Why are you fixing it here and not earlier? Why are you fixing it and not rejecting it? The comment tells me nothing.
I don’t remember the exact comment/code pair anymore, but it was something where the answer they wanted was exactly what was on the preceding line. Coming up with a simple example to demonstrate that is a surprisingly hard thing to do.
Agreed. Comments that explain in English exactly what the next line does drive me crazy. Even if the line is complicated. I pretty much only ever comment things these days when I change from one approach to another. I.e.:
// This code is weird. I tried doing it the obvious way, but that doesn't work because .. reasons ..
Sometimes, if the code is short, I'll even leave the old/obvious code there for future reference when I look at the weird code and say to myself:
"This is weird! Obviously it should work in this much simpler way.."
I mostly use comments to explain business rules, and why are they implemented there.
    # High value orders need to be approved before refund.
    # Similar logic is also applied elsewhere; this is here
    # as a failsafe.
    if ticketValue > 500:
        emailCustomerSupport(ticket)
    else:
        refund(ticket)
I know that by ".. reasons .." you mean "<and I add the reasons here>", but I've seen too many comments worded exactly like that.
Some programmers use comments (correctly) to explain reasoning and context, some use them to redundantly say what the code already says and some, apparently, use them to apologise.
As a code reviewer, I love seeing comments like that, because it immediately flags "someone made a mistake here".
Sure, sometimes the code is right and the comment is wrong -- but sometimes the comment is right and the code is wrong, in which case the comment just saved me a lot of time.
I think it was Dijkstra who said that software actually lives in the mind of the programmer, and the code is just a distorted, lossy representation of that. Anything that gives us light on their thoughts is likely to improve our understanding of the software. Bugs happen when there's a mismatch between what's going on in the mind of the programmer and what the code actually does, so when we read code a primary task is to understand what the programmer thought it should do.
Comments inform us about the stuff that the programmer cares enough to write down. When we see a seemingly trivial comment, we may ask: why did they take the time to write that down? Did they think there was some subtlety we aren't aware of? Or perhaps they were inexperienced with the language, to the point of having a hard time reading the code they themselves wrote? (If I put this comment into Google, will I find they copy-pasted it from Stack Overflow? -- in this case, the comment may be very helpful, if only to track that down [0])
[0] but even better would be an IDE that highlighted code copy-pasted from Stack Overflow, Github repositories, etc
It's also possible that both the comment and the code are right, and there's some non-obvious reason why +=2 has the effect of adding 1 here, and is the only way to do it. (Not literally, as in this example, but something analogous.)
A bunch too. So I don't think comments are solely at fault. Self documenting code is only as good as the person who wrote it, and the people who approved it. Sometimes a comment is warranted, sometimes it's not.
Yesterday I learned that in emacs lisp, "defvar" is a definition that is set one time only and from that point on can never be changed (i.e. can not be VARied) and "defconst" is a definition that can be changed (i.e. is not CONSTant). Naming things is hard.
I've completely given up on comments that aren't "here is the problem this code is solving". The comments end up being actually useful that way, and longer lived in their validity. Anything more granular than that, the code itself should make obvious.
A tool like this that tells me how surprising my code is, and takes into account comments around it, might change my mind on this front. If it always works as well as in the OP it would be super useful to be able to know how surprising the code I'm writing is (and I can then judge whether that's OK with me if it's surprising code), but this would also make it hard for the code and the comments to diverge greatly.
I mean, I still won't want "add 1 to x" comments of course.
Really a good code AI would note that the comments are misleading (which is sort of what this is doing).
It's actually completely achievable with today's models to look at the comment and the code immediately after it, see how surprising it is, and then note that the comment could be incorrect.
Some people like granular comments, and some people only want high-level comments. I’ve long suspected that both camps are correct, because they’re using different languages.
A single line of Python data analysis code is often worth 20 lines of C++. If you would be willing to add one comment per 20 lines in C++, then nearly every line of your pandas gobbledygook is worth commenting.
Terse code is good, but that doesn’t necessarily mean the comments should be terse (or absent).
But if you're using comments to explain the code, 9 times out of 10 you just wrote it in too unreadable a way.
Sure, some algorithms are complex enough that some comments are needed to explain the how (that's the 1/10) but in most cases the comments should explain why, not how. So instead it should be
// Add the calibrated skew to compensate for latency
x+=2;
That risk also exists for identifiers, which can mislead to exactly the same degree as they can inform. To avoid this hazard, run the code through an obfuscator that substitutes meaningless identifiers, before looking at it.
That's why you keep the old comments and write the changes within.
"If loop does something" #rev1
"If loop did do something, it now does something twice" #rev2
"If loop doesn't do something, it does something three times" #rev3
"we loop three times because we processing supervariables" #rev4
And you have a false illusion of documentation. You don't need ASCII diagrams. Why not a scribble in a sketchbook? Whiteboards and photography exist. And the method above doesn't require you to redraw. You've already got the first and last revision. Besides, the documentation cycle of your project's life-cycle is where you update all documentation.
> you'll either get less refactors, or outdated comments
If so, you're not disciplined enough. If your project is to be handed over down the line, more documentation is better than some, and any documentation is better than none.
It's like the premature optimisation quote. The advice is technically sound, but it's still not really good advice, because 90% of the time people quote it they're just using it as an excuse not to care about performance at all.
Or in your case not to write comments at all. That's obviously a terrible idea.
Code that I wrote, that I subsequently had to ask: Why aren't you processing the first/last element of this sequence? Why are you casting this to a list when it's already a tuple? Why are you filtering out Decimal("NaN") from tests but not float ones?
I once wrote some code so convoluted I put a mickey mouse ascii art in comments and a note that said "This is some Mickey Mouse BS. I hope you never have to maintain it."
That's probably the only useful comment I've ever written.
I agree with that, but it gets to the point where people police all the comments in a codebase, deeming them not useful. I think that, especially in a huge codebase, explaining why a certain block of code is there is very helpful for transferring knowledge.
I'll go further. Comments are a code smell. They mean you've probably not modularized your code enough, or named your functions and variables descriptively enough. As I've approached 20 years of coding, I began to be able to count on one hand the times when I truly need to add a comment.
(I don't mean the DocBlock type comments for describing functions and class interfaces, that get compiled into docs.)
To all you downvoters: please do respond with examples of comments that are counterexamples to what I said!
In non-performance-critical code, comments usually aren't necessary, but they are godsends when you need to do something tricky for performance. Standard examples are things like bit hacks.
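E.g. the classic "clear the lowest set bit" trick; without the comment the loop body is just line noise:

    // Kernighan's popcount: n & (n - 1) clears the lowest set bit, so the
    // loop runs once per 1-bit instead of once per bit position.
    // (Treats n as a 32-bit integer, per JS bitwise semantics.)
    function popcount(n: number): number {
      let count = 0;
      while (n !== 0) {
        n &= n - 1;
        count++;
      }
      return count;
    }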
I recently had a similar experience with GitHub Copilot. As I was writing a function, it correctly suggested a case I would have missed and which would have only shown up a few hours later after a lengthy CI run. I recommend people use Copilot if possible wrt. licensing concerns.
Oh my god this is awesome. Alright, here's my late-to-the-party, easy-to-make prediction: I bet that in a few years we'll have AI-based tools to find bugs in our code.
That is a possible positive outlook. Let me add some "spice" to it: AI ads. Companies making these tools and having the resources to train the models inject ads into the outputs, so that your generated code subtly contains ads or produces ads on user's screen.
Fair. AI covers the universe of decision making/recommendation algorithms (but not all algorithms). When it includes pattern recognition, we call it machine learning. When it includes uncertainty, we call it statistics (or, less common, statistical learning).
Can you explain why a neural net is not an algorithm? I know the definition of algorithm, but I don't know much about neural nets, so I'd like an ELI5 explanation.
Well, people here seem to disagree, so take this for what it's worth, but executing a neural net doesn't have a series of logical steps like an algorithm (add X to Y); instead, knowledge is implicitly stored in the link strengths of the neural network that lead to a certain output.
Since there isn't a plain sequence of steps that can be followed to explain the output, I'd say a different term is justified. Whether you call that "intelligence" is debatable.
If those sequences of steps are intentionally designed, I lean more towards it being an algorithm. It gets a little confusing when thinking about writing a path-finding algorithm that takes you from A to B using randomness to get there (trying different spatial directions).
You wrote the code that tries random directions, but you are not choosing which directions it takes when executed.
It is an algorithm, but it's not what was traditionally called an algorithm, because it relies on randomization and training data. Every step of the process is algorithmic according to the book definition:
> a precise rule (or set of rules) specifying how to solve some problem
The only difference is the rules are adjusted (trained) over time, rather than being written down once by a programmer.
It's a general trend in AI that:
1. Some problem in the AI-domain is solved with a new method (say, barcode scanning or handwriting recognition as historical examples).
2. This new technique is referred to as AI and not algorithmic.
3. Over time the ability of AI is pushed further.
4. At some point the method shifts from being understood as AI to being considered algorithmic.
The difference for me is that the meat of an algorithm is intentionally designed whereas the meat of a neural network is not. When training a neural network there can be some intention, but it's not the meat of the resulting "algorithm".
When thinking about this, it's a bit like if I submit a request on Fiverr to write code for me that takes me from A to Z. I get the code back, I don't understand it and I didn't write it; is it still an algorithm? All of the same can be applied to a neural network.
Is natural selection an algorithm?
Is how the universe works an algorithm?
I think it's useful in every day life to distinguish what humans do and what something out of human control does. It can also be useful to be a bit philosophical and lump definitions together, but this only works if everyone agrees that they're doing this in a discussion.
Right, there are algorithms to train and run neural nets. But the calculation that the neural net itself does can't really be described as an algorithm (unless you distort the term to lose all meaning).
Your method of event dispatch is very common. NodeJS does it this way, as do many other event dispatchers. It is also how I used to do it, until I encountered a very nasty bug in Chrome DevTools.
Object A emits events.
Object B subscribes to A, and manages the lifetime of Object C
Object C subscribes to A in its constructor, and unsubscribes in its dispose function.
With the common event listener model, Object C could have its methods called after dispose was called, even though it appeared to clean up its event listener!
You can check out the DOM spec for the correct behavior, which both clones the event listener array and sets a removed flag on the listeners in case they were removed after cloning.
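A rough sketch of the idea (not the actual DOM algorithm or the DevTools code): iterate over a clone, and let a removed flag veto listeners that were unsubscribed mid-dispatch.

    // Safer dispatch: the clone keeps iteration stable, the flag makes sure a
    // listener removed during dispatch (even by another listener) never fires.
    type Entry<T> = { listener: (arg: T) => void; removed: boolean };

    class Emitter<T> {
      private entries: Entry<T>[] = [];

      on(listener: (arg: T) => void): void {
        this.entries.push({ listener, removed: false });
      }

      off(listener: (arg: T) => void): void {
        const entry = this.entries.find(e => e.listener === listener);
        if (!entry) return;
        entry.removed = true; // the flag survives in any in-flight clone
        this.entries.splice(this.entries.indexOf(entry), 1);
      }

      dispatch(arg: T): void {
        for (const entry of [...this.entries]) {
          if (!entry.removed) entry.listener(arg);
        }
      }
    }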
It's quite clever, and it's fun, but "did my listeners all get called when I dispatch after removing a listener" is a really obvious thing to unit test. The fact that the AI caught this bug highlights how good AI is getting, but it also highlights the need for very basic practices like actually testing your code.
I'm always in favour of tools that help us find bugs earlier and with less effort, because in practice that's how we get fewer bugs, rather than just hoping everyone everywhere will be disciplined. And this seems like it could certainly sit in between type checking and unit tests in that sense.
> I'm always in favour of tools that help us find bugs earlier and with less effort, because in practice that's how we get fewer bugs, rather than just hoping everyone everywhere will be disciplined.
I disagree. Most bugs in code are entirely valid code. They're things where the developer has written great code that does the wrong thing. Those bugs will be impossible to catch with AI until the AI can understand the requirements, and that can't happen if the requirements aren't clear, and unclear requirements are the source of the bug in the first place. That can't be solved with AI. AI can only ever be as good as the input data. In software development the input data is usually a pile of crap.
If you choose to defer to AI rather than think about the code you write then you will write buggy code, but the AI won't tell you it's buggy because it'll look fine.
The way to build high quality software is to build things with thought and rigour, with good processes like analysing requirements and building tests to cover what the requirements state the code should do.
I don't think I argued for foregoing unit tests in favour of AI. I said that, where AI (or whatever tool, really) is able to find a bug earlier in the process than a unit test can, that seems a win to me.
I've been using SonarLint[0] for a while for this, as it not only finds code smells, but also flags when I do things in a weird way, like writing
    for (let i = 0; i <= array.length - 1; i++) { console.log(array[i]); }
instead of just doing for (const element of array).
Your point number 2 about improvements to your code: when I start writing a line of code, I already have a good idea of what I'm gonna write next. In this case it's not about Copilot refactoring what I wrote, it's about refactoring what I was thinking about writing.
This is cool. This technique should work nicely quite often, although it will also generate many false-positive alerts.
At Codium.ai, we are trying to tackle this problem. We are developing a new code integrity product that intends to do something quite similar. Codium will mark problematic parts of the code (a.k.a. bugs) that aren't in line with the developer's intent, via auto-generated tests. The tricky part is to have high accuracy; we don't want to annoy any folks with false positives.
We would love to get your feedback about what we are working on!
We are developers with ML background, excited about exploiting ML for software development, so developers can code fast with confidence.
Cool! Small note. Red on black is really hard for some colorblind folks, like me. I got what you were doing though, and I could change it if it really mattered.
I think one really good use case of AI is to use it for static analysis of C code that we can't remove quickly. I am not sure if machine learning in any form has been explored in this field.
The driving case is in the comment: "Check that the listener is still there in case one is removed during dispatch".
The failing TDD test could use a this.listeners with a single listener, where that listener is not present. Or two listeners where the first is present and the second is not. (Or a list of any size where the last listeners are not present.)
That would drive the "if (!this.listeners.has(listener))" test, but not be enough to drive the choice of "continue", "break", or "return" - all of which would make that failing test go green.
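Concretely, something like this (hypothetical on/off/dispatch names, with the membership guard written as "continue" where "break" or "return" would equally satisfy the test):

    import assert from "node:assert";

    // Minimal emitter with the guarded dispatch loop being discussed.
    class Emitter {
      private listeners: Array<() => void> = [];
      on(l: () => void) { this.listeners.push(l); }
      off(l: () => void) {
        const i = this.listeners.indexOf(l);
        if (i !== -1) this.listeners.splice(i, 1);
      }
      dispatch() {
        for (const l of [...this.listeners]) {
          if (!this.listeners.includes(l)) continue; // or break, or return
          l();
        }
      }
    }

    // "Two listeners, second one removed during dispatch": this forces the
    // membership check into existence, but stays green for continue, break
    // and return alike, because the removed listener is the last one.
    const event = new Emitter();
    const calls: string[] = [];
    const b = () => calls.push("B");
    const a = () => { calls.push("A"); event.off(b); };
    event.on(a);
    event.on(b);
    event.dispatch();
    assert.deepStrictEqual(calls, ["A"]);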
How would you handle this situation in classic TDD? Indeed, one of my complaints about classic TDD is that it doesn't do enough testing away from the happy path of meeting the developer's preconceptions.
BTW, this is one of those cases where 100% statement coverage isn't enough - nor 100% branch coverage.
Mutation testing could detect it, by mutating the "break" to a "continue" and complaining when all the tests still pass. I have yet to use mutation testing in my projects.
While people argue (incorrectly IMO) that TDD naturally results in 100% statement coverage, I've never seen a TDD advocate argue that it naturally results in code which correctly identifies all mutations.
Indeed. The argument boils down to: since it's finite, I can turn it into a FSA. Not only is that unhelpful, it doesn't tell you how to construct it, i.e. the learning process.
From your list, it has solved simple matrix multiplication, LSD radix sort, and pointer padding, all of which appear many, many times in its training set.
I'm surprised it can fix the two prediction compressor bugs, even with a hint... That shouldn't be in the training set. But the solutions to those puzzles did appear on the front page of Hacker News a few weeks ago (https://news.ycombinator.com/item?id=33396037), so they may have been uploaded to GitHub.
Can you paste the Correct! message (as evidence of solving it) and do more than just the first 10? Just list the ones it can solve. Thanks, I appreciate it.
(It’s fair to throw down the gauntlet like you’re doing. You’re right that it’s a nice challenge, and that AI could solve or assist with at least one of those bugs. The trouble is that very few people have access to the AI, and even fewer have the skills to write custom tooling on top of it. The author is probably the only one who could even attempt your challenge. Hopefully that will change within a couple more years.)
From my understanding, your website doesn’t actually run the user code to see if it fixes the bug. Doesn’t that mean the user also has to guess how you fixed the bug?
Sure, but based on your previous comments and overall stance on the matter, don't be surprised when most people have the opinion I expressed about your question.
Please, if you have an "artificial intelligence" that can write and understand code, I'm sure it can fix some tiny bugs in a little code that wasn't in the training set.
If the context window is finite, then LLMs actually are Markov chains. It's just that they're a much more efficient way of representing transition probabilities than storing them all in a giant lookup table.
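As a toy sketch (made-up tokens and probabilities): a fixed-length context keyed into a table is exactly a Markov chain; the LLM just computes each row on demand instead of storing it.

    // Toy Markov-chain next-token model as an explicit transition table.
    // An LLM with a finite context window defines the same kind of mapping
    // from context to next-token distribution, without materialising the rows.
    const transitions = new Map<string, Map<string, number>>([
      ["for (let i = 0;", new Map([[" i", 0.9], [" j", 0.1]])],
    ]);

    function nextTokenDistribution(context: string): Map<string, number> {
      return transitions.get(context) ?? new Map(); // most rows are simply missing
    }

    nextTokenDistribution("for (let i = 0;"); // { " i" => 0.9, " j" => 0.1 }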
You've conflated (justified? I don't know, haven't thought about it enough) concerns about a specific model / company / tool with a whole area of research, study, and practice. On top of that, you write with an air of someone who isn't interested in a conversation and just wants to soapbox, so I have nothing constructive left to say to you. If you ever choose to take an inquisitive approach to AI, ML, or whatever you want to call it, you'll be the first to benefit.
Intellectual property is not private property to begin with. The fair use doctrine (and constitutional right) proves that. If society's particular specific use of a work outweighs the rightsholder's monopoly on its distribution, it can be ruled as such and whoever is making that fair use can continue and won't be penalized.
He doesn't know how to; he hides behind anonymity, preferring to insult the work of others rather than contribute something themselves. Don't feed the trolls, I say.