
How are people collaborating on code when using AI tools to generate patches?

We hold code review dear as a tool to make sure more than one set of eyeballs has been over a change before it goes into production, and more than one person has the context behind the code to be able to fix it in future.

As model-generated code becomes the norm, I'm seeing code from junior engineers that they haven't read and possibly don't understand. For example, one Python script calling another using exec instead of importing it as a module, or code that reimplements something already available as a very common part of the standard library.

In such cases, are we asking people to mark their code as auto generated? Should we review their prompts instead of the code? Should we require the prompt to code step be deterministic? Should we see their entire prompt context and not just the prompt they used to build the finished patch?

I feel like a lot of the value of code review is to bring junior engineers up to higher levels. To that extent each review feels like an end of week school test, and I’m getting handed plagiarised AI slop to mark instead of something that maps properly to what the student does or does not know.

Pair programming is another great teaching tool. Soon, it might be the only one left.


> "what's the favorite bug you've ever fixed?"

I use a variant, "What's the most memorable bug you've fixed?" - and I use it as an indicator of maturity to distinguish an L3 SWE from an L5+ SWE (Google levels).

First, there is the time-in-field aspect. Simply being in the field for a long time increases the amount of time you have to encounter a sleep-depriving bug.

It can show tenacity. How did they find it? What did they have to do to reproduce it? Was it in prod, test, or dev? etc.

It can show maturity. Why did it pass test? What tests were introduced to detect it? Was it a new class of bug that required new testing? Were you able to add lint rules to detect it? Did you ensure it was pushed properly to prod and do proper follow-up?

It can show autonomy. Did you update the testing procedures or just post a bug and hope the QA team fixed it? Did you meet with devops and share info on how to detect and mitigate it? Did you update the playbook at least?

So many possible places to dig in to get the "hire" when the default answer is "no hire". And if you cannot find any, then that's confirmation of the default answer.


I do something similar, but instead of `insteadOf`, I just clone the repo with `gh-work:org/repo`, and in the git config:

    [includeIf "hasconfig:remote.*.url:gh-work:**/**"]
        path = ~/.gitconfig.d/gh-work.inc
So, any git repo cloned with the ssh identity defined under `gh-work` will take on the config of `gh-work.inc`, which includes the git identity, and also the same signing key as in the ssh config.

Essentially, the name `gh-work` becomes the distinguishing element in both my ssh identity and my git identity, and I find this easier to think about.
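For completeness, the ssh side is just a host alias in ~/.ssh/config, something along these lines (a sketch - the key path is only an example, adjust to your own setup):

    Host gh-work
        HostName github.com
        User git
        IdentityFile ~/.ssh/work_ed25519
        IdentitiesOnly yes

With that in place, `git clone gh-work:org/repo` goes over the work key, and the `includeIf` above layers the work git identity on top.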


I managed reviews like this as a team lead, manager, and director at three startups. There are a lot of misconceptions from employees about the process.

It's true that managers have a lot of latitude to read self summaries and either amplify or disregard them. The #1 thing you can do to avoid problems with your own reviews is to actually understand what your manager's and the company's priorities are and align your work to them. I have given poor reviews to people who invested lots of time and energy in projects and probably even did good work on them, because they were _completely_ off strategy and completed before anyone who knew better could tell them they were a waste of time and energy.

This isn't malevolent. It's because every manager is tasked with supporting the company's overall goals, frequently with very limited resources. Work that veers off into left field, even when perceived as valuable from the employee's or peer's perspective, is basically lost opportunity to do something more valuable. And that gets very expensive when trying to grow quickly.

If you want to get ahead, you and your manager need to work together to make sure the work delivers results, is aligned with strategy, is timely, and is visible to other managers and execs. Hit all four, and the need for recognition is obvious. I've seen execs argue against managers that individuals deserve promotion. Miss one, and you're probably relying on your manager's good will and clout to make the case.

If the work is not aligned with strategy or didn't deliver results but took a lot of time, your manager will look like a fool arguing that you deserve recognition for it.

Also, re: exceeding expectations, this comes up in every org and with every team. Everyone is always graded on a curve, both within your individual team and across each exec's organization. This is because the budget for compensation is fixed ahead of time based on assumptions about the percentage of employees that will exceed expectations. As long as each exec gets roughly the expected number of employees exceeding and meeting expectations, their recommendations for promotions, bonuses, and comp adjustments will likely be approved.

If the ratio for a given exec is out of whack, the only options are: 1) Get it back in line, 2) Take budget from someone else, or 3) Increase the compensation budget.

(3) frequently can't be done without board approval, so it is not really an option. (2) is going to start a knife fight between execs over whose employees deserve it more, which nobody wants. This leaves (1). This is why alignment and upward and outward visibility are so important - they bank you social capital with the people who have to allocate limited resources.


What you need is a "git rebase" that records a second parent for each commit pointing to the original commit that is being rebased.

People who prefer git rebase workflow will hate the complicated history they see in "git log", but otherwise it will be the same.

Alternatively, the right way to use "git merge" is to merge every successive commit of a branch one by one.

The problem with "git merge" is that it collapses multiple commits into one giant patch bomb.

If one of the commits caused a problem, you don't have that commit isolated on the relevant stream (the trunk) where you are actually debugging the problem.

You know that the merge introduced a problem, and it seems that it was a particular commit there. But you don't have that commit by itself in the stream where you are working.

It can easily be that a commit which worked fine on a branch only becomes a problem in its merged form on the trunk, due to some way a conflict was resolved or whatever other coincidence or situation. Then, all you know is that the giant merge bomb caused a problem, but when you switch to the branch, the problem does not reproduce and thus cannot be traced to a commit.

If that commit is individually brought into the trunk, the breakage associated with it will be correctly attributed to it.

In both cases, the source material is the same: the original version of the commit doesn't exhibit the problem on its original branch.

It is pretty important to merge the individual changes one by one, so that you are changing fewer things in one commit.

People like rebase because it does that one by one thing. Git rebase breaks the relationship by not recording the extra parents, but since they have the reworked version of each change on the stream they care about, they don't care about that. Plus they like the tidy linear history.
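If you want to try the merge-each-commit approach mechanically, here's a rough sketch (it assumes a feature branch being folded into main; a conflict will stop the loop and needs resolving by hand before continuing):

    # replay each commit from the branch as its own merge into main,
    # so every change lands individually but keeps the original commit as a parent
    git checkout main
    for c in $(git rev-list --reverse main..feature); do
        git merge --no-ff "$c" -m "Merge $(git log -1 --format=%s "$c")"
    done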


I ask these indirectly.

    "What types of people tend to succeed and do well with your team?  What types of people tend to struggle in your team?"
(Am I going to be a culture and work/life balance fit?)

    "What are your main objectives in the next 6 to 12 months?  What's your plan to meet those objectives?"
(Do these guys have their act together and an actual plan? Is the work going to be interesting?)

    "How do you see the candidate in this role contributing to that objective?"
(Are their expectations for this role realistic? Do I fit those expectations? Do I want to be on that ride?)

    "Tell me about how the team collaborates and coordinates work"
(Am I going to be stuck in one-hour, all-hands "stand-ups" every day?)

> This is just as true without systemd. Auto-restarting services without monitoring will always hide problems, regardless of what’s doing the restarting.

And we should be thankful. We don't have to be aware of every single hiccup a service experiences - even if (and especially if) you are the one responsible for keeping it up. Modern distributed systems are far too complex for us to care about every minutia.

If something failed over, and nobody noticed, is it a problem? The answer is _maybe_. How often does that happen? Is there a pattern? Is it getting more frequent? Is that happening more often than predicted?

At work, we blow up entire VMs if they fail their health checks. They can fail for many reasons, mostly uninteresting ones. And the customers don't even notice and SRE doesn't usually care. They only become relevant once there are anomalies. When you are managing thousands of instances, self-healing is required.

Error rate may not even be impacted in a significant way when those systems restart, it is often a tiny increase in a deluge of requests. So you also need observability on self-healing events to catch trends. When self-healing fails, it tends to do so pretty catastrophically.

> and alerting on set thresholds that seem aberrant.

I wish we would remove 'thresholds' from our alerting vocabulary. Very often we end up setting simple and completely arbitrary thresholds that don't actually mean much, unless they're based on SLOs and SLAs. Generally we don't know what those thresholds are supposed to be and just guess, unless it's capacity. But even for capacity: 0% disk space free is obviously bad, so what people will do is set some threshold like "alert when 80% of disk is used". Then you get paged once that threshold is crossed. Is that an emergency? I don't know, maybe it took 5 years to get there. However, if you set an alert that says "alert me if the disk will fill up in <X> days at the current rate", you can tell whether it is an emergency or not.

I suspect that you would agree with this given the use of the word "aberrant", which to me implies anomalies. But in many contexts, people think that "alert every single time CPU utilization crosses 90%" is what we are talking about.
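The disk case above is cheap to do even without fancy tooling - a very rough sketch, where the filesystem, state file, and alert command are all placeholders (assume it runs once a day from cron):

    # page only if / is projected to fill within 14 days at the current growth rate
    state=/var/tmp/disk_used.last                    # last run's usage (hypothetical state file)
    used=$(df -k --output=used / | tail -1)
    avail=$(df -k --output=avail / | tail -1)
    prev=$(cat "$state" 2>/dev/null || echo "$used")
    echo "$used" > "$state"
    growth=$(( used - prev ))                        # KiB per day, assuming a daily run
    if [ "$growth" -gt 0 ] && [ $(( avail / growth )) -lt 14 ]; then
        echo "/ projected to fill within 14 days" | mail -s "disk alert" oncall@example.com   # placeholder alert
    fi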


I tend to insist that database tables are collections of global variables, and the worst kind of global variables because they're persistent (reboot won't fix them), they often affect multiple application instances at once, and application compilers can't help you verify them because they're usually in strings. Stored procedure languages have a little bit of an advantage here, but there are so many disadvantages that it's hard to recommend them.

Your best bet when writing SQL in strings is to make it "greppable". Naming a table "user" is very convenient until someone asks, "What will be affected by this change to the user table?" and you're stuck hunting needles in a haystack. Tossing in a tiny prefix, e.g. "tblUser", will annoy some people, but then your signal-to-noise ratio is ideal.
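That "what will be affected" question is exactly where the prefix pays off - a sketch, assuming the queries live in string literals under src/:

    # with the prefix: every reference to the table, few false positives
    grep -rn 'tblUser' src/
    # without it: "user" also matches variables, comments, UI copy, ...
    grep -rn 'user' src/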

But yeah, concatenating bits and pieces of text together into queries can rapidly escalate out of control without a conservative/judicious approach.


My pet peeve is the labeling of these things.

1) I'm going to give you an easy one to start. You see a toggle switch. It is set to on (probably - the little colour bar in the switch is coloured in).

It is labeled "Disable fnurbification".

Okay, now what? Does "on" mean I'm going to be fnurbified? Does switching the switch disable the fnurbification, so I actually have to switch it to "off"? No, that's crazy. "On" means "disabled", cognitive dissonance aside.

2) You see a toggle switch. It is set to on like before. It is labeled "Disable fnurbification".

We learned before that "on" meant "disabled", but that filled us with a vague sense of unease. For whatever reason we try toggling the switch. The text changes to just "Fnurbification".

Okay really now what? Is my fnurbification on? You try flicking the switch back. The colour fills in and the label changes to "Disable fnurbification" again. Okay what are we supposed to do?

What's happened is the designer has read a post on medium about accessibility and that screen readers don't read out the colour of the filled in part of a toggle switch, and has decided to help by changing the label when the state of the switch changes.

The problem is now the label could either be describing the current state or be describing what happens when you flip the switch. And there's really no way of knowing. I've seen this very often with the UX for boolean selectors where they use things like buttons rather than toggle switches. Does pressing the button do the thing it says on the label or does the label describe where we are now and pressing the button will reverse that? No way to be sure.

Postscript: Notice that whatever you decide is correct in the second case could change what you would do in the first case, if the first type of selector is one whose label changes when you toggle it.


Ask, ask, ask. Yes, you can spend days or weeks poring over the code in various ways, but asking those who made it (ideally) or those who maintain it will give you the "why". Why was it done this way? What are the implicit assumptions and invariants? Why were seemingly obvious ideas not implemented? Or were they and found problematic? Document these findings for the next generation.

But as you do this, keep an eye out for assumptions that may have changed. A feature now obsolete requiring weird code. Out-of-date assumptions about the behaviour of computers or other systems. New language features that can simplify or improve code. Talk them over with the people who know the code, and maybe you'll be the one to delete that awful code everyone hated.

Also, take notes, not just about the code, but about its environment, release process, surrounding systems, use cases, and people. Knowing who to ask about a given issue is gold.

If there are post-mortems available, they can give a great insight into how the system works and fails. Design docs to a certain extent, too, but they can be misleading especially if they are not kept up to date.

Pair programming can be a very effective way of learning, too.


Nope, Straightforward Way is like this:

    name:String, date:Date, value:Int
    "Miami", 2021-08-19 11:54:19.721376-05, 2
And making the header mandatory.

I'd also force quoting of strings, "," for separation, and "\n" for lines. Dates are ISO, decimals use ".".

That is all.

P.S.: This is similar to how I did it for my mini-lang https://tablam.org/syntax. Tabular data is very simple; the only problem with CSV is that it is too flexible and lets everyone change it, but putting the "schema" outside the CSV defeats its purpose.
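One nice property of that kind of typed header is how cheap it is to sanity-check - a rough sketch in awk (it splits on bare commas, so it only checks the header shape and field counts, not the quoting rule):

    awk -F, '
        NR == 1 {
            n = NF
            for (i = 1; i <= NF; i++)
                if ($i !~ /^ *[A-Za-z_][A-Za-z0-9_]*:[A-Za-z]+ *$/) {
                    print "bad header field: " $i; exit 1
                }
            next
        }
        NF != n { print "line " NR ": expected " n " fields, got " NF; exit 1 }
    ' data.csv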


My mental model always formed much more around what kind of data flows and interactions happen between components.

As a big fan of Domain Driven Design I found "domain storytelling" a useful technique when adapted to software systems.

With these diagrams it was much easier to convey complex systems and their temporal couplings - especially to higher management, as they can actually "read" how the sequence of events flows.

Try it out and see if it can be another tool on your belt.

===

https://domainstorytelling.org


I always liked this http caching article, done in a conversational tone: https://jakearchibald.com/2016/caching-best-practices/

Really minimal template that works well for all scenarios and doesn't cause unexpected behaviour (like changing IFS does):

    #!/usr/bin/env bash
    
    set -eEu -o pipefail
    DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
Reference all relative things with $DIR first:

    "$DIR/relativestuff.sh"

I usually don't test the logging. However, sometimes I get tests for the logging messages "for free". That happens when I break out some logic into its own function and I want logging on what happened in that function (definitely not all the time, but it's sometimes useful). Then the function can return its result and a list of logging messages to output. When asserting the result, I can also assert on the logging messages.

It's what I describe in "Returning a message list" in this post: https://henrikwarne.com/2020/07/23/good-logging/


I've had to explain this to non-technical stakeholders many, many times over the years, and I always use the restaurant metaphor:

If you run a commercial kitchen and you only ever cook food, because selling cooked food is your business -- if you never clean the dishes, never scrape the grill, never organize the freezer -- the health inspector will shut your shit down pretty quickly.

Software, on the other hand, doesn't have health inspectors. It has kitchen staff who become more alarmed over time at the state of the kitchen they're working in every day, and if nothing is done about it, there will come a point where the kitchen starts failing to produce edible meals.

Generally, you can either convince decision makers that cleaning the kitchen is more profitable in the long run or you can dust off your resume and get out before it burns down.


I'm ~23 years behind GP, and I feel the same way as them already. Goes to show, this may be a matter of personality.

There were only a few, brief moments in my career as a software developer when I was truly happy at work. Most of those involved implementing an architecture or algorithm I figured out from scratch, or took from the scientific literature - either as a prototype or directly in the product. Sometimes as "hold my beer, I got this" moments. But as you can imagine, this is maybe 1% of the things I've been doing at various jobs.

From where I sit (a backend developer, thoroughly burned out by webdev a couple years ago), most of coding I do is software bureaucracy. Turn this data into that data, ensuring module X and Y get paged in the process. Oh, half of the code I'm about to write is implemented elsewhere - quick, figure out how to juggle the dependency graph to somehow route control from here to there and back. This data I want to convert is not of the right colour - oh, I need to pass it through three sets of conversion layers to get back essentially the same, but with a correct type tag on it. Etc.

It's utterly and mind-numbingly boring, unless you architected the whole codebase yourself, at which point it's somewhat fun because it's your codebase, and who doesn't like their own Rube Goldberg machines?

At this point, I've learned a coping strategy: just forget the project scope and focus on your little plot of land. Doesn't matter that the software I wrote half of is going to help people do exciting stuff with industrial robots. What matters is that the customer changed some small and irrelevant piece of requirements for the 5th time, and I now have to route some data from the front to the back, through the other half of the code, written by my co-worker (a fine coder, btw.). So a bunch of layers of code bureaucracy I'm not familiar with, and discovering which feels like learning how to fill tax forms in a foreign country. If I start thinking about the industrial robots I'll just get depressed, so instead I focus on making the best jump through legacy code possible, so that I impress myself and my code reviewer (and hopefully make the 6th time I'm visiting this pit easier on everyone).

Maybe it's a problem of perceptions. Like in the modern military - you join because you think you'll get to fly a helicopter and shoot shoulder-mounted rockets for daily exercise. You get there and you realize it's just hard physical work, a bit of mental abuse, and a lot of doing nothing useful in particular (at least until you advance high enough or quit). And so I started coding, dreaming I'll be lording over pixels on the screens, animating machine golems, and helping rockets reach their desired orbits. Instead, I'm spending endless days pushing people to simplify the architecture, so that I can shove my data through four levels of indirection instead of six (and get the software to run 10x faster in the process), and all that to rearrange some data on the screen that really should've been just given away to people on an Excel sheet with a page of instructions attached.

(Another thing that annoys me: a lot of software I've seen, and some I've worked on, could've been better and more ergonomic as an Excel sheet with bunch of macros, and the whole reason they're a separate product instead is to silo in the data, the algorithm, and to prevent the users from being too clever with it. Also because you can't get VC funding for an Excel sheet (unless you're Palisade).)

Got a bit ranty here, sorry. I guess my point is: I accept the industry is mostly drudgework, but I refuse to accept that this is all essential complexity. Somehow, somewhere, we got off track, because all this shit is way harder than it should be.


I made a browser to deal with this [0]. I've basically abandoned work on it, but I still use it as my daily driver (granted I don't push my fresh builds up as much as I used to and there are a couple of bugs).

It provides me such a huge benefit because I just swipe my hand to destroy dozens of browser tabs. And all the while I can see the name, have them hierarchical, change their groups, etc. I go on deep GitHub or Wikipedia dives and close or collapse that entire line of thought so easily. Tabs are essentially mind maps. I wish a larger vendor with real time/money would invest in a similar many-tab model. The concept is of course directly taken from the tree-style tab extension in FF, but even that extension has suffered since the move off of XUL, and I can't easily close a ton of tabs just by dragging over the close buttons anymore (among other things).

0 - https://cretz.github.io/doogie/


indeed! really nice.

In the past I have just worked out some hairy logic on a spreadsheet and then manually migrated it to code / nested conditionals.

This other article [1] shows some really cool ways to write decision tables using a structured editing UI. I wish structured editors were more commonplace!

1: https://medium.com/@markusvoelter/the-evolution-of-decision-...


I consider shellcheck absolutely essential if you're writing even a single line of Bash. I also start all my scripts with this "unofficial bash strict mode" and DIR= shortcut:

    #!/usr/bin/env bash
    
    ### Bash Environment Setup
    # http://redsymbol.net/articles/unofficial-bash-strict-mode/
    # https://www.gnu.org/software/bash/manual/html_node/The-Set-Builtin.html
    # set -o xtrace
    set -o errexit
    set -o errtrace
    set -o nounset
    set -o pipefail
    IFS=$'\n'

    DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
I have more tips/tricks here: https://github.com/pirate/bash-utils/blob/master/util/base.s...

This seems like a quibble, but putting the emphasis on "using an appropriate engine" in parentheses at the end is burying the lede. Sticking to regular languages will not, in itself, help - you can have catastrophic backtracking in a backtracking engine even with a regular expression that uses only "regular language" concepts.
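To make that concrete, here's a rough sketch: ^(a+)+$ describes a perfectly regular language (it's just a+), yet a backtracking engine typically melts down on a failing match, while an automaton-based one doesn't (exact behaviour depends on the engine and version):

    # backtracking engine (perl): usually exponential on the failing match below
    time perl -e 'print(("a" x 26 . "b") =~ /^(a+)+$/ ? "match\n" : "no match\n")'
    # automaton-based engine (GNU grep -E): effectively instant on the same input
    time { printf 'a%.0s' {1..26}; echo b; } | grep -E '^(a+)+$'

Same "regular language" pattern, wildly different behaviour - which is the point about the engine.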

It's also at least theoretically possible that an engine could implement a subset of the non-regular-language "regex" constructs and still avoid backtracking. We considered this in Hyperscan - there are a number of backreference cases that are tractable in practice:

First, sometimes the backreferences just don't contain that many different possibilities - sometimes they are used to match quotes (or absence of quotes) so the three alternatives are ("|'|epsilon). This can be done by expansion.

Second, a number of backreference uses are structured so that you can only have a small - fixed size - number of back-references "alive" at once. For example, /\s+(\d+)\s+\1/ in libpcre notation matches whitespace, then a number, then more whitespace, then that number again. The thing being referred to by the backref can't ever have multiple live simultaneous values - an automata could note the boundary-crossings in and out of this particular backref and make a note of the value, then do a simple compare later.

There are quite a number of backreferences out there that work this way.

No-one, to my knowledge, does this, but it could be done.

The point is - it's the engine. Not the language. Fully general support for nasty regular expression constructs does lead in practice to ugly blowouts but it doesn't have to.

Notably, the received wisdom that matching backreferences is exponential is based on a false assumption. The proof that it's NP-hard requires you to arbitrarily increase the number of backreferences, which is counter to the way most people would imagine regex matching or string matching is analyzed - you either assume that the regex or string is a fixed workload, or you give it another parameter 'm' and talk about O(nm) things.


This hits on an important point I don't see discussed much and certainly not addressed as important. Debugging is (one of) the most important UX aspects of your code.

Similar to code being read a lot more than written, you should prioritise debugger user experience over other aspects. One thing I try to do is have a temporary variable called result in many places and immediately return it on the next line. Though in some versions of Visual Studio you can faff about and add a watch with some magic name ($ReturnValue?) to inspect the return value, it's much easier to have an obvious place to chuck a breakpoint. Similarly, chained method calls or LINQ should be broken up with easy inspection variables so you can readily verify the state.

It seems like needless verbosity but when prod is broken and you desperately need to fix something you'll be glad of it.


You can also do a per-directory _global_ git configuration, e.g.

in .gitconfig, you say:

    [user]
        name = Me Myself
        email = personal@example.com
        signingkey = D34DB44F

    [includeIf "gitdir:~/src/github.com/work_org/"]
        path = ~/.gitconfig_work
Then in ~/.gitconfig_work:

    [user]
        name = Me Myself
        email = work@example.com
        signingkey = D34DC0D4
    
    [core]
        sshCommand = ssh -i ~/.ssh/work_ed25519
I like this way better, because I don't need to remember to specify per-project config, as long as I put the repos in the right directory :-)

I think the point here is to not do unstructured, string-based logging. If logs are valuable, then you should treat them the same way as any other valuable data: rather than blatting string soup into some datastore and hoping you can somehow index it in the future, write a proper data schema and represent the trace of your program execution in a structured, machine-readable way.
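Even at the smallest scale the shift is just "emit a record, not a sentence" - a tiny sketch (the field names and the use of jq are my own choices here):

    # structured: one self-describing JSON record per event
    jq -cn --arg event order_accepted --arg order_id 1234 \
        '{ts: now, event: $event, order_id: $order_id}'
    # vs. string soup you'll be regexing apart later
    echo "accepted order 1234 at $(date)"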

This ends up overlapping a lot with event sourcing: if you're doing event sourcing right, with every substantial business decision recorded as an event and your logic decomposed into lots of fine-grained transformations between event streams, then the event streams can be your logs: each successive event transformation either succeeds or fails, and if it fails then you have the input event and the small piece of code in which the failure occurred, so you shouldn't need any more. If a given transformation gets too complicated to debug, split it into two smaller transformations!

You still need alerting on failures, but again, that's something that's better done in a structured way, with proper stacktraces and business-relevant information.


To make some of the ideas in this article a little more concrete, here are some research demos I’ve made:

* Legible Mathematics, an essay about the UI design of understandable arithmetic: http://glench.com/LegibleMathematics/

* FuzzySet: interactive documentation of a JS library, which has helped fix real bugs: http://glench.github.io/fuzzyset.js/ui/

* Flowsheets V2: a prototype programming environment where you see real data as you program instead of imagining it in your head: https://www.youtube.com/watch?v=y1Ca5czOY7Q

* REPLugger: a live REPL + debugger designed for getting immediate feedback when working in large programs: https://www.youtube.com/watch?v=F8p5bj01UWk

* Marilyn Maloney: an interactive explanation of a program designed so that even children could easily understand how it works: http://glench.com/MarilynMaloney/


Re: annotated screenshots, there once was a module that allowed making snapshots of GTK windows as PDFs: https://github.com/nomeata/gtk-vector-screenshot

Edit: here it is in action, with selectable text: https://www.joachim-breitner.de/various/pdf_screenshot_3.pdf


FYI, in bash, "sudo !!" re-runs the previous command with sudo.

Check out "The Problem with Promises in JavaScript" [0] for some gotchas with the design of js promises.

[0] https://dev.to/craigmichaelmartin/the-problem-with-promises-...

