
I can see how this is handy, but it's also dangerous and likely to bite you in the ass more often than not. I think sponge is great, but your example is dangerous: if you make a mistake along the way, unless it's a fatal error, you're going to lose your source data. Typo a grep and you're screwed.


The technique I've found works well is to edit the file in vim and use !G (filter lines from the cursor to the end of the file through a shell command), or use emacs in a similar way. That gives you infinite undo and redo until you get the commands right. Then you can view the history and make a shell script like this using sponge. For example, edit a file, go to line 1, and type `!Gsort`. The file is run through sort and the results replace the buffer. To undo, use `u`; to redo, use CTRL-R.


Huh, never heard of `!G`. How is it different from `%!sort`?


! is also a normal mode command/operator. It accepts a motion and then drops you into the command line with :{range}! pre-filled, where {range} is the range of lines covered by the motion. !G in normal mode is exactly equivalent to :.,$!
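For example (a quick illustration, not from the parent comment): with the cursor on some line, pressing !3j drops you into the command line with

    :.,.+3!

pre-filled, so typing sort and pressing Enter filters the current line plus the next three through sort(1).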


Oh, wow! That's so neat. Thank you for explaining. It's curious that I missed this for so long.


It's OK, I've been a "serious" Vim user for ~7 years and I just learned about it this year. It's such an enormous program with so much functionality that it's hard to fault somebody for missing any individual piece of it.

What I find weird is that there's no analogous normal mode operator command for dropping into a command line without the ! prefix. It's easy enough to write your own (good excuse to learn about "operator pending mode"), but I often find myself scratching my head at what made it into the builtin commands and what did not.


Why are we still using filesystems that don't have an "undo" operation, though?

Is the work of programmers less valuable than, say, the work of Google Docs users (where there is an undo operation)?


That's kind of how I use git. I would never use "sponge" or "sed -i" outside of a git repo or with files that haven't been checked in already.

I agree it would be nice to have this in the filesystem; some filesystems support this (e.g. NILFS[1]), but none of the current "mainstream" ones do AFAIK. In the meantime, git works well enough.

[1]: https://nilfs.sourceforge.io/en/


Which systems don't have snapshotting?

Mac has Time Machine.


Does it take a snapshot after every shell command?


This is not a file system I would be interested in. If you’ve ever snooped in on fs activity it is constant and overwhelming on even an average system. IDEs can have undo, vi and emacs have undo. As others in the thread have said, just use multiple files.

Personally I’d be interested in a shell having undo capability, but not a file system.


> This is not a file system I would be interested in. If you’ve ever snooped in on fs activity it is constant and overwhelming on even an average system.

I'm not sure how these sentences are connected. Are you implying that allowing undo would make those problems significantly worse? I'm not sure of that. If you have a CoW filesystem, which you probably want for other reasons, then having a continuous snapshot mode for recent activity would not need much overhead.

If you're saying there's too much activity to allow an undo, well, I assume the undo would be scoped to specific files or directories!


And of course, no reply


Snapshotting is different. Undo/redo is not a contingency plan; instead it’s something that you make a regular part of your workflow.


Agree.

Sure, there could be some situations where it would be handy, like in some automated scenarios, but most of the time it is not a big deal to write

    foo data1 data2
    bar data2


Been there done that. Good advice.

In my case I could recreate the original file but it took 90 minutes to scrape the remote API to do it again...


Right. You should always test the command first. If the data is critical, use a temporary file instead. I usually use this in scripts so I don’t have to deal with cleanup.


> If the data is critical, use a temporary file instead

Use a temporary file always. The sponge process may be interrupted, leaving you with a half-complete /etc/passwd in return.


Couldn't `mv` or `cp` from the temp file to `/etc/passwd` be interrupted as well? I think the only way to do it atomically is a temporary file on the same filesystem as `/etc`, followed by a rename. On most systems `/tmp` will be a different filesystem from `/etc`.


mv can't, or, more correctly, the rename system call cannot.

rename is an atomic operation from any modern filesystem's perspective: you're not writing new data, you're simply changing the name of an existing file, so it either succeeds or it fails.

Keep in mind that mv (the command-line tool), as opposed to the rename system call, falls back to copying if the source and destination files are on different filesystems, since you cannot really rename a file across filesystems!

In order to have truly atomic writes you need to:

1. open a new file on the same filesystem as your destination file
2. write the contents
3. call fsync
4. call rename
5. call sync (if you care about the rename itself never being reverted)

This is some very naive golang code (from when I barely knew golang) for doing this, which has been running in production without a single issue since I wrote it: https://github.com/AdamJacobMuller/atomicxt/blob/master/file...
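For shell scripts, a minimal sketch of the same sequence might look like this (assuming GNU coreutils, whose sync(1) accepts file operands; `some_filter` and the target path are just placeholders):

    # create the temp file on the same filesystem as the destination
    tmp=$(mktemp /etc/passwd.XXXXXX) || exit 1
    # write the new contents
    some_filter < /etc/passwd > "$tmp" || { rm -f "$tmp"; exit 1; }
    # flush the data to stable storage before the rename
    sync "$tmp"
    # atomic within one filesystem; this is rename(2) under the hood
    mv "$tmp" /etc/passwd
    # optionally flush the containing filesystem so the rename itself sticks
    sync -f /etc/passwd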


Not clear on the need for fsync and sync.

Are those for networked filesystems like NFS, or just insurance against crashes?

Logically, on a single system there would be no effect, assuming error-free filesystem operation. Unless I'm missing something.


Without the fsync() before rename(), on system crash, you can end up with the rename having been executed but the data of the new file not yet written to stable storage, losing the data.

ext4 on Linux (since 2009) special-cases rename() when overwriting an existing file so that it works safely even without fsync() (https://lwn.net/Articles/322823/), but that is not guaranteed by all other implementations and filesystems.

The sync() at the end is indeed not needed for the atomicity; it just lets you know that, after it completes, the rename will not "roll back" anymore on a crash. IIRC you can also use fsync() on the parent directory to achieve this, avoiding sync() / syncfs().


> ext4 on Linux (since 2009) special-cases rename

This is interesting.

The linked git entry (https://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.g...) from the LWN article says "Notice: this object is not reachable from any branch."

Did this never get merged? Because I definitely saw this issue in production well after 2009.

I guess it either got changed, or a different patch was applied, but perhaps this https://github.com/torvalds/linux/blob/master/fs/ext4/namei.... does it?


The patch just got rebased, here's the one that was actually applied in master for v2.6.30: https://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.g...

And yes, the code you highlighted is exactly this special-case in its current form. The mount option "noauto_da_alloc" can be used to disable these software-not-calling-fsync safety features.
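For instance, something like this at mount time (just an illustration; the device and mount point are placeholders):

    # disable ext4's software-not-calling-fsync safety heuristics
    mount -o noauto_da_alloc /dev/sdb1 /mnt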


I'd like to know why as well. The inclusion of the fsync before the rename implies to me that the filesystem isn't expected to preserve order between write and rename. It could commit a rename before committing _past_ writes, which could leave your /etc/passwd broken after an outage at a certain time. I can't tell whether that's the case or not from cursory googling (everybody just talks about read-after-write consistency). Maybe it varies by filesystem?

The final sync is just there for durability, not atomicity, like you say.


> the filesystem isn't expected to preserve order between write and rename

Correct.

The rename can succeed while the write of the file you just renamed gets rolled back.


You can use `/etc/passwd.new` as a temporary file to avoid the problems you mentioned. In the worst case, you'll have an orphaned passwd.new file, but /etc/passwd is guaranteed to remain intact.
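In shell terms, something like this (a sketch; `some_filter` is a placeholder, and /etc/passwd.new lives on the same filesystem, so the mv is an atomic rename):

    some_filter < /etc/passwd > /etc/passwd.new && mv /etc/passwd.new /etc/passwd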


Probably not. If it's implemented responsibly, it will internally:

1. Write to a temporary file
2. Do the equivalent of `mv tmpfile originalfile`

so it will either succeed or do nothing


"Responsibly" is subjective here. I could argue that responsible thing to do is to use as little resources as possible, and in that case, directly overwriting the file would be the "responsible" thing to do.


> I could argue that responsible thing to do is to use as little resources as possible

No, you couldn't, because a sponge is intentionally using more resources: it soaks up as much water as it can. And the program is intended to soak up all of the output. Otherwise it would be `cat`.


Your example proves my point: what's responsible is subjective. It's meaningless to talk about doing the "responsible" thing.


This is why I usually just use a temporary directory and do a quick

    git init .
    git add .
    git commit -m "wip"
... and proceed from there. So many ways to screw up ad hoc data processing using shell and the above can be a life saver. (Along with committing along the way, ofc.)

EDIT: Doesn't work if you have huuuuge files, obviously... but you should perhaps be using different tools for that anyway.


You might like to try using src, a simple single-file VC: http://www.catb.org/esr/src/


I guess if you want something single-file that resembles git (on second thought, not sure if that's a requirement at all), you can also try Fossil (https://www2.fossil-scm.org).


This is why I read HN. You never know when a brilliant idea will appear. Thank you, I never thought of doing this for temporary work.


Why not write it to a different file?


  cmd < somefile > somefile.tmp && mv somefile.tmp somefile
Will read from somefile, and only replace the source if (and when) the pipeline exits successfully.

Mind that this may still bite in interesting ways. But less frequently.

You can also invoke tempfile(1) which is strongly recommended in scripts.

https://www.unix.com/man-page/Linux/1/tempfile/


I was wondering what the difference between tempfile and mktemp was. At the bottom of the tempfile man page, it says:

> tempfile is deprecated; you should use mktemp(1) instead.
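A typical mktemp pattern in a script might look like this (a sketch; `cmd` and `somefile` are placeholders, and the template keeps the temp file on the same filesystem as the target so the mv is an atomic rename):

    tmp=$(mktemp somefile.XXXXXX) || exit 1
    trap 'rm -f "$tmp"' EXIT
    cmd < somefile > "$tmp" && mv "$tmp" somefile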


I ... need to revisit that. Though I suspect you're right.


How do you know you've tested all cases?



