
There's a great opportunity to not just reimplement the GNU coreutils but to actually rethink them. The way I install a Linux system these days and then immediately download ripgrep or fd reminds me of how the first thing I used to do on a Solaris system was install bash and the GNU coreutils. There's a rich history of doing this - BSD Unix started as a bunch of utilities and patches to the AT&T version, it only became a freestanding distro with its own kernel later on. And the GNU coreutils go way back before the Linux kernel - you just installed them on a commercial Unix to make it more usable. In both cases they went beyond just drop-in replacements and started adding lots of new features and flags. Things have been pretty static for a long time so it's good to see some innovations coming.


The GNU coreutils are the lowest common denominator of most GNU systems, and thus a lot of software targets them including the GNU extensions.

Replacements like rg or fd are often not even compatible with the unix originals, let alone the GNU extensions. Yes, some of the GNU tools are badly in need of UI improvements, but you'd have to abandon compatibility with a great deal of scripts.

I think both a coreutils rewrite and end-user-facing software like rg have their place on modern unix systems. I'm a very happy user of rg! But I'd like a bit more respect for tradition from some of those tools. For example, "fd" in a unix setting refers to file descriptors. They should rename, IMO.


Another thing to mention: compatibility isn't only about software (scripts), which to me is easy to maintain (just keep the old software alongside the new compatible one and you are done).

The most important thing is compatibility with humans. The main reason I tend to use a pretty standard setup is that I know these tools are the standard: when I need to ssh into a server to resolve some issue, I'm working with a system that I'm familiar with.

If, for example, I had fancy tools on my system, I would stop using the classic UNIX tools, and then every time I had to connect to another system (quite often) I would type the wrong commands because they are different or not present at all, or I would have to install the new fancy tools on that system (which is not always possible; for example, on embedded devices with 8 MB of flash anything beyond busybox is out of the question).

To me the GNU tools have the same problem: they get you used to non-POSIX behavior (like putting flags after positional arguments) that bites you when you have to do some work on systems without them. And yes, systems without the GNU tools still exist: macOS, for example, or embedded devices, or some old UNIX server that is still running.

Last thing, if we need to look at the future... better change everything. Let's be honest: if I could choose to switch entirely to a new shell and tools, and magically have it installed on every system that I use, it would probably be PowerShell. For the simple reason that we are no longer in the '80s where everything was a stream of text, and a shell that can manipulate complex objects is fantastic.


> Last thing, if we need to look at the future... better change everything.

Absolutely. We had decades to work with a fairly stable set of tools and they are not going anywhere. Whoever needs them, they are there and likely will be for several more decades.

I am gradually migrating all my everyday DevOps workflows (I am a senior programmer and being good with tooling is practically mandatory for my work) to various Rust tools: ls => exa, find => fd, grep => rg, and more and more are joining each month. I am very happy about it! They are usually faster and work more predictably and (AFAICT) have no hidden surprises depending on the OS you use them in.

> we are no longer in the '80s where everything was a stream of text, and a shell that can manipulate complex objects is fantastic.

Absolutely (again). We need a modern cross-platform shell that does that. There are a few interesting projects out there, but so far it seems the community is unwilling to adopt them.

I am personally also guilty of this: I like zsh fine even though the scripting language is veeeeery far from what I'd want to use in a shell.

Not sure how that particular innovation will explode but IMO something has to give at one point. Pretending everything is text and wasting trillions of CPU hours constantly parsing stuff is just irresponsible in so many ways (the ecological one included).


I sometimes think that the original Unix shell, when run on a PDP-11, was a paragon of waste, with CPU being so slow, and RAM so scarce. It was like running Ruby on an ESP 32. Still it was too useful to ignore.

I suspect that the endless parsing may be even economical compared to the complex dance needed e.g. to call a Python method. The former is at least cache-friendly and can more easily be pipelined.

Though it's ok to be slow for a tool which sees mostly interactive use and scripting-glue use. Flexibility and ergonomics trump the considerations of computational efficiency. So I expect that the next shell will burn more cycles in a typical script. But it will tax the human less with the need to inventively apply grep, cut, head, tail, etc., with their options and escaping / quoting rules.


> I sometimes think that the original Unix shell, when run on a PDP-11, was a paragon of waste, with CPU being so slow, and RAM so scarce. It was like running Ruby on an ESP 32. Still it was too useful to ignore.

Sure, but it was a different time. People back then were down for anything that actually worked and improved the situation. Like the first assembler being written in machine code, then the first C compiler being written in assembly, etc. People needed to bootstrap the stack somehow.

Nowadays we have dozens, maybe even thousands of potential entry points, yet we stubbornly hold on to the same inefficient old stuff. How many of us REALLY need complete POSIX compliance for our everyday work? Yeah, the DevOps team might need that. But do most devs need that? Hell no. So why isn't everyone trying stuff like the `nu` or `oil` shells? They are actually a pleasure to work with. Answer: network effects, of course. Is that the final verdict? "Whatever worked in the 70s shall be used forever, with all of its imperfections, no matter how unproductive it makes the dev teams".

Is that the best humanity can do? I think not... yet we are scarcely moving forward.

> I suspect that the endless parsing may be even economical compared to the complex dance needed e.g. to call a Python method. The former is at least cache-friendly and can more easily be pipelined.

50/50. You do have a point, but a lot of modern tools written in Rust demonstrate how awfully inefficient some of these old tools are. `fd` in particular is many times faster than `find`. And those aren't even CPU-bound operations; just parallel I/O.

Another example closer to yours might be that I know people who replaced Python with OCaml and are extremely happy with their choice. Both languages have no (big) pretense that they can do parallel work very well, so nothing much is lost by migrating away from Python [for various quick scripting needs]. OCaml however is strongly typed and MUCH MORE TERSE than Python, plus it compiles lightning-fast and runs faster than Golang (but a bit slower than Rust, although not by much).

> Though it's ok to be slow for a tool which sees mostly interactive use and scripting-glue use.

Maybe I am an idealist but I say -- why not be both fast and interactive? `fzf` is a good demonstration that both concepts can coexist.

> Flexibility and ergonomics trump the considerations of computational efficiency.

Agreed! The way I see it nowadays, though, is that many tools are BOTH computationally inefficient AND non-ergonomic.

But there are also reverse examples, like Git: very computationally efficient, but still a huge hole of WTFs for most devs (me included).

Many would argue that the modern Rust tools are computationally less efficient because they spawn at least N threads (N == CPU threads) but the productivity gain earned from that more aggressive use of machine resources is IMO worth it (another close example is the edit -> save -> recompile -> test -> edit... development cycle; the faster that is, the bigger the chances that the dev will follow their train of thought until the end and will get the job done quicker).

---

So TL;DR:

- We can do better

- We already are doing better but the new tools remain niche

- Old network effects are too strong and we must shake them off somehow

- We are holding on to old paradigms for reasons that scarcely have anything to do with the programming job itself


> For the simple reason that we are no longer in the '80s where everything was a stream of text, and a shell that can manipulate complex objects is fantastic.

A stream of bytes is the only sane thing to do. Nothing keeps you from adding a flag to your cli programs to choose the input/output format. In fact many programs already have this, and json seems pretty popular.

Having standard serialization is just gonna be boilerplate and unnecessary for many programs. Letting the user choose how to interpret the input/output is the best way.
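
As a sketch of that idea: a hypothetical tool with a `--json` flag, with JSON hand-rolled here just to keep the example dependency-free:

    use std::env;

    fn main() {
        // Hypothetical tool that prints name/size pairs; a --json flag switches
        // the output format, plain text stays the default.
        let json = env::args().any(|a| a == "--json");
        let entries = [("foo.txt", 12u64), ("bar.txt", 34u64)];

        if json {
            let items: Vec<String> = entries
                .iter()
                .map(|(name, size)| format!("{{\"name\":\"{}\",\"size\":{}}}", name, size))
                .collect();
            println!("[{}]", items.join(","));
        } else {
            for (name, size) in entries {
                println!("{}\t{}", name, size);
            }
        }
    }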


> Having standard serialization is just gonna be boilerplate and unnecessary for many programs.

And for a lot of programs having to keep generating and parsing strings is just a bunch of unnecessary boilerplate.

> A stream of bytes is the only sane thing to do. Nothing keeps you from adding a flag to your cli programs to choose the input/output format. In fact many programs already have this, and json seems pretty popular.

Having an API like this is a great idea... as a thin wrapper on top of a tool with a standardized serde protocol or binary file format. The different APIs can be exposed as separate tools or as separate parts of the API of a single tool from a user's POV.

Furthermore: JSON is not just an arbitrary stream of bytes or text, and neither is CSV, nor any other structured text format. Handling text as properly typed objects makes a lot of sense.
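
For illustration, consuming such typed output in Rust might look roughly like this; the serde_json crate and the JSON shape are assumptions made for the example:

    use serde_json::Value;

    fn main() {
        // Sketch: consuming a tool's structured output as typed values instead
        // of re-parsing text with cut/awk. The JSON shape is made up.
        let output = r#"[{"name":"foo.txt","size":12},{"name":"bar.txt","size":34}]"#;
        let entries: Value = serde_json::from_str(output).expect("invalid JSON");

        for entry in entries.as_array().expect("expected a JSON array") {
            // Fields keep their types: "size" stays a number, no string munging.
            let name = entry["name"].as_str().unwrap_or("?");
            let size = entry["size"].as_u64().unwrap_or(0);
            println!("{name} is {size} bytes");
        }
    }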

I have little experience with shells that can work with objects, like PowerShell. But I've seen screenshots of a new object-based shell developed in Rust some time ago; it was still early days so I didn't actually try it out, but it looked downright great compared to today's stream-of-text-based shells in terms of ergonomics and capabilities.


Not being compatible is often the point of tools like ripgrep.


If a Rust rewrite of Coreutils is not backward compatible, it is not a replacement and the current GNU Coreutils would still be needed for backward compatibility.

There is a long history of newer, better tools supporting backward compatibility to be a replacement for their predecessors:

- bash and zsh support backward compatibility to be replacements for sh.

- vim has compatibility mode to be a replacement for vi.

If the Rust implementation is not backward compatible, it should not be called "Coreutils".


The Rust rewrite of Coreutils (this post essentially) _is_ meant to be a drop-in replacement for GNU/Coreutils. OTOH, ripgrep and fd are not "drop-in replacements". Rather, they are "replacements" that devs can use if they want to.


[flagged]


The Rust implementation that we are discussing is MIT licensed and as far as I know not part of GNU.

I do not believe that "Coreutils" is trademarked by GNU.

There are other projects that use the name "Coreutils" that are not part of GNU:

https://github.com/DiegoMagdaleno/BSDCoreUtils


>I do not believe that "Coreutils" is trademarked by GNU.

That's exactly why the GNU sort of coreutils should be called GNU/Coreutils. The comment said that GNU/Coreutils are equivalent to Coreutils; I said they aren't.


I think the person you're replying to was just memeing on the Richard Stallman essay that has the line, "What you call Linux should really be called GNU/Linux"


I was sad to find out that rms apparently didn't even say it.


It's not the point of ripgrep. If ripgrep were only marginally better at common grepping tasks then it could hardly justify not being compatible. It aims to be much better though, so it can justify it well. But not being compatible is not the point or the goal of the project, I believe.


Yes, this is correct in some sense. I didn't set out to create an "incompatible grep." I set out to build a "better" grep, where "better" could include things like being incompatible.

Of course, ripgrep is still ultimately a lot more similar to grep than it is different. Where possible, I used the same flag names and prescribed similar behavior---if it made sense. Because it's good to lean on existing experiences. It makes it easier for folks to migrate.

So it's often a balancing act. But yes, POSIX compatibility is certainly a non-goal of ripgrep.


That said, thanks for making ripgrep so good. I've been using it hundreds of times a day for 3-4 years now... (and iirc, it's the default search engine in VS Code as well).


Not having to specify "-rHInE" on every invocation is one killer reason to use ripgrep over grep. It's a concrete advantage of you not having 100% compatibility as a goal.

(The other killer reason is the blitzing speed.)


If that were the only incompatibility, it would be easy to make a patch that checks whether it is called as "grep" and defaults to "not -rHInE" - so one could ship ripgrep as the default and yet have backwards compatibility. BusyBox-style multi-call binaries already do that kind of argv[0] dispatch, iirc.
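
For what it's worth, a minimal sketch of that argv[0]-dispatch idea in Rust; the `grep` name check and the toggled defaults are hypothetical:

    use std::env;
    use std::path::Path;

    fn main() {
        // BusyBox-style dispatch: look at the name the binary was invoked
        // under (e.g. via a `grep` symlink) and pick defaults accordingly.
        let argv0 = env::args().next().unwrap_or_default();
        let name = Path::new(&argv0)
            .file_name()
            .and_then(|s| s.to_str())
            .unwrap_or("")
            .to_string();

        if name == "grep" {
            // conservative, grep-like defaults (no recursion, no filtering, ...)
            println!("running in grep-compatible mode");
        } else {
            println!("running with ripgrep-style defaults");
        }
    }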

EDIT: So I've quickly looked into it and it seems nobody did an extensive comparison to the grep feature set or the POSIX specification. If I have some time later this week I might do this and check whether something like this would be viable.


It's not even close to the only incompatibility. :-) That's a nominal one. If that were really the only thing, then sure, I would provide a way to make ripgrep work in a POSIX compatible manner.

There are lots of incompatibilities. The regex engine itself is probably one of the hardest to fix. The incompatibilities range from surface level syntactic differences all the way down to how match regions themselves are determined, or even the feature sets themselves (BREs for example allow the use of backreferences).

Then of course there's locale support. ripgrep takes a more "modern" approach: it ignores locale support and instead just provides what level 1 of UTS#18 specifies. (Unicode aware case insensitive matches, Unicode aware character classes, lots of Unicode properties available via \p{..}, and so on.)
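
As an illustration of that level-1 Unicode behavior, a minimal sketch using the regex crate (which ripgrep builds on); the specific patterns are just examples:

    use regex::Regex;

    fn main() {
        // Unicode-aware case-insensitive matching: (?i) case-folds across
        // scripts with no locale configuration at all.
        let re = Regex::new(r"(?i)δελτα").unwrap();
        assert!(re.is_match("ΔΕΛΤΑ"));

        // Unicode property classes via \p{..}: match runs of Greek letters.
        let greek = Regex::new(r"\p{Greek}+").unwrap();
        assert!(greek.is_match("mixed ascii and σχήμα"));
        println!("ok");
    }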


Pity! I did look; only "-E" and "-s" diverge from the POSIX standard parameter-wise. But making significant changes to the pattern engine is probably not worth it.

Thanks anyway, I enjoy rg quite a lot :)


It's worth noting that the implementation of ripgrep has been split up into a whole bunch of modular components. So it wouldn't be out of the question for someone to piece those together into a GNU-compatible grep implementation.


True, though is there any point? ripgrep's homegrown regex engine only supports true regular expressions.

To give backreference support, ripgrep can optionally use PCRE. But PCRE already comes with its own drop-in grep replacement...


To the extent that you want to get a POSIX compatible regex engine working with ripgrep, you could patch it to use a POSIX compatible regex engine. The simplest way might be to implement the requisite interfaces by using, say, the regex engine that gets shipped with libc. This might end up being quite slow, but it is very doable.

But still, that only solves the incompatibilities with the regex engine. There are many others. The extent to which ripgrep is compatible with POSIX grep is that I used flag names similar to GNU grep where I could. I have never taken a fine toothed comb over the POSIX grep spec and tried to emulate the parts that I thought were reasonable. Since POSIX wasn't and won't be a part of ripgrep's core design, it's likely there are many other things that are incompatible.

A POSIX grep can theoretically be built with a pretty small amount of code. Check out busybox's grep implementation, for example.

While building a POSIX grep in Rust sounds like fun, I do think you'd have a difficult time with adoption. GNU grep isn't a great source of critical CVEs, it works pretty well as-is and is actively maintained. So there just isn't a lot of reason to. Driving adoption is much easier when you can offer something new to users, and in order to do that, you either need to break with POSIX or make it completely opt-in. (I do think a good reason to build a POSIX grep in Rust is if you want to provide a complete user-land for an OS in Rust, perhaps if only for development purposes.)
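
As a rough illustration of how small that core loop can be, here is a sketch of a line-matching grep in Rust; note that it leans on the (non-POSIX) regex crate rather than a POSIX-compatible engine, and omits essentially all of grep's flags:

    use std::io::{self, BufRead, Write};

    use regex::Regex;

    fn main() {
        // Core loop of a grep: read lines from stdin and print the ones that
        // match the pattern given as the first argument.
        let pattern = std::env::args().nth(1).expect("usage: minigrep <pattern>");
        let re = Regex::new(&pattern).expect("invalid pattern");

        let stdin = io::stdin();
        let stdout = io::stdout();
        let mut out = stdout.lock();
        for line in stdin.lock().lines() {
            let line = line.expect("read error");
            if re.is_match(&line) {
                writeln!(out, "{}", line).expect("write error");
            }
        }
    }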


Well, the reasons I see for being POSIX-compatible would be:

1. Distributions could adopt rg as default and ship with it only, adding features at nearly no cost

2. The performance advantage over "traditional" grep

Number 1 is basically how bash became the default; since it is a superset of sh (or close enough at least), distributions could offer the feature set at no disadvantage. Shipping it by default would allow scripts on that distribution to take advantage of rg and, arguably, improve the situation for most users at no cost.

If you build two programs into one with a switch, you're effectively shipping optional software, just in a single binary, which makes point 1 pretty moot. If you then also fall back on another engine, point 2 is moot as well - so the only point where this would actually be useful is if rg could become a good enough superset of grep that it would provide sufficient advantages (most greps _already_ provide a larger superset of POSIX, though). Everything else would just add unnecessary complexity, in my opinion.

But it would have been nice :)


Ah I see. Yeah, that's a good point. But it's a very very steep hill to climb. In theory it would be nice though. There's just a ton of work to do to hit POSIX compatibility and simultaneously be as good as GNU grep at other things. For example, the simplest way to get the regex engine to be POSIX compatible would be to use an existing POSIX compatible regex engine, like the one found in libc. But that regex engine is generally regarded as quite slow AIUI, and is presumably why GNU grep bundles its own entire regex engine just to speed things up in a lot of common cases. So to climb this hill, you'd either need to follow in GNU grep's footsteps _or_ build a faster POSIX compatible regex engine. Either way, you're committing yourself to writing a regex engine.


I didn't look closely, but Oniguruma is pretty dang fast and has drop-in POSIX syntax + ABI compatibility as a compile-time option. Could maybe use that.


The regex engine I maintain includes benchmarks against onig. It's been a couple years since I looked closely, but last I checked, onig was not particularly fast. Compare https://github.com/rust-lang/regex/blob/master/bench/log/07/... vs https://github.com/rust-lang/regex/blob/master/bench/log/07/...


Ahh, very interesting, thanks for sharing! Do you have any thoughts around why that is? I presume that's due to Oniguruma supporting a much broader feature set, and something like fancy-regex's approach of mixing a backtracking VM with an NFA implementation for simple queries would be needed for better perf? (I am aware you played a role in that) [1]

I have recently been playing around with regex parsing by building parsers from parser combinators at runtime. No clue how it will perform in practice yet (structuring parser generators at runtime is challenging in general in low-level languages), but maybe that could pan out and lead to an interesting way to support broader sets of regex syntaxes, like POSIX, in a relatively straightforward and performant way.

[1] https://github.com/fancy-regex/fancy-regex#theory


No idea. I've never done an analysis of onig. Different "feature sets" tends to be what people jump to first, but it's rarely correct in my experience. For example, PCRE2 has a large feature set, but it is quite fast. Especially its JIT.

The regex crate does a lot of literal optimizations to speed up searches. More than most regex engines in my experience.


You can solve `-rHInE` with just an alias though.


Then you go back to incompatibility.


Aliases are only active in your interactive shell.


Exactly, if grep behaves differently in your scripts and your shell, there's no point in not using ripgrep (or ag, a.k.a. the silver searcher, or whatever strikes your fancy).


The difference is - your script works out of the box, or you need to install ripgrep.


True, but then you have the occasional headscratcher that the command behaves differently in your script than when you run it manually.


I would love it if --crlf was enabled by default with the Windows version. This would make using ^ and $ easier.


Make a config file for ripgrep and enable it.

https://github.com/BurntSushi/ripgrep/blob/master/GUIDE.md#c...
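
For reference, the linked guide describes the config file as a plain-text list of flags, one per line, read from the path in the RIPGREP_CONFIG_PATH environment variable. A minimal sketch (the file path is just an example):

    # contents of the file RIPGREP_CONFIG_PATH points to, e.g. ~/.ripgreprc
    --crlf

With something like `export RIPGREP_CONFIG_PATH=~/.ripgreprc` in your shell startup, the flag then applies to every invocation.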


Slightly unrelated, but ripgrep is probably the best improvement to my day-to-day workflow in years. Thanks so much for making such a great tool!


tldr: ripgrep immediately strikes me as holding incredible medium-term potential, routed via the needs particular to OpenVMS; particularly complementing the dramatically improved economics of the x64 port and the profound potential to attract highly experienced critical-systems developers to FOSS via Rust breathing air into the VMS world. Oh, and to burntsushi: I'm not worthy, dude - excellent, and thank you for ripgrep, which I have only just encountered!

edit: within 30s of learning about ripgrep I learned that VS Code's search routine calls ripgrep, and this has just gifted me leverage to introduce Rust in my company... we're a VMS shop... searching for Rust and VMS gets you the delightful statement that since Rust isn't an international standard there's no guarantee it'll be around in... oh, about right now... The real reason any language doesn't last long on VMS is a shoddy FFI implementation. But that's another story that turns into a book the moment I try explaining how much value VMS FFI brings... my brain suddenly just now thought "slingshot the critical systems devs outta VMS on a Rust hook and things will happen".

(longer explicative comment in profile - please do check it out if you'd like to know why this really made my day and some)


Thanks for the kind words. I don't know much about openVMS. I read your profile bio. I'm not sure I quite grok all the details, but I would like to caution you: ripgrep is still, roughly speaking, a "grep." That is, it exhaustively searches through corpora. While it has some "smart filtering," it's not going to do much better than other tools when you're searching corpora that don't fit into memory.

I do have plans to add an indexing mode to ripgrep[1], but it will probably be years before that lands. (I have a newborn at home, and my free-time coding has almost completely stopped.)

[1] - https://github.com/BurntSushi/ripgrep/issues/1497


Hi! Thanks for your reply, and notes gratefully taken! In fact it's avoiding touching memory that's important for VMS clusters, since the model is shared storage but nothing else. The language support hopefully leans towards implementing an initial subset as a proof of concept. There's a serious problem going forward for the VMS filesystem, with a major adoption and migration from ODS5 recently abandoned. Where my thinking is going is towards extending RMS, the record management system, which was extended to become what's now Oracle RDB. RDB still uses RMS and behaves cooperatively with filesystem tools and semantics. Without more concentrated contemplation I'm stretching to venture that there's a text search tool in the making, but I can say that it is going to be carefully investigated, and I imagine quite quickly, because of the potential. Halving licence costs per CPU and going from 8 to 28 cores, with features that are separately very expensive in Oracle iXX.X, is behind a lot of new commitment. Because everything is obedient to the system TM, that'll require integration, but it would let you run the ripgrep function on the raw database from low-memory clients ~ so that's where my tentative elevator pitch is going..


> you'd have to abandon compatibility with a great deal of scripts.

Having tried to reproducibly package various software, the C layer (and especially the coreutils layer) is the absolute worst, and I wouldn't shed a tear if we started afresh with something more holistically designed.


Debian has been building GNU coreutils and tons more C code reproducibly for years.


And it takes a tremendous amount of effort, discipline, expertise, and deep, intimate familiarity with the entire dependency tree. This is the whole problem. We should be able to build software without needing to be intimately familiar with every package in the tree and how it wants us to build it, what its implicit dependencies are, etc.

For example, for the overwhelming majority of Rust and Go projects, there are explicit dependency trees in which all nodes are built roughly the same way such that a builder can do "cargo build" and get a binary. No need to understand each node's bespoke, cobbled-together build system or to figure out which undocumented dependencies are causing your build to fail (or where those dependencies need to be installed on the system or how to resolve conflicts with other parts of the dependency tree).
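
As a sketch of what such an explicit dependency tree looks like in practice, here is a hypothetical Cargo manifest (the package and crate names are only examples); `cargo build` resolves and builds the whole tree from this declaration plus the lockfile:

    [package]
    name = "example-tool"   # hypothetical package
    version = "0.1.0"
    edition = "2021"

    [dependencies]
    # Direct dependencies are declared here; their transitive dependencies are
    # pinned in Cargo.lock, so the whole tree is explicit.
    regex = "1"
    serde_json = "1"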


I'm a DD, my C packages (I'm also upstream) are built reproducibly. I have absolutely no idea how it's done.

To be clear, that means I have devoted zero effort to it, and certainly it didn't require discipline, I have absolutely no expertise in the process, and my knowledge of the dependency tree is "gcc -o x x.c" does the job.


You can rely on the maintainers of the Linux distribution to do the work for reproducibility. While also providing security updates. And vetting packages for licensing issues. That's what distributions are for.


You can for the packages they support and at the specific versions that they support, but if you have other packages you’re on your own. I certainly appreciate the package maintainers’ toil, but it would be better for everyone if they weren’t doing work that would be easy for a program to do.


Absolute worst compared to what? Have you tried packaging software on Windows?

If you have something better in mind, please implement it. It’d get used.


"Worst" compared to packaging software in other languages.

> If you have something better in mind, please implement it. It’d get used.

Nonsense. It's not a dearth of better options that causes C folks to clutch their autotools and bash scripts and punting-altogether-on-dependency-management; we've had better options for decades. Change will happen, but it will take decades as individual packages become less popular in favor of newer alternatives with better features--build hygiene will improve because these newer projects are much more likely to be built in Rust or "disciplined C" by developers who are increasingly of a younger generation, less wed to the Old Ways of building software.


> It's not a dearth of better options that causes C folks to clutch their autotools and bash scripts and punting-altogether-on-dependency-management; we've had better options for decades.

I build and package golang/Python/rust/C/C++ binaries using GNU make, bash, and Debian's tooling. I have dependency management, parallel builds, and things are reproducible. I do it that way, because I need a glue that will deploy to VMs, containers, bare metal, whatever. I don't use it because I'm scared of other tooling. I use it because I haven't seen a better environment for getting software I work on, into heterogeneous environments.

I'm not attached to the old way; I'm attached to ways that I can be productive with.


My point is that reproducible builds in Go and Rust are much, much easier than C and C++ (and Python since everything depends on C and C++). If your C and C++ programs are building easily (including their direct and transitive dependencies), then you're almost certainly not doing truly reproducible builds (or perhaps you're just very, very familiar with your particular dependency tree in a way that doesn't generalize to arbitrary trees).

To be clear, I'm not arguing that there are better tools for working around C/C++'s build ecosystem; I'm arguing that our lives will be better when we minimize our dependencies on those ecosystems.


Look I loathe autotools as a developer but it has the advantage of being extremely battle-tested and comes with lots of tooling to package it out of the box. In RPM spec files any autotools project can be packaged with more or less a single macro.


fd is better IMO, plus it is more cross-platform. Same flags on Linux and Mac


fd doesn't do the same thing at all. fd matches against filenames, rg matches against file contents.


I think both approaches are interesting and valuable. Yes ripgrep is great for interactive usage, but in scripts I'll still be using grep for the foreseeable future.

I think the situation with GNU coreutils on Solaris/BSDs is a bit different: a lot of the time the BSD/Solaris tools on one hand and the GNU tools on the other were incompatible in some ways. Some flags would be different (GNU ls vs. FreeBSD ls for instance have pretty significant flag differences) and then you have things like Make, which has pretty profound syntax differences between flavours.

As a result if you needed to write portable scripts you either had to go through a painful amount of testing and special casing for the various dialects, or you'd just mandate that people installed the GNU version of the tools on the system and use that. That's why you can pretty much assume that any BSD system in the wild these days has at the very least gmake and bash installed, just because it's used by third party packages.

So IMO people used GNU on non-GNU systems mainly for portability or because they came from GNU-world and they were used to them.

I know that there's a common idea that GNU tools were more fully featured or ergonomic than BSD equivalents, but I'm not entirely sold on that. I think it's mostly because people these days learn the GNU tools first, and they're likely to notice when a BSD equivalent doesn't support a particular feature while they're unlikely to notice the opposite because they simply won't use BSD-isms.

For instance, for a long time I was frustrated with GNU tar because it wouldn't automatically figure out which compression algorithm was used (bz, gz or xz in general) and automatically Do The Right Thing. FreeBSD tar however did it just fine.

Similarly I can get FreeBSD ls to do pretty much everything GNU ls can do, but you have to use different flags sometimes. If you don't take the time to learn the BSDisms you'll just think that the program is more limited or doesn't support all the features of the GNU version.

Another example off the top of my head is "fstat", which I find massively more usable than lsof, which is a lot more clunky with worse defaults IMO. It's mainly that lsof mixes disk and network resources by default, while fstat is dedicated to local resources and you use other programs like netstat to monitor network resources. Since I rarely want to monitor both at the same time when I'm debugging something, I find that it makes more sense to split those apart.


You're right, there were a lot of compatibility problems in the past because tools reused the same names but didn't behave the same. Sometimes they were completely different, like ps - the BSD and System V versions are totally incompatible. There were attempts to unify some tools with something new, for example pax (which was supposed to replace the incompatible tar and cpio, but which basically nobody uses). These days most tools are getting new names (ripgrep, exa) where in the past you'd just call it the same thing. I think that decision has freed people up from having to maintain backwards compatibility with the GNU tools - which weren't always backwards compatible with the tools of the same name that they replaced, as you point out. Once you lose the requirement to be compatible you can actually rethink how the tool should be used and not worry about a decision that was made by someone in the 70s.


Ah yeah I forgot about `ps`, that's definitely one of the worst offenders.

I completely agree with your overall point btw, I just don't think it's an either/or type of situation. Rewriting the existing standard utils in Rust could provide some benefits, but it's also great to have utils like ripgrep that break compatibility for better ergonomics.


One of the standard things coreutils does right that many other implementations do wrong: after running a command with a filename argument, hit up-arrow, space, dash, some option you want to add, enter. coreutils handles the option you added. Many other implementations either complain or treat the option as another file.

That was the original feature that made me want coreutils on other systems.


That's a matter of taste. Argument permutation is evil, IMHO. It's also dangerous. If someone can't be bothered to order their arguments, they also can't be bothered to use the "--" option terminator, which means permutation is a correctness and security headache.

But it's the behavior on GNU systems, and it's even the behavior of most applications using getopt_long on other non-GNU, non-Linux systems (because getopt_long permutes by default, and getopt_long is a de facto standard now). So it should be supported.


If you're writing a script, perhaps.

I'm talking about interactive command-line usage, for which the ability to put an argument at the end provides more user-friendliness.


But the command doesn't know that, and in general best practice is (or was) for commands to not alter their behavior based on whether they're attached to a terminal or not.

I won't deny the convenience. (Technically jumping backwards across arguments in the line editor is trivial, but I admit I keep forgetting the command sequence.) But from a software programming standpoint, the benefit isn't worth the cost, IMO.

And there are more costs than meet the eye. Have you ever tried to implement argument permutation? You can throw together a compliant getopt or getopt_long in surprisingly few lines of code.[1] Toss in argument permutation and the complexity explodes, both in SLoC and asymptotic runtime cost (though you can trade the latter for the former to some extent).

[1] Example: https://github.com/wahern/lunix/blob/master/src/unix-getopt....


> But the command doesn't know that, and in general best practice is (or was) for commands to not alter their behavior based on whether they're attached to a terminal or not.

I completely agree; most commands should behave the same on the command-line and in scripts, because many scripts will start out of command-line experimentation. That's one of the good and bad things about shell scripting.

> Have you ever tried to implement argument permutation? You can throw together a compliant getopt or getopt_long in surprisingly few lines of code. Toss in argument permutation and the complexity explodes, both in SLoC and asymptotic runtime cost (though you can trade the latter for the former to some extent).

"surprisingly few lines of code" doesn't seem like a critical property for a library that needs implementing once and can then be reused many times. "No more complexity than necessary to implement the required features" seems like a more useful property.

I've used many command-line processors in various languages, all of which have supported passing flags after arguments. There are many libraries available for this. I don't think anyone should reimplement command-line processing in the course of building a normal command-line tool.

I personally don't think permutation (in the style of getopt and getopt_long, at least in their default mode) is the right approach. Don't rearrange the command line to look like all the arguments come first. Just parse the command line and process everything wherever it is. You can either parse it into a separate structure, or make two passes over the arguments; neither one is going to add substantial cost to a command-line tool.
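
For illustration, a rough sketch of that "process everything wherever it is" approach, with no permutation of the argument vector and `--` honored as a terminator (hypothetical, with no real option semantics):

    use std::env;

    fn main() {
        // One pass over the arguments that accepts flags before or after
        // positionals without rearranging anything; `--` ends flag parsing.
        let mut flags: Vec<String> = Vec::new();
        let mut positionals: Vec<String> = Vec::new();
        let mut no_more_flags = false;

        for arg in env::args().skip(1) {
            if !no_more_flags && arg == "--" {
                no_more_flags = true;
            } else if !no_more_flags && arg.starts_with('-') && arg.len() > 1 {
                flags.push(arg);
            } else {
                positionals.push(arg);
            }
        }

        println!("flags: {:?}", flags);
        println!("positionals: {:?}", positionals);
    }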

So, this is only painful for someone who needs to reimplement a fully compatible implementation of getopt or getopt_long. And there are enough of those out there that it should be possible to reuse one of the existing ones rather than writing a new one.


> and in general best practice is (or was) for commands to not alter their behavior based on whether they're attached to a terminal or not.

Not so sure about that. ls has been changing its output format based on whether it is being used interactively or not for as long as I can remember at least. Both GNU and BSD versions.


Fair distinction. There's a kind of unwritten understanding of what programs should and shouldn't do based on isatty, and I've never seen it explicitly documented.

Things many programs do if attached to a TTY: add color, add progress bars and similar uses of erase-and-redisplay, add/modify whitespace characters for readability, refuse to print raw binary, etc.

Things some programs do, which can be problematic: prompt interactively when they're otherwise non-interactive.

Things no program does or should do: change command-line processing, semantic behavior, or similar.
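
To make that distinction concrete, here is a minimal sketch of the "presentation only" kind of isatty-dependent behavior, using std::io::IsTerminal from the Rust standard library; the coloring rule is just an example:

    use std::io::{self, IsTerminal, Write};

    fn main() {
        // Change only the presentation, never the semantics: colorize when
        // stdout is a terminal, emit plain output when piped or redirected.
        let stdout = io::stdout();
        let tty = stdout.is_terminal();
        let mut out = stdout.lock();

        for name in ["foo.txt", "bar.txt"] {
            if tty {
                writeln!(out, "\x1b[32m{}\x1b[0m", name).unwrap(); // ANSI green
            } else {
                writeln!(out, "{}", name).unwrap();
            }
        }
    }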


> Things no program does or should do: change command-line processing, semantic behavior, or similar.

Arguably ripgrep breaks this rule. :-) Compare `echo foo | rg foo` and `rg foo`. The former will search stdin. The latter will search the current working directory.

In any case, I bring this up, because I've heard from folks that ripgrep changing its output format is "bad practice" and that it should "follow standard Unix conventions and not change the output format." And that's when I bring up `ls`. :-)


This is one of the things that I loved when starting to use rg :)


Technically it's based on whether the output is a tty or piped/redirected into something, not whether it's run from the shell's prompt or a script.

So for instance if you run a bare `ls` from a script that outputs straight into the terminal you'll get the multi-column "human readable" output. Conversely if you type `ls | cat` in the shell you'll get the single column output.

It can definitely be surprising if you don't know about it but technically it behaves the same in scripts and interactive environments.


That's exactly what I meant. I used "interactively" to mean "attached to a tty." Look at what I was responding to:

"for commands to not alter their behavior based on whether they're attached to a terminal or not"

ls is a clear counter-example of that.

I think the behavior is a good thing. I'm pushing back against this notion of what is "best practice" or not. It's more nuanced than "doesn't change its output format."


> BSD system in the wild these days has at the very least gmake and bash installed, just because it's used by third party packages.

No. Gmake, maybe. But not bash.



