Not being compatible is often the point of tools like ripgrep.

thesuperbigfrog · on March 9, 2021

If a Rust rewrite of Coreutils is not backward compatible, it is not a replacement and the current GNU Coreutils would still be needed for backward compatibility.

There is a long history of newer, better tools supporting backward compatibility to be a replacement for their predecessors:

- bash and zsh support backward compatibility to be replacements for sh.

- vim has compatibility mode to be a replacement for vi.

If the Rust implementation is not backward compatible, it should not be called "Coreutils".

anand-bala · on March 9, 2021

The Rust rewrite of Coreutils (this post essentially) _is_ meant to be a drop-in replacement for GNU/Coreutils. OTOH, ripgrep and fd are not "drop-in replacements". Rather, they are "replacements" that devs can use if they want to.

GoblinSlayer · on March 9, 2021

[flagged]

thesuperbigfrog · on March 9, 2021

The Rust implementation that we are discussing is MIT licensed and as far as I know not part of GNU.

I do not believe that "Coreutils" is trademarked by GNU.

There are other projects that use the name "Coreutils" that are not part of GNU:

https://github.com/DiegoMagdaleno/BSDCoreUtils

GoblinSlayer · on March 10, 2021

>I do not believe that "Coreutils" is trademarked by GNU.

That's exactly why the GNU flavor of coreutils should be called GNU/Coreutils. The comment said that GNU/Coreutils is equivalent to Coreutils; I said it isn't.

entropicdrifter · on March 9, 2021

I think the person you're replying to was just memeing on the Richard Stallman essay that has the line, "What you call Linux should really be called GNU/Linux"

audience_mem · on March 11, 2021

I was sad to find out that rms apparently didn't even say it.

steerablesafe · on March 9, 2021

It's not the point of ripgrep. If ripgrep were only marginally better at common grepping tasks then it could hardly justify not being compatible. It aims to be much better though, so it can justify it well. But not being compatible is not the point or the goal of the project, I believe.

burntsushi · on March 9, 2021

Yes, this is correct in some sense. I didn't set out to create an "incompatible grep." I set out to build a "better" grep, where "better" could include things like being incompatible.

Of course, ripgrep is still ultimately a lot more similar to grep than it is different. Where possible, I used the same flag names and prescribed similar behavior---if it made sense. Because it's good to lean on existing experiences. It makes it easier for folks to migrate.

So it's often a balancing act. But yes, POSIX compatibility is certainly a non-goal of ripgrep.

coldtea · on March 9, 2021

That said, thanks for making ripgrep so good. I've been using it 100s of times every day for 3-4 years now... (and iirc, it's the default engine in VS Code as well).

orra · on March 9, 2021

Not having to specify "-rHInE" on every invocation is one killer reason to use ripgrep over grep. It's a concrete advantage of you not having 100% compatibility as a goal.

(The other killer reason is the blitzing speed.)
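For readers who don't have the flags memorized, here is a rough sketch of what "-rHInE" asks grep to do, written as a tiny Python function. It is only an illustration: `recursive_search` is a made-up name, the NUL-byte check is a crude stand-in for real binary detection, and Python's `re` syntax is not the same as POSIX EREs.

```python
import os
import re


def recursive_search(pattern, root):
    """Return (path, line_number, line) for every matching line under `root`."""
    regex = re.compile(pattern)  # roughly -E; Python syntax, not true POSIX ERE
    results = []
    for dirpath, _dirnames, filenames in os.walk(root):  # -r: recurse
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                data = f.read()
            if b"\0" in data:  # -I: skip files that look binary (crude check)
                continue
            text = data.decode("utf-8", errors="replace")
            for lineno, line in enumerate(text.splitlines(), 1):  # -n: line numbers
                if regex.search(line):
                    results.append((path, lineno, line))  # -H: keep the file name
    return results
```

Calling `recursive_search("ba+r", ".")` would walk the current directory and report each hit with its file name and line number.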

Sebb767 · on March 9, 2021

If that is the only incompatibility, it would be easy to make a patch that checks if it is called as "grep" and default to "not -rHInE" - so one could ship ripgrep as default and yet have backwards compatibility. BusyBox already does something like that iirc.

EDIT: So I've quickly looked into it and it seems nobody did an extensive comparison to the grep feature set or the POSIX specification. If I have some time later this week I might do this and check whether something like this would be viable.
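The "check what name it was called as" idea is the same trick multi-call binaries use: look at `argv[0]` and pick defaults from it. A minimal sketch follows, with hypothetical option names that are not ripgrep's real internals (and, as the replies note, defaults are far from the only incompatibility):

```python
import os

# Hypothetical option sets for illustration only.
GREP_COMPAT_DEFAULTS = {
    "recursive": False,
    "with_filename": False,
    "skip_binary": False,
    "line_numbers": False,
    "extended_regex": False,
}
RG_DEFAULTS = {key: True for key in GREP_COMPAT_DEFAULTS}


def defaults_for(argv0):
    """Pick default options from the name the program was invoked as."""
    name = os.path.basename(argv0)
    if name == "grep":
        return dict(GREP_COMPAT_DEFAULTS)
    return dict(RG_DEFAULTS)
```

A real program would pass `sys.argv[0]` here, so a `grep` symlink pointing at the same binary would get the conservative defaults.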

burntsushi · on March 9, 2021

It's not even close to the only incompatibility. :-) That's a nominal one. If that were really the only thing, then sure, I would provide a way to make ripgrep work in a POSIX compatible manner.

There are lots of incompatibilities. The regex engine itself is probably one of the hardest to fix. The incompatibilities range from surface level syntactic differences all the way down to how match regions themselves are determined, or even the feature sets themselves (BREs for example allow the use of backreferences).

Then of course there's locale support. ripgrep takes a more "modern" approach: it ignores locale support and instead just provides what level 1 of UTS#18 specifies. (Unicode aware case insensitive matches, Unicode aware character classes, lots of Unicode properties available via \p{..}, and so on.)
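To make "how match regions themselves are determined" concrete: POSIX asks for the leftmost-longest match, while Perl-style engines take the first alternative that matches at the leftmost position. The sketch below uses Python's `re` as a stand-in for a leftmost-first engine; `leftmost_longest` is a hypothetical, naive emulation that only handles alternations of plain literals.

```python
import re


def leftmost_first(alternatives, haystack):
    """Perl-style alternation: at the leftmost position, the first alternative wins."""
    pattern = "|".join(re.escape(a) for a in alternatives)
    m = re.search(pattern, haystack)
    return m.group(0) if m else None


def leftmost_longest(alternatives, haystack):
    """POSIX-style: at the leftmost matching position, the longest alternative wins."""
    for start in range(len(haystack)):
        candidates = [a for a in alternatives if haystack.startswith(a, start)]
        if candidates:
            return max(candidates, key=len)
    return None
```

With the alternatives `["sam", "samwise"]` and the haystack `"samwise"`, the first function reports `sam` and the second reports `samwise`, so the same pattern highlights different regions depending on the engine.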

Sebb767 · on March 9, 2021

Pity! I did look; only "-E" and "-s" diverge from the POSIX standard parameter-wise. But making significant changes to the pattern engine is probably not worth it.

Thanks anyway, I enjoy rg quite a lot :)

nicoburns · on March 9, 2021

It's worth noting that the implementation of ripgrep has been split up into a whole bunch of modular components. So it wouldn't be out of the question for someone to piece those together into a GNU-compatible grep implementation.

orra · on March 9, 2021

True, though is there any point? ripgrep's homegrown regex engine only supports true regular expressions.

To give backreference support, ripgrep can optionally use PCRE. But PCRE already comes with its own drop in grep replacement...
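For anyone unsure what a backreference buys you, here is a small sketch. Python's `re` is a backtracking engine that supports them, so it stands in for a PCRE-style engine; `find_repeated_word` is just an illustrative helper name.

```python
import re

# \1 must match exactly the text captured by group 1, which a
# "true" regular expression (finite automaton) cannot express in general.
REPEATED_WORD = re.compile(r"\b(\w+) \1\b")


def find_repeated_word(text):
    """Return the first doubled word (e.g. 'is is'), or None if there is none."""
    m = REPEATED_WORD.search(text)
    return m.group(0) if m else None
```

For example, `find_repeated_word("this is is a test")` returns `"is is"`.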

burntsushi · on March 9, 2021

To the extent that you want to get a POSIX compatible regex engine working with ripgrep, you could patch it to use a POSIX compatible regex engine. The simplest way might be to implement the requisite interfaces by using, say, the regex engine that gets shipped with libc. This might end up being quite slow, but it is very doable.

But still, that only solves the incompatibilities with the regex engine. There are many others. The extent to which ripgrep is compatible with POSIX grep is that I used flag names similar to GNU grep where I could. I have never taken a fine toothed comb over the POSIX grep spec and tried to emulate the parts that I thought were reasonable. Since POSIX wasn't and won't be a part of ripgrep's core design, it's likely there are many other things that are incompatible.

A POSIX grep can theoretically be built with a pretty small amount of code. Check out busybox's grep implementation, for example.

While building a POSIX grep in Rust sounds like fun, I do think you'd have a difficult time with adoption. GNU grep isn't a great source of critical CVEs, it works pretty well as-is and is actively maintained. So there just isn't a lot of reason to. Driving adoption is much easier when you can offer something new to users, and in order to do that, you either need to break with POSIX or make it completely opt-in. (I do think a good reason to build a POSIX grep in Rust is if you want to provide a complete user-land for an OS in Rust, perhaps if only for development purposes.)

Sebb767 · on March 9, 2021

Well, the reasons I see for being POSIX-compatible would be:

1. Distributions could adopt rg as default and ship with it only, adding features at nearly no cost

2. The performance advantage over "traditional" grep

Number 1 is basically how bash became the default; since it is a superset of sh (or close enough at least), distributions could offer the feature set at no disadvantage. Shipping it by default would allow scripts on that distribution to take advantage of rg and, arguably, improve the situation for most users at no cost.

If one builds two programs in one with a switch, you're effectively shipping optional software, but in a single binary, which makes point 1 pretty moot. If you then also fall back on another engine, point 2 is moot as well - so the only point where this would actually be useful is if rg could become a good enough superset of grep that it would provide sufficient advantages (most greps _already_ provide a larger superset of POSIX, though). Everything else would just add unnecessary complexity, in my opinion.

But it would have been nice :)

burntsushi · on March 9, 2021

Ah I see. Yeah, that's a good point. But it's a very very steep hill to climb. In theory it would be nice though. There's just a ton of work to do to hit POSIX compatibility and simultaneously be as good as GNU grep at other things. For example, the simplest way to get the regex engine to be POSIX compatible would be to use an existing POSIX compatible regex engine, like the one found in libc. But that regex engine is generally regarded as quite slow AIUI, and is presumably why GNU grep bundles its own entire regex engine just to speed things up in a lot of common cases. So to climb this hill, you'd either need to follow in GNU grep's footsteps _or_ build a faster POSIX compatible regex engine. Either way, you're committing yourself to writing a regex engine.

emidoots · on March 9, 2021

I didn't look closely, but Oniguruma is pretty dang fast and has drop-in POSIX syntax + ABI compatibility as a compile-time option. Could maybe use that.

burntsushi · on March 10, 2021

The regex engine I maintain includes benchmarks against onig. It's been a couple years since I looked closely, but last I checked, onig was not particularly fast. Compare https://github.com/rust-lang/regex/blob/master/bench/log/07/... vs https://github.com/rust-lang/regex/blob/master/bench/log/07/...

emidoots · on March 10, 2021

Ahh, very interesting, thanks for sharing! Do you have any thoughts around why that is? I presume that's due to Oniguruma supporting a much broader feature set and something like fancy-regex's approach with mixing a backtracking VM and NFA implementation for simple queries would be needed for better perf? (I am aware you played a role in that) [1]

I have been playing around with regex parsing through building parsers through parser combinators at runtime recently, no clue how it will perform in practice yet (structuring parser generators at runtime is challenging in general in low-level languages) but maybe that could pan out and lead to an interesting way to support broader sets of regex syntaxes like POSIX in a relatively straightforward and performant way.

[1] https://github.com/fancy-regex/fancy-regex#theory

burntsushi · on March 10, 2021

No idea. I've never done an analysis of onig. Different "feature sets" tends to be what people jump to first, but it's rarely correct in my experience. For example, PCRE2 has a large feature set, but it is quite fast. Especially its JIT.

The regex crate does a lot of literal optimizations to speed up searches. More than most regex engines in my experience.
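A very rough sketch of the general idea behind a literal optimization: if every match must contain some literal string, a cheap substring scan can rule out most of the input before the regex engine runs at all. This is not the regex crate's implementation; `search_lines` is a hypothetical helper, and the required literal is supplied by hand here, whereas a real engine would derive it from the pattern.

```python
import re


def search_lines(pattern, required_literal, lines):
    """Yield lines matching `pattern`, running the regex only on candidate lines.

    `required_literal` must be a substring of every possible match.
    """
    regex = re.compile(pattern)
    for line in lines:
        if required_literal not in line:  # fast substring scan rejects most lines
            continue
        if regex.search(line):  # full regex only on the survivors
            yield line
```

For instance, `list(search_lines(r"error: \d+", "error: ", log_lines))` only runs the regex on lines that already contain `error: `.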

ses1984 · on March 9, 2021

You can solve `-rHInE` with just an alias though.

shakow · on March 9, 2021

Then you go back to incompatibility.

vbernat · on March 9, 2021

Aliases are only active in your interactive shell.

shakow · on March 9, 2021

Exactly, if grep behaves differently in your scripts and your shell, there's no point in not using ripgrep (or ag, or silver searcher, or whatever strikes your fancy).

andoriyu · on March 9, 2021

The difference is - your script works out of the box, or you need to install ripgrep.

orra · on March 9, 2021

True, but then you have the occasional headscratcher that the command behaves differently in your script than when you run it manually.

jftuga · on March 9, 2021

I would love it if --crlf was enabled by default with the Windows version. This would make using ^ and $ easier.

jpitz · on March 9, 2021

Make a config file for ripgrep and enable it.

https://github.com/BurntSushi/ripgrep/blob/master/GUIDE.md#c...
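As a sketch of what that looks like in practice: ripgrep reads its config file from the path named by the `RIPGREP_CONFIG_PATH` environment variable, with one argument per line and `#` for comments. The helper names and the `ripgreprc` file name below are arbitrary choices for illustration.

```python
import os


def write_rg_config(directory):
    """Write a ripgrep config file that enables --crlf and return its path."""
    path = os.path.join(directory, "ripgreprc")  # arbitrary file name
    with open(path, "w", encoding="utf-8") as f:
        f.write("# Treat CRLF as a line terminator so ^ and $ work as expected.\n")
        f.write("--crlf\n")
    return path


def env_with_config(path):
    """Return a copy of the environment that points ripgrep at the config file."""
    env = dict(os.environ)
    env["RIPGREP_CONFIG_PATH"] = path
    return env
```

The returned environment could then be passed to whatever launches `rg`, or the variable could simply be set once in the user's profile.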

danudey · on March 9, 2021

Slightly unrelated, but ripgrep is probably the best improvement to my day-to-day workflow in years. Thanks so much for making such a great tool!

Cullinet · on March 9, 2021

tldr : ripgrep immediately strikes me as holding incredible medium term potential effects routed via the needs particular to openVMS; particularly complementing the dramatically improved economics of the x64 port and the profound potential to attract highly experienced critical systems developers to FOSS via Rust breathing air into the VMS world. Oh and to burntsushi : I'm not worthy dude - excellent and thank you for ripgrep which I have only just encountered!

ed : within 30s of learning about ripgrep I learned that VSCode's search routine calls ripgrep and this has just gifted me leverage to introduce Rust in my company... we're a vms shop.. searching for rust and vms gets you the delightful statement that since Rust isn't an international standard there's no guarantee it'll be around in.. oh about right now...the real reason why any language doesn't last long on vms is if the ffi implementation is shoddy. but that's another story that turns into a book the moment I try explaining how much value vms ffi brings... my brain suddenly just now thought "slingshot the critical systems devs outta vms on a Rust hook and things will happen ".

(longer explicative comment in profile - please do check it out if you'd like to know why this really made my day and some)

burntsushi · on March 9, 2021

Thanks for the kind words. I don't know much about openVMS. I read your profile bio. I'm not sure I quite grok all the details, but I would like to caution you: ripgrep is still, roughly speaking, a "grep." That is, it exhaustively searches through corpora. While it has some "smart filtering," it's not going to do much better than other tools when you're searching corpora that don't fit into memory.

I do have plans to add an indexing mode to ripgrep[1], but it will probably be years before that lands. (I have a newborn at home, and my free-time coding has almost completely stopped.)

[1] - https://github.com/BurntSushi/ripgrep/issues/1497

Cullinet · on March 9, 2021

hi! thanks for your reply, and notes gratefully taken! in fact it's avoiding touching memory that's important for vms clusters since the model is shared storage but nothing else. the language support hopefully leans towards implementing an initial subset to poc. there's a serious problem going forward for the vms fs with a major adoption and migration from ODS5 recently abandoned. where my thinking is going is towards extending RMS the record management system which was extended to become what's now Oracle RDB. RDB still uses RMS and behaves cooperatively with fs tools and semantics. without more concentrated contemplation I'm stretching to venture there's a text search tool in the making, but I can say that is going to be carefully investigated and I imagine quite quickly because of the potential. halving licence costs per cpu and going from 8 to 28 cores with features separately very expensive in Oracle iXX.X is behind a lot of new commitment. Because everything is obedient to the system TM, that'll require integration, but let you run the ripgrep function on the raw database from low memory clients ~ so my tentative elevator pitch is going..