
It is sad that many new command-line parsing libraries don't follow the GNU rules anymore. They more often use "-long". Then users have to figure out whether this means "--long" or "-l -o -n -g". To make command line even more confusing, multiple tools I have used allow spaces in optional arguments (e.g. "-opt1 arg1 arg2 -opt2", where arg1 and arg2 set two values for -opt1). Every time I see this, I worry if I could be misusing these tools. I wish everyone could just follow getopt_long() and stop inventing their own weird syntax.
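To make the ambiguity concrete, here is a minimal sketch using Python's `getopt` module, which follows the classic getopt conventions. The short flags `l`, `o`, `n`, `g` and the long option `long` are made up for illustration.

```python
import getopt

# Hypothetical option set: four short flags (l, o, n, g) and one long option.
SHORT = "long"
LONG = ["long"]

def parse(argv):
    """Return the (option, value) pairs getopt recognises in argv."""
    opts, _rest = getopt.getopt(argv, SHORT, LONG)
    return opts

clustered = parse(["-long"])   # one dash: read as a cluster of short flags
long_form = parse(["--long"])  # two dashes: read as a single long option

print(clustered)
print(long_form)
```

Under these rules the single-dash form is four separate flags, which is why a tool that reads `-long` as one option surprises people used to getopt.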


Yet another tragedy brought about by golang (at least in part)! :)

(https://golang.org/pkg/flag/#hdr-Command_line_flag_syntax)

Edit: To be clear, I'm mostly "blaming" Go for re-popularizing this style by a) putting it in the standard library and b) being a widely used programming language; I'm not saying Go came up with this or anything.

(idk about the space separated args tho that's even worse)


Cobra (https://github.com/spf13/cobra), which is a pretty popular library for Go CLI applications, behaves more like classical GNU tools. It also offers usage/help autogeneration and autocompletion for popular shells.

Not sure about the relevant point on compact short options syntax as in `tar -xvzf archive.tgz` though... (edit) after a quick & sloppy test it seems to work as expected


Yeah but `tar xvzf archive.tgz` also works so I remain wary of tar. Basically every time I have tried to do something that's not tfz or cfz or xfz, it went wrong until I checked the manpage.


Right. I probably picked one of the most flaky examples, sorry for that. Let's say `ls -lah` that (I hope...) is less ambiguous.

In my defense, the specific example I gave is valid for both GNU and BSD versions of tar. If I understood correctly, the issue you point to (order among short form flags) is related to the fact that `f` expects an argument and consequently has to appear in the last position.
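The "argument-taking flag goes last" point can be sketched with Python's `getopt` module. This is not tar's actual parser; the option string `xvzf:` is an assumption that only mimics the four flags in the example.

```python
import getopt

# Assumed option string: x, v, z take no argument; f takes one.
SHORT = "xvzf:"

# f is last in the cluster, so it consumes the next word as its argument.
good = getopt.getopt(["-xvzf", "archive.tgz"], SHORT)

# f is followed by z, so the rest of the cluster ("z") becomes f's argument
# and the archive name is left over as a positional word.
bad = getopt.getopt(["-xvfz", "archive.tgz"], SHORT)

print(good)
print(bad)
```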


Ah, it's not a direct argument when you omit the hyphen and fall into "traditional" mode. I think after years and years I can finally wrap my head around how that works. :D


speaking of `man`, why can't it be more like `tldr`?


Because learning from unexplained examples is useless.


Is it? That's how humans learn to speak their language, one of the most complex tasks they need to achieve in their life...


On the other hand, cocking up an unfamiliar phrase in a spoken language doesn't usually result in accidentally killing the listener.

I haven't yet had a computer ask for clarification when I used tar or dd in an uncommon and destructive way.


You don't "have to" use the examples; you can read them to get a feel, and read the captions to find the one that does what you want...

Which is faster and probably safer than scanning the documentation for individual flags and hoping you got the nuances right...

See, the two cases aren't:

(1) Thoroughly study man page -> (2) Become expert at the command's options -> (3) try command secure in your mastery of it

vs

(1) Check tldr examples -> (2) try command

They're rather:

(1) Open man page, (2) scan and skim the man page and the dozens of irrelevant flags, caveats, and obscure options, until you find some flags that look to do what you want, (3) half-read them, (4) try command

vs

(1) Check tldr examples, (2) find an example that does what you want (which is usually one of the covered use cases), (3) try the command using the example syntax


I guess you could consider https://github.com/tldr-pages/tldr.

I personally wouldn't touch that, but that's related to my allergies to JS ecosystem and predisposition to panic attacks when I see stuff like that https://github.com/tldr-pages/tldr/blob/master/package-lock.....


I've switched to tealdeer: same database, rust implementation.

https://github.com/dbrgn/tealdeer


I'm generally satisfied by ZSH inline options summary, but I'm happy to see a sane instantiation of this, it clearly fits a need. Thanks for the pointer (and sorry for the troll :/).


"ZSH inline options summary"?

TIA if you could explain; is it native zsh or a plugin?


There's also tealdeer in rust.


It's really a Googleism that was inherited by Go. I remember their open source C++ command-line library did the same thing.


Google's command line flags library, known to the public as absl::Flags and formerly gflags, does not distinguish between --foo and -foo, these are both the flag "foo". Each flag has a unique name so there is never a short -f equivalent to --foo, and -foo can never mean -f -o -o.

The main design motivation of absl::Flags is that the flag definitions can appear in any module, not just main. Go inherits this. A quirk that Go did not inherit is gflags' --nofoo alternate form of --foo=false.

This is all documented at https://gflags.github.io/gflags/#commandline, which is pretty much a verbatim export of the flags package documentation that a Google engineer would see internally.


> The main design motivation of absl::Flags is that the flag definitions can appear in any module, not just main.

Well that's kind of horrifying. That means that command-line arguments are a form of global state, and can silently alter the behavior of the program without the calling scope noticing.

I'm kind of wary of these mechanisms, because I've been bitten by them before. There was a python library I used that read its configuration from sys.argv the first time an object from the library was constructed. I had a rather painful time debugging to find that my script accepting a -b argument resulted in the library switching to batch mode and suppressing all graphics. Dang it, those were my arguments, and the library had no right to go behind my back and look at arguments that hadn't been directly provided to it!
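A rough sketch of that failure mode, using `argparse`. The library, its `-b/--batch` switch, and both function names are hypothetical, and the process command line is passed in as a plain list so the example stays testable.

```python
import argparse

def library_parser():
    """Parser for a hypothetical library that understands -b/--batch."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("-b", "--batch", action="store_true")
    return parser

def configure_implicitly(process_argv):
    # Anti-pattern: the library scans the whole process command line
    # (what sys.argv[1:] would hold) and quietly ignores what it does not know.
    namespace, _unknown = library_parser().parse_known_args(process_argv)
    return namespace.batch

def configure_explicitly(library_args):
    # Alternative: the library only sees arguments the caller hands over.
    return library_parser().parse_args(library_args).batch

# The script's own -b is meant for the script, not for the library.
script_argv = ["-b", "--input", "data.csv"]

print(configure_implicitly(script_argv))  # library hijacks the script's -b
print(configure_explicitly([]))           # library stays in its default mode
```

With the explicit variant, the calling scope decides what the library is told, so nothing changes behind its back.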


If you think that's horrifying, what if I told you that a sufficiently-entitled operator of a given program can alter the flags at runtime ... using their web browser. https://twitter.com/jbeda/status/888635505201471490


Oh my. I have a gut feeling that I don't like it one bit, though I tend to be a bit more generous on logging. Logging is one of the only cases where its presence or absence doesn't change the inputs or outputs of any function, nor any other observable effect of the program. Having or removing logs doesn't impact the testability of a function, unlike any other use of global configuration.


You seem like a pretty reasonable person so prepare to be more shocked :-) In a glog stream like this, the things on the right side are not evaluated unless verbosity is on.

  VLOG(2) << expression_with_side_effect() << " LOL";


I have on occasion been called a reasonable person, and good heavens! I could understand that in a functional language with lazy evaluation, but that doesn't fit at all with my mental model of how C++ works. It can't be a macro, because the VLOG parentheses would need to enclose the entire expression. It can't just be the normal operator<<, because then the expression would always be evaluated. I suppose expression_with_side_effect() could return an object that is implicitly convertible to string, and the actual side effects happen in that optional conversion, but that would require lots of cooperation from the user.

I'm almost scared to ask. How is that even implemented?


It is macros. It expands, through several macros, to:

  !VLOG_IS_ON(level) ? (void) 0 : [a hack to stop compiler warnings] & LOG(INFO) << ...
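Python has no macros, so the following is only an analogy for the conditional operator that the expansion relies on: the branch that is not selected is never evaluated. All names here are made up and none of this is glog code.

```python
calls = []

def expression_with_side_effect():
    calls.append("evaluated")  # the observable side effect
    return "value"

def vlog_is_on(level, verbosity):
    return verbosity >= level

def emit(message):
    return message + " LOL"

def vlog2(verbosity):
    # Same shape as `!VLOG_IS_ON(2) ? (void) 0 : LOG(INFO) << ...`:
    # the right-hand branch runs only when verbosity is high enough.
    return None if not vlog_is_on(2, verbosity) else emit(expression_with_side_effect())

quiet = vlog2(verbosity=0)
count_after_quiet = len(calls)  # still zero: nothing was evaluated
loud = vlog2(verbosity=2)
count_after_loud = len(calls)
```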


It's originally from Plan9, which predates Google.


too few people understand this.

Go was designed by former Bell Labs people who worked on Unix, Plan9, or both. many things about Go that people attribute to "googlism" are really attributable to work done at Bell Labs.


In my experience at Google, only Go does flags like this. Everything else (python, java, c++, blaze) all use the same flag syntax, which is all via long args with two dashes.


The Java ecosystem has historically used single-dash options, both the SDK tooling (e.g. `java -jar`, `javac -classpath`) and classic common libraries like Jakarta Commons CLI. It has moved away from it more in recent years so now you get a mishmash of single and double dashes depending on how old the option is. In some cases you end up with stuff like `java -showversion` which prints the version to stderr but `java --show-version` which prints to stdout.


I have seen a mix. For example, many Android developer tools (not written in Go) use this single-dash style. I believe the standard libraries used for parsing in internal tools mostly support both syntaxes, although some docs do describe the old single-dash style by default.


This seems pretty "standard": https://fuchsia.googlesource.com/fuchsia/+/master/docs/conce...

Is it based on Google's internal preferences?


TBH I have no idea; I've heard of Fuchsia, but know nothing about it. It seems pretty far removed from the majority of work I've done in Google3 (the monorepo).


PowerShell picked up the single-dash flag syntax too.


>many things about Go that people attribute to "googlism" is really attributable to work done at Bell Labs.

We're 50 to 30+ years away from that Bell Labs work. They could have checked what happened in the meantime with the rest of the computing world, before re-imposing obsolete ways with the full power of Google behind them...


No, Plan 9 doesn't have long options at all.


I was going to say, it seems like something Google tooling prefers, even non-Go tooling.


It predates golang significantly. C and C++ bioinformatics tools have used single dash long opts since the 1990s, unfortunately. I expect the transgression didn't originate in the bioinformatics community.


Single-dash long options did not start in bioinformatics, but they are more often used in this field than elsewhere. Perhaps that is partly because some of the most popular tools (e.g. blast, muscle, bedtools and gatk) followed this unfortunate convention.


I'd always assumed that was because people without significant deckers experience on a terminal are far more likely to type -help than --help or -h


One thing Go's flag package does that deserves a lot of blame is to automatically sort the flags alphabetically when looking at -help. And the fact that you need to hack your way around it instead of there being simply an option like nosort=true or whatever is even worse. The whole idea is crazy and basically equivalent to the statement that the order of parameters in -help serves no useful purpose.


And yet, I expect flags to be sorted in a man page; I rarely read things in a logical order, I'm just looking into what flag does what.

It's a convention-over-configuration thing I think. I mean they set a standard, so you can move on. The alternative is to sit and think and discuss about what order to put your documentation in.


You read text from top to bottom. Chances are that you're writing help text and describing the most commonly used flags at the top, and the more obscure ones lower down.


> I'm just looking into what flag does what.

So you read the whole man page when you need a flag that does something specific or do you mean you never write new things and just have to look up flags already in use by some script? Because for everything else that seems like a fascinating waste of time.


if I'm searching I'm... searching: like using grep with some keywords. Why should the order of the parameters matter in this case?


Because you don't always know which words the man page uses to describe specific functionality. So many ways to express similar ideas, language is fun that way.


Thankfully we have git.sr.ht/~sircmpwn/getopt github.com/pborman/getopt github.com/mattn/go-getopt rsc.io/getopt and a hundred more, but I really wish getopt was a part of the standard library.


I think Go's package "flag" was partially inspired by the one made by Apache for Java, but I can't find any sources confirming that now, so I might have seen that in a dream, heh.


I found it surprising too that the native library in Go does not follow this standard.

Fortunately, there are alternative packages that do.


It's not a standard, it's a GNUism. Why would the inventor of Unix follow GNU, which is Not Unix.


because it's better


No, it's weird af to use two dashes.


It's by Ken Thompson, who invented Unix, so it's ok.


I think Powershell does it too.


In my experience, i had never actually stumbled upon a formal list of these "GNU rules", the closest thing i can find is this: https://www.gnu.org/software/libc/manual/html_node/Argument-...

However, even those seem to raise some questions, for example:

> To make command line even more confusing, multiple tools I have used allow spaces in optional arguments (e.g. "-opt1 arg1 arg2 -opt2", where arg1 and arg2 set two values for -opt1).

Is described as something that's permissible:

> An option and its argument may or may not appear as separate tokens. (In other words, the whitespace separating them is optional.) Thus, ‘-o foo’ and ‘-ofoo’ are equivalent.

Therefore the below would be considered equivalent:

  -opt1 arg1 arg2 -opt2
  -opt1arg1arg2 -opt2

Were your expectations different?
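For what it's worth, the quoted rule talks about one option and its single argument. A small sketch with Python's `getopt` module (a port of the same conventions, not libc itself) shows what that covers; the option string `o:v` is an assumption made for the example.

```python
import getopt

# Assumed option string: -o takes exactly one argument, -v takes none.
SHORT = "o:v"

attached, _ = getopt.getopt(["-ofoo"], SHORT)
separate, _ = getopt.getopt(["-o", "foo"], SHORT)

# A second word after the argument is not absorbed by -o. Plain getopt
# treats it as the first positional word and stops parsing options there.
opts, rest = getopt.getopt(["-o", "arg1", "arg2", "-v"], SHORT)

print(attached, separate)
print(opts, rest)
```

So `-o foo` and `-ofoo` are equivalent here, but an option collecting two space-separated values is not something this style of parser does on its own.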

Are there any good articles on the benefits of following such rules (any tangible improvements to legibility or usability, as opposed to just "consistency amongst different tools")?

Are there any tools which can validate whether any piece of software conforms to this standard (either by scanning the man pages, or the code, or a formalized format of parameters the app supports)? Personally, the closest i've found is Typer ( https://typer.tiangolo.com/ ) but without anything that can automatically reject non-conformant code as a part of a CI process, i think enforcing such formats would be a non-starter for me.


I think disallowing short option altogether is not a bad convention. With only `-long` and `-long=xxx`, my command line parsing is simply

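    my (%opts, @pos_list);
    foreach my $arg (@ARGV) {
        if ($arg =~ /^-(\w+)(?:=(.+))?/) {
            $opts{$1} = $2;       # -long or -long=xxx
        } else {
            push @pos_list, $arg; # everything else is positional
        }
    }

There is no need for the dependency and complexity of a `getopt` library.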

And for the user side, no more cryptic ninja arts. The only tricks the user needs to learn are shell aliases and functions.


The point isn’t to use getopt with all its complexity, the point is that two dashes for long options is already extremely well established and Go popularizing long options with a single dash is very much a regression. It creates a lot of unnecessary confusion when they really should have known better and just stuck with the conventions.

In your regex at least, removing the confusion is as simple as adding another ‘-‘, and now you’re in compliance with the expectations of almost every IT person in the world who uses Unix command lines.
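As a sketch of that one-character change, here is the same kind of loop in Python with the pattern requiring two dashes. The function name and sample arguments are invented for illustration.

```python
import re

# Long options only: --name or --name=value.
LONG_OPTION = re.compile(r"^--(\w+)(?:=(.+))?$")

def parse(argv):
    """Split argv into a dict of long options and a list of positionals."""
    opts, positional = {}, []
    for arg in argv:
        match = LONG_OPTION.match(arg)
        if match:
            opts[match.group(1)] = match.group(2)  # None when no =value given
        else:
            positional.append(arg)
    return opts, positional

opts, positional = parse(["--verbose", "--out=a.txt", "-x", "input.txt"])
print(opts)
print(positional)
```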


Go's flag parser treats two leading minus signs the same as one.

https://golang.org/pkg/flag/#hdr-Command_line_flag_syntax


Sure, but convention is to treat `-long` as `-l -o -n -g`.


But that is a bad convention, it prioritizes typing speed over readability, and it only works in certain cases (for flag-only parameters). I'd say good riddance to it.


That doesn't justify long options starting with a single dash, as one could have made every option start with two dashes. Sure, `--` is longer than `-`, but typing speed shouldn't matter right?


Sure, but the only reason to add an extra dash is to differentiate --long from -l -o -n -g. No reason to just add extra characters if you don't need this differentiation. Not to mention, Go cmd line parsing actually accepts both -long and --long, if you find the -- version more aesthetically pleasing.


If we only did "the accepted convention" indefinitely there would be no going forward. I see this change (of being explicit) as a win. The situation was already confusing before with different tools using different conventions. This way of being explicit allows you to be consistent across OS-s too. The world is not only GNU, fortunately.


Buffer overflows are well established too, not all traditions are good.


GCC does this pretty heavily, no?

https://gcc.gnu.org/onlinedocs/gcc/Option-Summary.html

If it mattered much, I'd expect GNU to be internally consistent.


GCC or anyone else's C compiler doesn't really count as far as this convention goes. GCC's flag parsing is aiming to be compatible with conventions that preceded GNU, other vendors' compilers break their own conventions to be compatible with GCC or whatever else cc(1) is etc.

GNU's conventions are generally complementary to, and not incompatible with, POSIX. And POSIX specifies the behavior of the sort of flags cc(1) should understand[1].

There are many POSIX and other traditional *nix tools that are a convention unto themselves for historical reasons. E.g. notice how GNU "dd" doesn't follow normal GNU command-line conventions either.

1. https://pubs.opengroup.org/onlinepubs/7908799/xcu/cc.html


-long = -l -o -n -g was probably a mistake. But then again, tar xvzf should have been called untar, so it’s not like there is a shortage of opinions and historical mistakes.


How would one call tar `xvjf` then?


You can just call ‘tar xvf’ and it will detect the compression format.


My question was about how "tar xvzf" should be called "untar".

Your reply might still make sense (i.e. untar could automagically figure it out), but I was highlighting how tar/untar today also means (de)compressing that tar archive using many different compression formats.


It should probably be `untar --format gzip`


They are just respecting older, more venerable tools, like find.


Find's syntax is really annoying and clashy too. It makes sense. -- for extended syntax and - for shorthand. Why can't it follow this. Also fuck dd


I'd like a word with the person who thought that regular ( ) parentheses for grouping were a good idea in the find syntax, requiring them to be backslash-escaped in shell scripts.

The obviously right choice would have been [ ]. You know, like in

   if [ $foo ... ]


That won't work because '[' is an alias for test.


"[" as an argument to another command is fine.

"[" is just another name for the "test" command. It isn't special syntax.


It's pretty silly to ask a command written in the seventies(?) to follow a convention (for another operating system) presented in the nineties.


dd I really don't mind, because it makes me think and double-check that I'm flashing the right device every time.

find is annoying, though. I'd encourage you to check out fd.

https://github.com/sharkdp/fd


Ah, thank you very much!


Same with ffmpeg


The Amiga had a pretty cool feature where CLI argument parsing and help was provided via a library. This made things nicely consistent across almost all of the CLI tools.


Suddenly I'm reminded of how Windows represents the command line as a single string (PWSTR), and how entry points that expect argv-style are parsed by the CRT at startup.

vs. Unix where char *argv[] is what makes it to the syscall layer.

The result there is that command line processing is more consistent program-to-program on Unix. On Windows, every program could decide to tokenize the arguments differently.
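A small illustration of why a single-string command line invites divergence: the two modes of Python's `shlex.split` already disagree on the same input. Neither mode is the Windows CRT algorithm; this only shows that tokenizing is a policy choice each program makes. The sample command line is made up.

```python
import shlex

command_line = 'copy "my file.txt" backup'

# POSIX rules: quotes group words and are then removed.
posix_tokens = shlex.split(command_line, posix=True)

# Non-POSIX rules: quotes still group words but stay in the token.
other_tokens = shlex.split(command_line, posix=False)

print(posix_tokens)
print(other_tokens)
```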


> On Windows, every program could decide to tokenize the arguments differently.

Worse, even Microsoft's two implementations (CRT and WINAPI) disagree: https://github.com/rust-lang/rust/issues/44650


I feel like there are a few interesting Microsoft phenomena that contrast with Unix thinking in both of these examples.

CommandLineToArgvW - You called that "WINAPI", but it's worth mentioning the more specific provenance of shell32.dll. This is not a core, foundational part of Windows that is used in core, foundational things. It's a helper function from the shell (explorer, not shell in the Unix sense). So, while it has a look and function that seems pretty foundational, it really isn't. It's there because somebody working on Explorer long ago found that useful to have and decided to export their helper function in the DLL.

CRT - A CRT binary ships with Windows, but really, that code is maintained and distributed by the compiler guys and DevDiv. So theoretically, the argv parser could change at those people's whim alongside a new Visual Studio release. And it seems from squinting at that github issue like that might have happened here.

So really ... there are more artifacts here attesting to the fact that the command line arg parser is not part of the operating system. People find that functionality useful, so they look for things that "look like" the operating system official method, and maybe they find stuff that does "look like it" -- but such a thing isn't really there.


I was not arguing that it was or was not part of the OS but just showing that the parsing being deferred to application code has produced two subtly incompatible implementations that differ for no reason other than that they do.


Yeah, I am not considering anything you say to be argumentative, I am just going in tangents with this topic because I have some experience there and find it interesting.


That's a good thing. You have to be careful using a command line SQL query when typing "SELECT *". If the processing is left to the program, an SQL app in Windows knows you didn't mean "*" to mean all the files in the current folder.



How is that different from the getopts library? I don’t understand.


I believe it was standardized and built in, add peer pressure and it was just short of being enforced.


There were also the Amiga style guides that were published with 2.0 that detailed how developers should build application user interfaces. The fragmentation in Linux/Unix distributions means that this kind of consistency is pretty much impossible, although FreeBSD does a much better job of being consistent than $majorlinuxdistros.


Really?

I had the impression only CLI tools from the Java world are that strange.


Yes, I messed up -Xmx1024m a million times in my career. We used GNU for in-house stuff so sometimes I'd have --Xmx and -Xmx on the same line


yeah; long w --, short w - is intuitive and annoyingly close to universal...



