Moreutils (joeyh.name)
129 points by signa11 on Feb 7, 2015 | 36 comments


This is a great set, but a request to Joey:

There are two conflicting implementations of a parallel utility, and from what I can tell, the GNU parallel utility is much more useful than the one in moreutils. That meant that when I was 1) doing processing which benefited greatly from parallelization and 2) found that the moreutils version wasn't doing what I wanted, nor could I figure out how to make it do so (compounded by confusion over online searches turning up GNU parallel's syntax, which didn't work), I had to remove the entire moreutils set to install GNU parallel under Debian.

The two versions aren't even candidates for /etc/alternatives resolution, as the command-line syntax and behavior differ.

Either a name change or splitting the 'parallel' utility out into a separate package would avoid much of this.

And I'd really like to see numutils packaged.

Also: 'unsort': sort -R | --random-sort

(using GNU coreutils 8.23)

(I'm not familiar with a seed-based randomized sorting utility though.)



I'm familiar with that.

Joey's apparent resistance to simply splitting 'parallel' out into its own package is ... disappointing. His final comment (regarding other utilities in upstream and switches) is a string of non sequiturs and red herrings.


Welcome to Debian politics.

Where boneheads like Joey get to block trivial fixes for decades (5 years and counting in this case). The project really needs a better process to terminate 'lame' maintainers.


Just for completeness: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=518696 Had parallel been packaged at that time (Mar 2009), it would have pre-empted the parallel in moreutils (which was added June 2009).


My point is that Joey is not in general a bonehead. That's part of what makes this all the more disappointing.

He's no longer the package maintainer, which should make the split more viable.


The last update on that old bug was in 2012.

This is the relevant current bug on the matter against moreutils: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=718816


> And I'd really like to see numutils packaged.

https://packages.debian.org/sid/num-utils


I find it entertaining that bash (zsh, ksh, fish, etc) itself provides ways to do what many of these utilities do. The anonymous pipes, named pipes, and process substitution mechanisms can replace many of these tools.

For example:

    pee ->
    some_process | tee >(command_one) >(command_two) [...] > /dev/null
    # This one might need a bit more magic with named pipes to consolidate
    # the output without race conditions, since each command_N runs in
    # parallel. Or take a note from the chronic replacement below and use
    # a temporary file to run them serially.

    chronic ->
    TMPFILE=$(mktemp); some_process > "$TMPFILE" 2>&1 || cat "$TMPFILE"; rm -f "$TMPFILE"

    zrun ->
    command <(gunzip -c somefile)
Still, having a utility to abstract away the pipes makes sense.
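To make the pee replacement above deterministic without named pipes, one option is to buffer stdin once and feed each command serially. A minimal sketch (the `fanout` name and the commands passed to it are illustrative, not part of moreutils):

```shell
#!/usr/bin/env bash
# Serialized fan-out: soak up stdin into a temp file once, then run each
# command in turn against that file so their outputs never interleave.
fanout() {
    local tmp
    tmp=$(mktemp)
    cat > "$tmp"                  # buffer all of stdin first
    for cmd in "$@"; do
        bash -c "$cmd" < "$tmp"   # each command sees the full input
    done
    rm -f "$tmp"
}

printf '1\n2\n3\n' | fanout 'head -n 1' 'tail -n 1'
```

Unlike the real pee, this runs the commands one after another rather than concurrently, trading parallelism for ordered output.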


These are cool, and I use chronic all the time. But is there any more documentation beyond this page? I can't find any, and I'd love to read more about pee and see some examples. It seems there is more documentation for the rejected utilities than the accepted ones!


When you clone the Git repository and build the package all the utilities come with manual pages built from DocBook, e.g. for chronic:

    $ man ./chronic.1 | col -b | grep -v ^$ | head -n 12
    CHRONIC(1)                                                                                                                              CHRONIC(1)
    NAME
           chronic - runs a command quietly unless it fails
    SYNOPSIS
           chronic COMMAND...
    DESCRIPTION
           chronic runs a command, and arranges for its standard out and standard error to only be displayed if the command fails (exits nonzero or
           crashes).  If the command succeeds, any extraneous output will be hidden.
           A common use for chronic is for running a cron job. Rather than trying to keep the command quiet, and having to deal with mails containing
           accidental output when it succeeds, and not verbose enough output when it fails, you can just run it verbosely always, and use chronic to
           hide the successful output.
                   0 1 * * * chronic backup # instead of backup >/dev/null 2>&1


Ah, thanks!

More people might use this if the author put those docs online and linked to them from the main page. I don't know DocBook, but it looks like styling it as HTML should be easy.

Now I understand pee. I wrote something less generalized here:

https://github.com/pjungwir/stutter

I guess `stutter foo` is equivalent to `pee cat foo`.


Wait, is sponge just another way to redirect to a file? What's the benefit of:

  $ echo hi | sponge y
over:

  $ echo hi > y

?


From the man page:

> Unlike a shell redirect, sponge soaks up all its input before opening the output file. This allows constructing pipelines that read from and write to the same file.


Ah! That is indeed useful.


    $ echo 'foo' > foo
    $ echo 'bar' > bar
    $ cat foo bar > foo
    cat: foo: input file is output file
    $ cat foo 
    bar
vs

    $ echo 'foo' > foo
    $ echo 'bar' > bar
    $ cat foo bar | sponge foo
    $ cat foo 
    foo
    bar
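The core of sponge can be sketched in a few lines of shell: buffer everything before touching the output file. (This is only an approximation of the idea; the real sponge also supports appending with -a and uses an atomic rename. The `soak` name is made up here.)

```shell
#!/usr/bin/env bash
# Minimal sponge-like helper: read ALL of stdin into a temp file, and
# only then overwrite the target, so the target can safely appear on
# both ends of the pipeline.
soak() {
    local tmp
    tmp=$(mktemp)
    cat > "$tmp"        # soak up the whole input first
    cat "$tmp" > "$1"   # now it is safe to clobber the target
    rm -f "$tmp"
}

echo 'foo' > demo
echo 'bar' >> demo
grep 'bar' demo | soak demo
cat demo    # demo now contains only: bar
```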


If you're good with vim (and particularly with vim macros), `vidir` is indispensable.


GNU Emacs has something like that in Dired. It's called Wdired (writable dired) and allows editing the Dired buffer and then applies the changes. Think of it as editing `ls` output.

https://www.gnu.org/software/emacs/manual/html_node/emacs/Wd...


`vidir` uses the EDITOR variable, so if you set it to emacs you can use it there too:

    EDITOR=emacs vidir .


AWESOME. I didn't know that, but it makes sense and is very unixy.


One weird thing about moreutils is that the source releases are only available via the source package on debian.org.


There's a link to the git repo in OP.


source release != git repo


Well, yes, because git repo > source release so it's clearly not equal.

I just checked; the repo tags releases in some reasonably proper manner. (Personally I prefer some prefix like "release_0.2" to a tag simply named "0.2", but it does the job.)


>because git repo > source release so it's clearly not equal.

No, they have different uses. I simply want to install the software, so I want a source release tarball. Source releases include more than what a git repo provides, such as pre-built configure scripts and Makefiles. A tag in a git repo is no substitute for a proper release.


That's true for something that uses e.g. autoconf, but moreutils doesn't generate any Makefile or configure script; the Makefile is right there in the Git repository. So I see what your objection is for packages in general, but it doesn't apply in this case.


We package moreutils in GNU Guix, and it's far preferable to download a source tarball than to have to clone a git repo, so we download the tarball from Debian. We clone the git repo when there's no other choice, but it's far from ideal.


Shameless plug but I think it might be useful to others: https://github.com/robmccoll/bt

bt (between)

Counts the time between occurrences of a given string on stdin. stdin is consumed; output is the elapsed times in floating-point seconds, one per line.


Relatedly you might enjoy this collection of sysadmin-tools:

https://github.com/skx/sysadmin-util


Considered putting these in Debian?


I created just such a utility several years ago. It's called rlimit and is basically a command-line interface to the standard getrlimit() and setrlimit() Unix calls. You can find it here: http://freecode.com/projects/rlimit. I'd be happy to move the source to GitHub.


ulimit(1) is an interface precisely to the *rlimit calls; it's built into the Bash shell, and presumably others.


ulimit can read and set limits for the current shell; rlimit sets limits for a child process. Admittedly, you could do nearly the same thing by setting limits with ulimit, running the target command, and then resetting the limits to their former state, or by running a sub-shell, setting the limits there, and then running your command in that environment. For example:

    (ulimit -d 1024; <command>)
Or you could do it in one normal looking command with rlimit.

    rlimit -d 1m <command>
Plus rlimit can set things like real-time priority which ulimit cannot.


If you’re on OS X with Homebrew you can install it with `brew install moreutils`.


As I see no one has mentioned it, let me pipe in with one more text-processing tool that is invaluable in our modern world: jq, the command-line JSON processor (https://stedolan.github.io/jq/). It consumes JSON input, and its power sits somewhere between sed and awk.
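For anyone who hasn't tried it, here's a small taste of jq's filter language (the JSON payloads below are just made-up examples):

```shell
# Pull a nested field out of a JSON object; -r prints the raw string
# instead of a quoted JSON value.
echo '{"user":{"name":"joeyh"},"tags":["unix","tools"]}' \
  | jq -r '.user.name'

# Collect a field from every array element and sum the results.
echo '[{"id":1},{"id":2},{"id":3}]' \
  | jq '[.[].id] | add'
```

The first prints `joeyh`, the second `6`; filters compose with `|` inside jq just as commands do in the shell.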


jq is fucking amazing. I think it counts as my Most Awesome Shell Tool Discovery of 2014.



