
Yes, but I think the post is somewhat disingenuous in that if you want to read text files, you can just open the files in text mode (i.e., the exact same script with a single character changed: the mode from "rb" to "r") -- and it will work for most cases where you can expect it to work.
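To illustrate the "rb" versus "r" difference, here is a minimal sketch. The helper name `read_both_ways`, the sample file and the explicit `encoding='utf-8'` are assumptions made for the example; the script under discussion relies on the default encoding.

```python
import os
import tempfile


def read_both_ways(path):
    """Read the same file in binary mode ('rb') and in text mode ('r')."""
    with open(path, 'rb') as f:
        raw = f.read()    # bytes, no decoding at all
    # Assumption: the file is UTF-8; plain open(path, 'r') would use the
    # locale's default encoding instead.
    with open(path, 'r', encoding='utf-8') as f:
        text = f.read()   # str, decoded from the bytes on disk
    return raw, text


# Hypothetical sample file, created only for this demonstration.
with tempfile.NamedTemporaryFile('wb', suffix='.txt', delete=False) as tmp:
    tmp.write('blåbær\n'.encode('utf-8'))
    sample_path = tmp.name

raw, text = read_both_ways(sample_path)
os.remove(sample_path)
```

Binary mode hands back `bytes` untouched, while text mode hands back `str`, so the decoding step is the only thing the mode character changes here.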

I've not dived all the way down the (python3) rabbit hole, but it seems to me that there probably is (should be, at least) a way to say, yes, I want to read and write in binary, I don't care about conversion, even if one or both streams happen to be standard input/output/error, not just some other binary file.

I agree that if the second script indeed is the simplest way, then that is probably too hard. But as clearly demonstrated, it's not that hard to read and write "text files".

[edit: to wit: "By default, these [stdin,out,err] streams are regular text streams as returned by the open() function. (...)

To write or read binary data from/to the standard streams, use the underlying binary buffer. For example, to write bytes to stdout, use sys.stdout.buffer.write(b'abc').

Using io.TextIOBase.detach(), streams can be made binary by default. This function sets stdin and stdout to binary:"

    def make_streams_binary():
        sys.stdin = sys.stdin.detach()
        sys.stdout = sys.stdout.detach()

https://docs.python.org/3/library/sys.html#sys.stdout]


So, to make the script work similarly for python3:

    $ diff test.py test3.py 
    5c5
    <     f = sys.stdin
    ---
    >     f = sys.stdin.buffer
    13c13
    <         shutil.copyfileobj(f, sys.stdout)
    ---
    >         shutil.copyfileobj(f, sys.stdout.buffer)

We can now read in binary, and write to stdout in binary, and this script will happily garble your terminal if you point it at /bin/ls. If the lines above were unclear, here is the entire "python3" version. Note that we now explicitly treat standard input and output as binary (which is "wrong"):

    import sys
    import shutil

    for filename in sys.argv[1:]:
        f = sys.stdin.buffer
        if filename != '-':
            try:
                f = open(filename, 'rb')
            except IOError as err:
                print('cat.py: %s: %s' % (filename, err), file=sys.stderr)
                continue
        with f:
            shutil.copyfileobj(f, sys.stdout.buffer)

And we can now, for example:

    $ head -c 200 /bin/ls | python3 test3.py -
    ELF>�H@@p�@8  @@@@@@�88@8@@ $

The crucial difference is that if we wanted to deal only with text, we'd get a sane error (see above) from python3. I'd probably add a comment about sys.stdin.buffer, as it isn't exactly obvious that what we do is go from stdin, which is a text stream, to the underlying buffer, which is binary -- but I can't really agree that this is super-hard -- it took me a few minutes to google "python3 binary io" and find this…
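One error of this kind can be reproduced without touching the real standard streams. This is only a sketch: `try_write` is a hypothetical helper, and an in-memory `io.TextIOWrapper` over `io.BytesIO` stands in for sys.stdout and its underlying buffer.

```python
import io


def try_write(stream, payload):
    """Attempt a write and report whether the stream accepted the payload."""
    try:
        stream.write(payload)
        return 'ok'
    except TypeError as err:
        return 'TypeError: %s' % err


# Stand-ins: text_sink plays the role of sys.stdout (a text stream),
# binary_sink the role of sys.stdout.buffer (the binary layer beneath it).
binary_sink = io.BytesIO()
text_sink = io.TextIOWrapper(binary_sink, encoding='utf-8')

# Bytes handed to the text layer are rejected ...
result_text = try_write(text_sink, b'\x7fELF')
# ... while the same bytes go straight through the underlying buffer.
result_binary = try_write(text_sink.buffer, b'\x7fELF')
```

The text layer refuses bytes outright instead of silently mangling them, which is the behaviour I'd call sane; going through `.buffer` is the explicit opt-in to binary.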

And all these scripts (appear to) deal fine with utf-8 file names both in python2 and 3.

[edit: Having redone this by hand, I see that this is sort of what the "new" script in the post does, with some caveats on how stdin/out is "sometimes" not binary... I still have a hard time accepting that this short python3 script is much more fragile than the original short python2 script. The fact that you get an error if you try to copy a binary stream to a text stream seems sane to me...]



