Hacker News

> Python 3 requires you to do something more complicated when crap comes in.

Or, in most cases: Python 3 falls flat on its face with all kinds of errors because you did not handle unicode in one of the many ways you need to handle it.

On Python 2 you decoded and encoded. On Python 3 you have so many different mental models you constantly need to juggle. (Is it unicode? Is it latin1-transfer-encoded unicode? Does it contain surrogates?) And then for each of them you need to start thinking about where you are writing it to. Is it a bytes-based stream? Then surrogate errors can be escaped and might result in encoding garbage, same as in Python 2. Is it a text stream? Then that no longer works, and you can either crash or write different garbage. If it's latin1 transfer encoded, most people don't even know that they have garbage. I filed lots of bugs against that in WSGI libs.
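The three cases above can be reproduced in a few lines of Python 3. This is a minimal sketch, not code from any of the libraries discussed: the byte strings are made up, `io.BytesIO` stands in for a real file or socket, and the WSGI case assumes the PEP 3333 convention that environ values are "native strings" holding the raw bytes decoded as latin-1.

```python
import io

raw = b"caf\xc3\xa9 \xff"  # valid UTF-8 for "caf\u00e9", then one invalid byte

# Surrogates: surrogateescape turns each undecodable byte into a lone surrogate.
text = raw.decode("utf-8", errors="surrogateescape")
assert text == "caf\u00e9 \udcff"

# Bytes-based output: the same error handler restores the original bytes,
# so whatever garbage came in is passed through unchanged.
restored = text.encode("utf-8", errors="surrogateescape")
assert restored == raw

# Text-stream output: a strict UTF-8 text stream refuses the lone surrogate.
text_stream = io.TextIOWrapper(io.BytesIO(), encoding="utf-8", errors="strict")
try:
    text_stream.write(text)
    text_stream.flush()
    crashed = False
except UnicodeEncodeError:
    crashed = True

# Latin-1 transfer encoding (WSGI style): UTF-8 bytes decoded as latin-1.
path_info = "/caf\u00e9".encode("utf-8").decode("latin-1")
looks_wrong = path_info == "/caf\u00c3\u00a9"  # mojibake, yet a perfectly valid str

# Reversing the transfer encoding recovers the real text.
real_path = path_info.encode("latin-1").decode("utf-8")
```

The bytes path silently passes the invalid byte through, the text path crashes, and the latin-1 string looks like ordinary text until someone reverses the transfer encoding.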

If you write error-free Python 3 unicode code, then teach me. (Or show me your repo and I show you all the bugs you now have)



> (Or show me your repo and I show you all the bugs you now have)

This would be great. Show me! I'd love to know:

https://bitbucket.org/zzzeek/sqlalchemy/

https://bitbucket.org/zzzeek/mako/

https://bitbucket.org/zzzeek/alembic/

I'm guessing you'd go for Mako first, since it has the most unicode-intensive stuff going on (and it uses lots of your code).


Take the Mako CLI as an example. You can call this a bug or not, but under the C locale your command line will die with UnicodeErrors on Python 3 when you open a nonexistent file with a unicode filename, whereas Python 2 does the correct thing. It will also die with unicode errors in the same situation when your template renders any non-ASCII characters. Again, that probably works fine, and correctly, on Python 2.
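The crash mechanism can be sketched without Mako itself. Under the C locale, Python 3 releases before 3.7 (which added PEP 538 locale coercion and PEP 540 UTF-8 mode) typically configured `sys.stdout` for ASCII, so printing any non-ASCII character, whether template output or a filename inside an error message, raises `UnicodeEncodeError`. The sketch uses an in-memory `io.TextIOWrapper` with `encoding="ascii"` as a stand-in for that stdout; `write_utf8` is a hypothetical workaround, not something the Mako CLI is known to do.

```python
import io


def fake_c_locale_stdout() -> io.TextIOWrapper:
    # Hypothetical stand-in for sys.stdout under the C locale on older Python 3.
    return io.TextIOWrapper(io.BytesIO(), encoding="ascii", errors="strict")


def write_text(out: io.TextIOWrapper, text: str) -> None:
    # Roughly what print() does: encode with the stream's own encoding.
    out.write(text)
    out.flush()


def write_utf8(out: io.TextIOWrapper, text: str) -> None:
    # Hypothetical workaround: bypass the text layer and write UTF-8 bytes
    # directly to the underlying binary buffer.
    out.flush()  # make sure earlier buffered text is written first
    out.buffer.write(text.encode("utf-8"))
    out.buffer.flush()


rendered = "Hello, Jos\u00e9\n"  # e.g. a template that renders a non-ASCII name

try:
    write_text(fake_c_locale_stdout(), rendered)
    crashed = False
except UnicodeEncodeError:
    crashed = True

out = fake_c_locale_stdout()
write_utf8(out, rendered)
written = out.buffer.getvalue()
```

Whether the terminal can actually display the UTF-8 bytes is a separate question; on Python 3.7+, `sys.stdout.reconfigure(encoding=...)` or the `PYTHONIOENCODING` environment variable are other ways to choose the output encoding.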

Or, if you put unicode characters into your README.rst, you could no longer safely install Mako. Again, Python 3 only.
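A likely culprit, assuming `setup.py` reads the README with a bare `open()` to build `long_description` (the actual `setup.py` is not shown in this thread), is that `open()` without `encoding=` uses the locale's preferred encoding. The sketch writes a hypothetical UTF-8 README to a temporary directory, shows that reading it as ASCII fails the way a C-locale install would, and then reads it with an explicit encoding.

```python
import locale
import os
import tempfile

# Hypothetical README.rst containing one non-ASCII character, saved as UTF-8.
readme_path = os.path.join(tempfile.mkdtemp(), "README.rst")
with open(readme_path, "w", encoding="utf-8") as f:
    f.write("Mako\n====\n\nSome text with a non-ASCII character: \u00e9\n")

# The encoding a bare open(readme_path) would use on this machine.
default_encoding = locale.getpreferredencoding(False)

# Simulate the ASCII default of the C locale: decoding fails.
try:
    with open(readme_path, encoding="ascii") as f:
        f.read()
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# Robust version: always state the encoding explicitly.
with open(readme_path, encoding="utf-8") as f:
    long_description = f.read()
```

For code that must also run on Python 2, `io.open(path, encoding="utf-8")` accepts the same argument.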

These are just two things I found on GitHub.

Another easy one: Alembic READMEs can no longer safely contain unicode. They would break on Python 3 but work just fine on Python 2 because of the code in list_templates.


The command-line template runner currently doesn't handle unicode on Py2K either; it crashes there too.


I would not be surprised if you could construct contrived examples of how Python 3 can be broken. In my experience, writing real life code, I ship more stable software writing in Python 3 than Python 2.

I mostly work with subprocesses or read data directly from socket connections, and I decode all of my bytes in strict mode. If something doesn't decode properly, an error is returned. Currently I am working on an interactive way (inside Sublime Text) to show the user the text in different encodings so they can help debug the issue on their own.
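The strict-decoding approach might look something like the sketch below. The helper names (`decode_strict`, `preview_encodings`, `run_and_decode`) and the list of candidate encodings are hypothetical illustrations, not code from the author's projects; the preview function only approximates the idea of showing the user the same bytes in several encodings.

```python
import subprocess
import sys


def decode_strict(data: bytes, encoding: str = "utf-8") -> str:
    """Decode in strict mode: any invalid sequence raises UnicodeDecodeError."""
    return data.decode(encoding, errors="strict")


def preview_encodings(data: bytes, candidates=("utf-8", "cp1252", "latin-1")) -> dict:
    """Map each candidate encoding to its strict decoding, or None if it fails.

    Meant for showing a user several interpretations so they can pick one.
    """
    results = {}
    for enc in candidates:
        try:
            results[enc] = data.decode(enc, errors="strict")
        except UnicodeDecodeError:
            results[enc] = None
    return results


def run_and_decode(args: list, encoding: str = "utf-8") -> str:
    """Run a subprocess, capture stdout as bytes, and decode it strictly."""
    proc = subprocess.run(args, stdout=subprocess.PIPE, check=True)
    return decode_strict(proc.stdout, encoding)


if __name__ == "__main__":
    # Child writes raw UTF-8 bytes, independent of its own locale settings.
    child = [sys.executable, "-c", "import sys; sys.stdout.buffer.write(b'caf\\xc3\\xa9')"]
    print(ascii(run_and_decode(child)))
    # b"caf\xe9" is cp1252/latin-1 for "caf\u00e9" but invalid UTF-8.
    print(ascii(preview_encodings(b"caf\xe9")))
```

Note that latin-1 never fails to decode, so in a preview like this it always produces *something*; strict mode only protects you for encodings that can actually reject input.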

So, yes, you need to write helper functions and have an interface for handling encodings properly. This has been my experience in every language I've ever worked in, and I can't imagine there is a way around it. Is this a reason Python 3 sucks compared to 2? Not in my experience. I had far more issues in Python 2 with encodings, and with not being sure what other libraries and packages had done with regard to handling unicode data. Hmm, so ftplib accepts unicode for filenames. Does it encode it? What encoding does it use? Oh, look at that, it has just been coercing to ASCII because it can.
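Python 3's `ftplib` at least makes the answer inspectable: each `FTP` object has an `encoding` attribute used to encode commands (including filenames) and decode replies. It was latin-1 before Python 3.9; 3.9 switched the default to UTF-8 (following RFC 2640) and added an `encoding` keyword to the constructor. This sketch only inspects objects and never connects; it says nothing about how Python 2's `ftplib` behaved, and the "legacy server" scenario is hypothetical.

```python
import ftplib
import sys

# No host argument, so the constructor does not open a connection.
ftp = ftplib.FTP()
default_encoding = ftp.encoding  # "utf-8" on 3.9+, "latin-1" before

if sys.version_info >= (3, 9):
    # Choose latin-1 explicitly, e.g. for a hypothetical legacy server.
    legacy = ftplib.FTP(encoding="latin-1")
else:
    # Older versions: override the class attribute on the instance.
    legacy = ftplib.FTP()
    legacy.encoding = "latin-1"
legacy_encoding = legacy.encoding

print(sys.version_info[:2], default_encoding, legacy_encoding)
```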

So yeah, writing a simple little toy command-line app needs more boilerplate to deal with unicode. Any real app is going to need that and a ton more. You are going to have to decide how to handle encoding errors and how to let users identify encodings. And you are going to need to write a global exception handler to capture unexpected exceptions and log them to a file so users can send crash reports. Yay, sys.excepthook!
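A minimal version of that global handler, assuming logging to a UTF-8 file is acceptable: it replaces `sys.excepthook`, writes the full traceback to the log, and then defers to the default hook so the user still sees the error. The function name `install_crash_logger`, the logger name, and the log path are hypothetical.

```python
import logging
import sys

logger = logging.getLogger("crash")  # hypothetical logger name


def install_crash_logger(path: str) -> None:
    """Log uncaught exceptions (with traceback) to `path`, then show them as usual."""
    handler = logging.FileHandler(path, encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(logging.ERROR)

    def handle_exception(exc_type, exc_value, exc_tb):
        # Let Ctrl+C behave normally instead of recording it as a crash.
        if issubclass(exc_type, KeyboardInterrupt):
            sys.__excepthook__(exc_type, exc_value, exc_tb)
            return
        logger.error("Unhandled exception", exc_info=(exc_type, exc_value, exc_tb))
        sys.__excepthook__(exc_type, exc_value, exc_tb)  # still print to stderr

    sys.excepthook = handle_exception
```

Opening the log with an explicit `encoding="utf-8"` keeps the crash logger itself from failing on a non-ASCII exception message.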

But anyway, I think it all comes back to the fact that I know what I am dealing with far more quickly with Python 3 than with 2. Again, maybe that's because I don't write apps that deal with local file paths (except abstracted through a subprocess).

Unfortunately, most of the code where I deal with crappy encodings from FTP servers and SVN is closed source. The open source stuff is at https://github.com/wbond.


> In my experience, writing real life code, I ship more stable software writing in Python 3 than Python 2.

Real life code is not the same for everybody.



