
5 makes perfect sense to me; the author's complaints seem kinda silly.

One place where this makes sense: what do you expect to get if you do something like:

    emoji = " "
    print(emoji[:3])

Should this throw an error because there's only one displayed "character"? Should it return only a partial codepoint by returning only the byte data for the first 3 bytes?
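
For reference, here is a minimal sketch of what Python 3 does with that slice. The specific emoji is an assumption on my part: I'm using the five-codepoint man-facepalming sequence, written with escapes so it doesn't depend on how the page renders it.

```python
# Assumption: a 5-codepoint emoji sequence
# (face palm + skin tone modifier + zero width joiner + male sign + variation selector).
emoji = "\U0001F926\U0001F3FC\u200D\u2642\uFE0F"

# Slicing a str counts codepoints, not bytes and not displayed characters.
prefix = emoji[:3]

print(len(emoji))                            # 5
print(len(prefix))                           # 3
print([f"U+{ord(c):04X}" for c in prefix])   # ['U+1F926', 'U+1F3FC', 'U+200D']
```

So it neither raises nor returns a partial codepoint: you get the first three whole codepoints, which happen to end on a dangling joiner.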

Modern strings are complex objects that have evolved a bit past char[] or byte[].



> Should this throw an error because there's only one displayed "character"?

Why should it not? You’re literally breaking the content.

Though in reality, indexing strings is a broken operation. That you’re using it at all is the core issue.

> Modern strings are complex objects that have evolved a bit past char[] or byte[].

And yet that’s exactly what you’re advocating, just with 21-bit chars.


> Why should it not? You’re literally breaking the content.

Strings are just an array of unicode codepoints rather than "characters", so all I'm doing is asking for the first three of those codepoints.

> Though in reality, indexing strings is a broken operation. That you’re using it at all is the core issue.

Substring is a broken operation? What's the justification for that idea?


> Strings are just an array of unicode codepoints rather than "characters", so all I'm doing is asking for the first three of those codepoints.

"Ice trays are just a pile of molecules rather than "cubes", so all I'm doing is separating those molecules", he states as he activates the igniter.

> Substring is a broken operation? What's the justification for that idea?

You take a thing and you mangle it beyond recognition without regard for its purpose or meaning. That's like considering the jaws of life a normal part of opening a door to take a piss at work.


One of the first things the author of the article does is break it down into the 5 code points and explain their individual meanings.
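
A rough sketch of that kind of breakdown using the standard library's `unicodedata` module. Again, the exact emoji is my assumption (the five-codepoint man-facepalming sequence), and the names printed depend on the Unicode version your Python ships with.

```python
import unicodedata

# Assumption: the same kind of 5-codepoint emoji sequence discussed in the thread.
emoji = "\U0001F926\U0001F3FC\u200D\u2642\uFE0F"

# Look up the official Unicode name of each codepoint; fall back to a
# placeholder if this Python's Unicode database doesn't know it.
names = [unicodedata.name(c, "<unnamed>") for c in emoji]

for c, name in zip(emoji, names):
    print(f"U+{ord(c):04X}  {name}")
```

Each codepoint has its own meaning (base emoji, skin tone, joiner, gender sign, presentation selector), which is why slicing between them still yields something well-defined, just not the glyph you started with.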


Your point being, what exactly?

If the user gives you what, as far as they're concerned, is a glyph, and you return a completely different glyph, you've mangled their data.


> Should this throw an error because there's only one displayed "character"?

Absolutely.


I think this is where the misunderstanding comes in. Python doesn't treat strings as char[] but as essentially unicode_codepoint[].
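
To make the distinction concrete, a small sketch comparing the three ways you could count the same string. The emoji is an assumption (a five-codepoint man-facepalming sequence); the point is only that `str` works in codepoints while `bytes` works in encoded bytes.

```python
# Assumption: a 5-codepoint emoji sequence, written with escapes.
emoji = "\U0001F926\U0001F3FC\u200D\u2642\uFE0F"

codepoints = len(emoji)                             # what Python's str counts
utf8_bytes = len(emoji.encode("utf-8"))             # what a byte[] view would count
utf16_units = len(emoji.encode("utf-16-le")) // 2   # what a UTF-16 char[] would count

print(codepoints, utf8_bytes, utf16_units)          # 5 17 7

# Slicing the *bytes* at 3 really does cut a codepoint in half,
# and decoding the result fails.
try:
    emoji.encode("utf-8")[:3].decode("utf-8")
    partial_ok = True
except UnicodeDecodeError:
    partial_ok = False

print(partial_ok)                                   # False
```

The "partial codepoint" outcome only shows up if you drop down to bytes yourself; slicing the `str` can split a displayed character, but never a codepoint.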

Whether this is a good idea on the whole is debatable; there's even a full PEP talking about the security concerns around doing it this way[1].

However, given this is how it works, the behaviour displayed makes complete sense to me and is the best of the bad choices presented by needing multi-byte strings.

[1]: https://peps.python.org/pep-0672/


Well, an index into a string is not necessarily another string, nor a character.



