Hacker News

dang · on Dec 25, 2019

Please don't cross into personal attack, no matter how wrong or ignorant another comment is, or seems.

https://news.ycombinator.com/newsguidelines.html

nopurpose · on Dec 25, 2019

Author mentioned elsehwere that he used that trick in postgresql , not in python, where it allowed him to skip SQL acrobatics and sped up query.

crimsonalucard · on Dec 25, 2019

Thank you for this. This is indeed where I actually used the technique. You do get dramatic speed up if you do it in SQL this is provable.

I chose to use python as an example here mainly because the SQL example would get very convoluted and be much less concise.

The main point was the general technique of moving back and forth between isomorphic types, but people are getting too involved with implementation details of python. Yeah it might(keyword) be slower, yeah it's more convoluted than a traditional list comprehension in python, but that is not the point.

Lesson learned: don't use arbitrary examples to prove a point. Use an example where all metrics are dramatically better.

crimsonalucard · on Dec 25, 2019

First off, I don't like your sarcastic attitude. It's against the rules on HN to do that. Don't violate rules to be rude.

Second off, my example is just an example. It's not production level code. The intention of the example is to show "lifting" from from one type into another "category." The efficiency is less relevant here... in some contexts it can be faster in other contexts it could be slower or not even possible.

>Your gloss reveals further substantial misapprehensions about python basics (there are no regexps here, no it's not faster to do it that way, no it's not any more order preserving then iteration).

You sentence here reveals a failure in reading comprehension. I never mentioned what did was a regexp. I said you Could use a regexp to make it faster. A regexp if you know basic computer science is the one of the fastest possible string parsing operations. Take a course on Automata at a top university, this would be helpful in educating you, but you need to qualify first.

If there is a bottleneck it's in the casting functors str() and eval(). I didn't benchmark it and I very much doubt you did either. I suspect it is faster as the interpreter needs to "interpret" the data structure anyway regardless of whether or not it has string quotation marks around it so the parsing bottleneck is already there regardless of whether or not you have quotation marks. This is just a guess, and I'm pretty sure both of us are too lazy to generate actual specs to prove it either way. Stop throwing hard conclusions at things you have ZERO evidence for. If you do have evidence, show it, don't talk out of your ass... I'm willing to concede given evidence but without evidence you don't know what you're talking about.

The string does hold order preserving properties. You are absolutely wrong about this:

"{1:2, 3:,4}" during string manipulation can be made into "[1,2,3,4]" without sorting because the string itself is a List of ordered characters. Unserialized, the dictionary has no order. Again what's going on here is you misreading and not understanding my explanation.

>Assuming your problem is purely lack of experience, here's a word of advice: stay the hell away from abstractions for a year or two.

This is flagrantly offensive. Let me give you a word of advice. Keep your attitude in check, this kind of attitude can ruin your whole career after you get your ass fired. Any hiring manager would prefer a less technical person (which you have not demonstrated any dramatic ability in) over someone with a bad overly domineering attitude, and that is a fact.

>Instead try to form some basic mental model of a) how a computer and b) your main tools (such as python) work (have a very simplified model of how the CPU, memory network and SSD work, with numbers correct to 1 or 2 orders of magnitude; read a bit about the python implementation). Try to predict how much time/memory/etc something should take before you implement it. Then implement it in the most straightforward and direct way.

How does this have anything to do with the topic at hand. You are invoking OS and CPU architectural level concepts which are so far away from python it has basically nothing to do with it. An SSD? Are you serious? All of these low level operations are light years in distance away from python. If you want time and space analysis of both algorithms, at the python level/category theory level you just follow the standard model of Big Oh complexity. You want to get nitty gritty? get the bench marks or switch to something like C++ or Rust. Nowhere does anyone have a detailed enough mental model to map out how the python code will execute at the assembly instruction level.

Category theory is a top down approach to programming while computer architecture is more of a bottom up approach. Experts in both fields are unlikely to collide.

FWIW I double majored in Electrical/Computer Engineering and Computer Science 10 years ago from a top university. Not only do I have a general model of computing from semiconductor physics (this is below logic gates and non linear circuits and computer architecture, all said to patronize you.) all the way up to python, but I hate throwing down credentials for no reason... SO why did I do it here? because you decided to assume that I lack experience.

patrec · on Dec 26, 2019

I timed it -- 2 orders of magnitude slower across the board.

    In [19]: short, middle, long = [{2*i+1:2*i+2 for i in range(n)} for n in [2, 2**8, 2**16]]
    In [20]: timeit  [x for kv in d.items() for x in short]
    812 ns ± 6.86 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
    In [21]: timeit  [x for kv in d.items() for x in middle]
    30.1 µs ± 1.87 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
    In [22]: timeit  [x for kv in d.items() for x in long]
    9.77 ms ± 1.19 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
    In [24]: timeit  eval(str(short).replace('{','[').replace('}',']').replace(':',', '))
    10.4 µs ± 240 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
    In [25]: timeit  eval(str(middle).replace('{','[').replace('}',']').replace(':',', '))
    592 µs ± 23.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
    In [26]: timeit  eval(str(long).replace('{','[').replace('}',']').replace(':',', '))
    264 ms ± 3.22 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Python dictionaries are ordered, and iterate in order (since 3.6, as guaranteed property since 3.7). In older versions of python where they are not ordered, str also does not preserve the order in the literal (except for one of all possible orderings).

   Python 3.7.2 
   Type "help", "copyright", "credits" or "license" for more information.
   >>> str({5:6, 4:5, 3:2})
   '{5: 6, 4: 5, 3: 2}'

   Python 2.7.15
   >>> str({5:6, 4:5, 3:2})
   '{3: 2, 4: 5, 5: 6}'

You might consider what I wrote offensive, but maybe you can still take the above as evidence that there are areas where your mental model is imperfect. I have the impression there is a puzzlingly common class of CS engineering failures stemming from misguided abstraction attempts because people are either unable or just unpredisposed to making extremely back of the envelope task breakdown of the most direct implementation in terms of primitive, costed operations. I also see this happen to rather smart but inexperienced engineers. Since you are not inexperienced my advice is unlikely to be helpful.

crimsonalucard · on Dec 26, 2019

>I timed it -- 2 orders of magnitude slower across the board.

Dude, didn't I tell you to use regular expressions? Replace is obviously slower. I used it because it's easier to read then regexp. Rerun your experiment on that. I specifically mentioned that the regexp is the faster implementation. You completely ignored it.

Again the implementation was not my point. I don't even understand why you would spend the effort to record something so trivial.

Also the "magnitude" difference in speed is not exponential meaning for most applications the difference is negligible. You should know if you're "experienced" that a C++ binary is a "magnitude" faster than python just running a for loop counting to 100. Most people still use python because in the end because it Doesn't even matter.

>Python dictionaries are ordered, and iterate in order (since 3.6, as guaranteed property since 3.7). In older versions of python where they are not ordered,

Good to know.

>str also does not preserve the order in the literal (except for one of all possible orderings).

Categorically wrong. You talk about mental model? Examine yours. The very nature of a string is an ordered list of characters. A string literal "abc" has order and preserves it by nature of having order as a property. A data structure that loses order is one that does not explicitly have ordinal properties hence it does not preserve order.

>You might consider what I wrote offensive, but maybe you can still take the above as evidence that there are areas where your mental model is imperfect.

No one has a perfect model. To understand the world and computing, what people do is they make predictions and guesses and abstractions. I'm just guessing that string space might be faster in python. If I really wanted to understand why one benchmark was faster than the other I basically have to understand the source code behind eval and str. That is something that I don't care too much about and is therefore not worth my effort. Overall that wasn't even my main point.

>I have the impression there is a puzzlingly common class of CS engineering failures stemming from misguided abstraction attempts because people are either unable or just unpredisposed to making extremely back of the envelope task breakdown of the most direct implementation in terms of primitive, costed operations

The back envelope calculation is that the upper bound for both algorithms is O(N) where N is the amount of characters in the serialized data structure; and thus benchmark times are negligible in almost all contexts. Think about it. We have back envelope calculations down to a science (complexity theory) and that science doesn't even reference actual timing because humans can't see the difference between 0.001 ms and 0.001 ns. But you're going around all of the science and throwing bench marks at me. You want to squeeze the maximum amount of time out of your python program? Stop using python... start writing your code in cisc.

>I also see this happen to rather smart but inexperienced engineers. Since you are not inexperienced my advice is unlikely to be helpful.

An experienced engineer is only concerned with performance when it actually makes a difference. Your advice is not helpful because it is wrong.

Understanding computing is in itself composed of layers of abstraction. To optimize a high level language I don't have to understand assembly language. I definitely don't need to understand CPU architecture which you seem to imply is required. I have an abstraction that will help me: Big Oh Notation.

The most common time a programmer will need to understand the low level aspects of a computer is if they are writing systems level code, not application level code. These specialists tend not to use python at all. I would imagine category theory is not applicable at this level because they are closer to the turing model of computing which is by nature non-functional. Assembly, C, C++ or Rust is what they usually work in.

patrec · on Dec 26, 2019

> Dude, didn't I tell you to use regular expressions? Replace is obviously slower. I used it because it's easier to read then regexp. Rerun your experiment on that. I specifically mentioned that the regexp is the faster implementation. You completely ignored it.

Well, I ignored it because I thought that a) the actual string manipulation would form a negligible part of the total runtime b) regexp would not be faster (and the actual fast way to do it would be str.translate). Let's try it:

    In [43]: pat = re.compile('[{}:]'); slong = str(long)
    In [44]: timeit slong.replace('\{','[').replace('}',']').replace(':',', ')
    8.06 ms ± 23.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    In [45]: timeit repl = pat.sub(lambda m: {"{": "[", "}": "]", ":": ", "}[m.group()], slong)
    128 ms ± 5.94 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
    In [54]: timeit eval(repl)
    237 ms ± 3.92 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    In [54]: timeit slong.translate({'{':'[', '}':']', ':': ','})
    753 µs ± 27.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

See how having a reasonable mental model can be nice?

crimsonalucard · on Dec 26, 2019

Please read the entire post and respond to that rather then the tidbits that help you gain argumentative advantage. I have repeatedly said that the implementation WAS not the point and that all performance metrics are negligible as the are under O(N). Either way here's a response:

Honestly reg expressions in all general contexts is one of the fastest implementation of string parsing available for this specific type of parsing. You would have to understand computing theory to see this. Since you didn't even bring it up the actual fact of the matter is, you don't have a good model of computing in general in your head.

The issue you're seeing here is python specific. So really the only way to see this kind of thing is through benchmarks OR knowing the python source.

> See how having a reasonable mental model can be nice?

The problem is your posts also indicate to me that you don't have a good mental model. From what I can make of the model is this: Abstractions are bad, use less of it for more speed, also know SSD's and CPU architecture that will help you write faster python code.

Then what you do is run benchmarks and rely on that in place of an actual working mental model. Believe it or not you CAN run benchmarks on every permutation of a code path you can forego a model altogether. Evidence is more reliable then a model, yet your genius advice was for me to learn CPU architecture.

>a) the actual string manipulation would form a negligible part of the total runtime b) regexp would not be faster (and the actual fast way to do it would be str.translate)

Two things about this, first... str.translate is python specific. No general mental model would assist you with this. You are using python specific knowledge here.

The second part is similar. How do you know eval would be non-negligible? Theoretically the interpreter interprets python code and eval interprets the same thing. Is eval/str accessing heap space or stack space? What is causing it to be slow or is it the extra parsing itself?

Likely you don't know.

Either way my example served one thing. If you understood it you would know that the theme was basically to say that moving through Another type space could have advantages over moving through the original type space. The string functor was just an example.

I could easily say that the goal was to convert:

   {"1":"2", "3":"4", "5":"6"} to {6:1, 2:3, 4:5}

Or essentially the ordered dict, rotated. Tell me which space is a rotation more intuitive? A list space. [1,2,3,4,5,6] is more readily rotated into [6,1,2,3,4,5] with one operation and converted back into a dict.

If you tried to do the above directly by manipulating the dictionary it would not be as straightforward. Use a functor to lift the dict into a list, do the rotation and lift it back down to a dict.

That is the point and the pattern I am trying to convey. We can argue about benchmarks all day it serves Nothing.