
> If you subject "out-of-process caching technologies" to the same measurements (performance cost, code complexity) as threading solutions, what do you find?

Yes, I think you find that shared state is difficult.

So it's the old processes (private by default, shared as the exception) versus threads (shared by default) split. IMHO, "less shared state" == "simpler". And IMHO, "private by default" == "less shared state".

And you have an easier time of it if the shared state helps you with concurrent access. This is one benefit of RDBMSs (transactions) and also of systems like Redis. The fact that your state is external means it is more likely to have been designed to be concurrency-safe.



I don't get this argument. If I replace my memcache logic with an in-memory thread-safe hashmap, it will always be faster. I can't see how it can be slow.

As for 'simpler', as long as you aren't actually implementing the thread-safe map yourself, it is simpler than using memcache too.

So as long as you use well-written shared-state implementations (e.g. Java's concurrent collections), shared state isn't as hard as you make it out to be.
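For concreteness, here is roughly what I mean, as a minimal Python sketch (the class and key names are mine, purely for illustration; a single coarse lock stands in for the finer-grained locking a real concurrent collection would use):

    import threading

    class ThreadSafeCache:
        # Minimal in-process cache guarded by one lock. A real
        # well-written shared-state implementation (e.g. Java's
        # ConcurrentHashMap) locks at a finer grain, but the
        # interface you program against is about this simple.
        def __init__(self):
            self._lock = threading.Lock()
            self._data = {}

        def get(self, key, default=None):
            with self._lock:
                return self._data.get(key, default)

        def set(self, key, value):
            with self._lock:
                self._data[key] = value

    cache = ThreadSafeCache()
    cache.set("user:42", {"name": "antrix"})  # no network round trip,
    print(cache.get("user:42"))               # unlike a memcache call

No serialization, no network hop, no separate daemon to run.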


> If I replace my memcache logic with an in-memory thread-safe hashmap, it will always be faster. I can't see how it can be slow.

Because in a threaded application context, you have to either:

- have locking/synchronization on all data structures (performance cost)

OR

- manage which of your data structures are shared and which are not (complexity cost/race bugs)

With external, shared state (RDBMSs, memcached, etc.) you get fast in-process access to your (private) data structures (no performance cost due to locking) and a (mostly) concurrency-safe, explicit datastore. Both options are sketched below.
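To make the two options concrete, a minimal Python sketch (the counter functions are made up for illustration; a real application would pick one strategy per structure):

    import threading

    # Option 1: shared by default. Every access pays for the lock,
    # even when no other thread is actually contending.
    shared_counts = {}
    shared_lock = threading.Lock()

    def record_shared(key):
        with shared_lock:  # the performance cost
            shared_counts[key] = shared_counts.get(key, 0) + 1

    # Option 2: keep the structure private (thread-local). No lock,
    # but now *you* must track that it is private and merge results
    # explicitly later; forgetting that is a race bug.
    local = threading.local()

    def record_private(key):
        counts = getattr(local, "counts", None)
        if counts is None:
            counts = local.counts = {}
        counts[key] = counts.get(key, 0) + 1  # the complexity cost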

And as pointed out elsewhere, to scale beyond a single process you need the external state anyway.

There are other approaches to this: software transactional memory, functional programming, the Clojure paradigm, etc. But it's a hard problem, and it's not clear these are the right solution at scale.


> manage which of your data structures are shared and which are not

That is exactly what you are doing when you decide which of your data structures go in memcache and which don't.


> That is exactly what you are doing when you decide which of your data structures go in memcache and which don't.

Yes, apart from the fact that the ones you don't think about are private. I.e., processes are "private by default" and threads are "shared by default".

Processes give you a safe default, threads give you a dangerous one.

This is the main (only) difference between threads and processes - and it is the important one.
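A toy Python illustration of those defaults (the variable names are mine):

    import threading
    import multiprocessing

    counter = 0  # one module-level name, two very different fates

    def bump():
        global counter
        counter += 1

    if __name__ == "__main__":
        # Threads: shared by default. The child thread mutates the
        # same counter the main thread sees, intended or not.
        t = threading.Thread(target=bump)
        t.start(); t.join()
        print(counter)  # 1: the child's write is visible here

        # Processes: private by default. The child gets its own copy,
        # so the parent's counter is untouched unless you share state
        # explicitly (queues, pipes, an external store, ...).
        p = multiprocessing.Process(target=bump)
        p.start(); p.join()
        print(counter)  # still 1: the child's increment stayed private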


So you concede Antrix's point that multithreading is not slower?

I'd just like to confirm that you are even able to determine when your argument has been refuted, as you seem to think that the proper response is to just pretend it didn't happen and come up with a new reason instead. That is to say, are you a reason factory in support of an internally held belief despite evidence to the contrary, or are you a rational sentient who is in a discourse for the purpose of determining a common truth?


> So you concede Antrix's point that multithreading is not slower?

Slower than what?

I said (and you quoted): threading doesn't come "for free", in that you need to pay a performance cost in terms of locking in your interpreter and/or a code complexity cost in terms of access to your shared data structures.

For your reference, I stand by that comment and don't think I've said anything to contradict it. For clarity, this is what I think:

1) a multithreaded (fine-grained locking) interpreter will run a single-threaded workload slower than an interpreter with a GIL (references from elsewhere in this thread): http://www.artima.com/weblogs/viewpost.jsp?thread=214235 http://mail.python.org/pipermail/python-dev/2001-August/0170...

2) the threading programming model imposes a complexity burden on the programmer, since all data structures are shared by default, and so they must think about every data structure and whether it can become shared in practice (and so whether it must be made concurrency-safe)

Basically: either your interpreter locks everything for you (perf cost), or you have to worry about it (complexity cost), or a bit of both.
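On point 1, you can get a rough feel for the cost without patching an interpreter by simulating fine-grained locking in user code (a crude stand-in, not a real GIL-removal benchmark; the function names are mine):

    import threading
    import timeit

    lock = threading.Lock()

    def plain(n=100_000):
        total = 0
        for i in range(n):
            total += i          # "GIL" case: no per-operation locking
        return total

    def locked(n=100_000):
        total = 0
        for i in range(n):
            with lock:          # stand-in for fine-grained locking on
                total += i      # every shared-object access
        return total

    print(timeit.timeit(plain, number=10))
    print(timeit.timeit(locked, number=10))  # slower, and single-threaded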

I don't think that threaded access to an in-process data structure is slower than multi-process access to memcached.

I do think that defaulting to private data and having explicitly shared data is wise and is an easier programming model.

I hope that's clear. Please let me know if you think I've been inconsistent, rude or done anything other than espouse these points in this thread.

ps. I found your last reply rude. Also I'm not trying to say "threads are bad and you are bad for using them". I'm also not attacking your (or anyone else's) integrity.


Ok, it's the former then.

> Slower than what?

The great thing about HN, and similar discussion systems you'll find on the internet, is that you can read the conversation. So when I say "slower" and reference "antrix", a sentient (or even a reasonable AI) could infer that I was referring to this statement by antrix:

> If I replace my memcache logic with an in-memory thread-safe hashmap, it will always be faster. I can't see how it can be slow.

And that you replied to by quoting it.

Clearly, by reading the English therein, antrix is comparing the memcache logic with a thread-safe hashmap. So in case you still aren't getting it, the answer to your question "Slower than what?" is "than a thread-safe hashmap". That you are not aware that this was antrix's assertion would explain why your subsequent posts fail to refute it.

So your behavior is not merely that of a response factory, but of a response factory with a 1-deep context buffer.


The reason I asked "slower than what" is that at no point have I claimed that going to memcached would be faster than a local hash (with locking).

Whereas I have (in this thread) claimed that an interpreter without a GIL (and with fine grained locking) would be slower than one with a GIL (for single-threaded workloads).

I wanted to know which you meant.

You haven't shown (I believe because it's not there) where I claimed that going to memcached would be faster.

And you're being rude and trying to provoke a reaction.

And HN is hiding the reply link because its heuristics have determined that the signal/noise of these posts is likely to be low.

And I agree and so won't reply further in this thread.



