*What do you mean by coordination being limited in Erlang, exactly?* I have an a...

vertex-four · on April 19, 2014

That should be reasonably possible, surely? Your global grid is one process, that sends messages containing the smaller parts to the processing processes, then waits for them to reply with another message, and then aggregates the replies. It's all just message buses, you can push whatever stuff you want around however you want.

Imagine how you'd implement your system if each part of the system had to communicate with the others over a network socket, and you're most of the way to implementing it in Erlang.

It's probably not optimal to do things involving lots of calculations in Erlang, though, as the VM is fairly slow. Akka implements a similar system, but for the JVM (Scala).

stcredzero · on April 19, 2014

waits for them to reply with another message, and then aggregates the replies

Basically, you are proposing that the world actor aggregate the entire world's data? That wouldn't scale. Or, maybe the world actor just becomes responsible for adjudicating moves between subgrids. Maybe.

MetaCosm · on April 19, 2014

Isn't the fundamental point of concurrency the players?

Couldn't you have X players moving on a global grid? I have built systems with literally tens of millions of players (Erlang processes) moving on a grid (and doing way more, localized threat detection, decision making, moving away or towards, and coordination with other units in X range).

stcredzero · on April 20, 2014

Can you give a high level description of how you implement the global grid? It's really only absolute coordination of a checkerboard-like grid that I'm wrestling with. I've verified that my algorithm works, and is nicely stable. I've also scrutinized it with VisualVM, and on Linux and OS X, the threads are either in wait or running, in the pattern I'd expect. (The subgrid worker-group threads have to wait for the global grid to do its coordinating.) I'm also seeing expected use of the thread pools. However, for some reason I can't seem to scale beyond 250 users.

One complication is that my grid is for a procedurally generated world with 2^120 locations in it. This is why I generate subgrids. A degenerate case is one subgrid per user. However, these subgrids are organized in load-balanced groups, each of which has their own thread pool, caches, and locks.

Also, rollbacks are problematic, though the real problems are arguably corner cases.

Erlang might be a win because each garbage collector only has to deal with its own local memory.

EDIT: It turns out my algorithm is somewhat similar to Pikko:

http://www.erlang-factory.com/upload/presentations/297/Pikko...

One big difference, is that my algorithm doesn't move or reconfigure masts, instead it dynamically creates subgrids, which are then grouped into "workgroups" each of which is supposed to be processed by a different CPU socket. Instead of there being an API, it's more that the subgrids stop what they are doing and their information is briefly managed by the global grid code. (The procedurally generated map is rigged, so that there are many opportunities for crossing from one subgrid to another.)

MetaCosm · on April 20, 2014

Going to do my best to put a quick summary (from memory) on something that took us a long time to get right and was a bit convoluted.

Our units actually would report to intermediaries their maximum interaction boundries, which would then be passed to processes to create something somewhat like a mast, someone like a subgrid -- a dynamic interaction zone. All our units had hard constraints (max speed, etc) and worked in global ticks that represented real time. Then, we would talk to global to stretch all the interaction zones to fill empty space and report back boundaries. Then, our units would work in little worlds until they crossed a threshold, we caused a rezoning among them and their neighbors. So initially, the everything would have to be parsed out into interaction zones, but then they could ignore each other for periods of times until a unit strayed across an edge, and then rezoning took place.

Not sure how well it would work with amped up movement (making it have to go all the way up to global more) and not certain how it would work at the scale of 1 undecillion 329 decillion 227 nonillion 995 octillion 784 septillion 915 sextillion 872 quintillion 903 quadrillion 807 trillion 60 billion 280 million 344 thousand 576 points!

stcredzero · on April 20, 2014

I woke up with the realization that my problem is probably GC pressure. The system is currently written in idiomatic Clojure so doing everything generates garbage. What's more, the garbage is created in one tick, then released in a subsequent tick, so I'm losing the benefit of generational GC.

I think I can make my subgrids algorithm work, but it will have to be using mutable data structures.