
Hi, do you have real-world experience running Elasticsearch with a 64g+ heap?

Are there any articles/benchmarks/notes or anything else that you would be willing to share?

We have considered trying out 64g+ heaps for our cluster but we are concerned about very long gc pauses impacting the search performance.



Currently we are running ES on big iron in production.

There are both advantages and disadvantages. On the plus side:

- easier to manage several big machines than a shitload of small instances, potentially on top of a virtualization solution (debugging an overlay network etc. at night is not fun)

- nodes are more resilient to big requests (both big bulk indexing and resource-heavy searches). Your risk of hitting a sudden OOM is practically zero (although ES does have circuit-breakers that try to reject requests that would cause an OOM)

- while not having compressed oops is sad, not having multiple copies of JIT and compiled code cache etc. is a plus

- using a modern concurrent GC is more sensible on bigger heaps. G1 is actually fine on 64g. Some of us are running Shenandoah on Java 13 in production, and I’m looking forward to applying it to my clusters (or getting back to testing ZGC).
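For anyone who wants to check what a given node is actually running with, here is a minimal sketch. It assumes a HotSpot-based JVM (the `com.sun.management` bean is HotSpot-specific), and the class and method names are made up for illustration. It reports the maximum heap, whether compressed oops are on (HotSpot normally turns them off once the heap goes past roughly 32 GB), and which collectors are registered.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Elasticsearch.
public class JvmHeapReport {

    /** Value of a HotSpot flag such as "UseCompressedOops", or "unknown" if it is not exposed. */
    static String vmFlag(String name) {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            return bean == null ? "unknown" : bean.getVMOption(name).getValue();
        } catch (IllegalArgumentException e) {
            // Thrown when the flag does not exist on this JVM.
            return "unknown";
        }
    }

    /** Maximum heap the JVM will attempt to use, in GiB. */
    static double maxHeapGiB() {
        return Runtime.getRuntime().maxMemory() / (1024.0 * 1024.0 * 1024.0);
    }

    /** Names of the garbage collectors registered with the platform MBean server. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.printf("max heap: %.1f GiB%n", maxHeapGiB());
        System.out.println("UseCompressedOops: " + vmFlag("UseCompressedOops"));
        System.out.println("collectors: " + collectorNames());
    }
}
```

Run it with the same `-Xmx` and GC flags as the node to see the effect; the same `vmFlag` call should also work for other boolean flags such as `UseNUMA`.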

Disadvantages:

- NUMA. Try to pick a more recent Java as it’s tuned to run better on NUMA machines; still, not all of the JVM is NUMA-aware

- the fault domain is larger, and getting a hot node up to speed (in-sync) after a restart can easily take about an hour. Getting it up from empty storage is going to take a bloody while

- elastic.co folks seem to tune their recommended settings for lots of small nodes, not big ones, so you are on your own discovering proper limits for every setting


Thanks for this info!

After some quick googling on Elasticsearch and ZGC I found this: https://github.com/elastic/elasticsearch/issues/44321 where the official response from Elastic is that only G1GC and CMS are tested and supported.

I still think that it will be worth trying Shenandoah or ZGC if/when we ever get around to doing some experiments with larger heaps.


Last time I ran with ZGC the main problem was that the ES OOM circuit-breakers were going nuts, since they are not prepared for memory being mmapped 3 times. So their calculations are off by a factor of 3, blocking most requests for no good reason.

Running with the OOM circuit-breakers disabled is something I’m currently considering.
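To put rough numbers on the factor-of-3 problem, here is a small arithmetic sketch. It does not reproduce how ES actually computes memory usage; the class and method names are hypothetical, the 95% limit is an assumed breaker setting, and the over-count factor of 3 is simply the figure quoted above.

```java
// Illustration only: a breaker that compares (over-reported) usage against a heap limit.
public class BreakerOvercount {

    /** Fraction of real heap usage at which the breaker trips when usage is over-reported. */
    static double effectiveTripPoint(double limitFraction, double overcountFactor) {
        return limitFraction / overcountFactor;
    }

    /** True if the breaker would reject work given the real usage and the over-count factor. */
    static boolean wouldTrip(long realUsedBytes, long heapBytes,
                             double limitFraction, double overcountFactor) {
        double reportedBytes = realUsedBytes * overcountFactor;
        return reportedBytes > heapBytes * limitFraction;
    }

    public static void main(String[] args) {
        long gib = 1024L * 1024L * 1024L;
        long heap = 64 * gib;          // the heap size discussed in this thread
        double limit = 0.95;           // assumed breaker limit, as a fraction of the heap
        double factor = 3.0;           // over-count factor quoted above

        double trip = effectiveTripPoint(limit, factor);
        System.out.printf("breaker trips at %.1f%% real usage (%.1f GiB of %d GiB)%n",
                trip * 100, trip * heap / gib, heap / gib);
        System.out.println("24 GiB used, over-counted: " + wouldTrip(24 * gib, heap, limit, factor));
        System.out.println("24 GiB used, counted once: " + wouldTrip(24 * gib, heap, limit, 1.0));
    }
}
```

Under those assumptions a 64g node starts rejecting requests at about 20 GiB of real usage, i.e. with roughly two thirds of the heap still free.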


Java now has pauseless GC.


It’s highly concurrent but not pauseless.


Most production systems are barely on JDK 8. G1GC is the most used GC for high-performance production systems at many large companies (1B+ USD revenue) I have worked for.


I highly recommend trying out Java 11. G1 has gotten quite a few improvements since 8.

In particular, full GC has been parallel since 10: https://openjdk.java.net/jeps/307
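If you are not sure which side of that change a given runtime is on, a quick check is enough. This is just a sketch with made-up class and method names; it parses the standard `java.specification.version` system property ("1.8" on Java 8, "11" on Java 11) and only matters when the JVM is actually running G1.

```java
// Hypothetical helper: does this runtime include the parallel full GC for G1 (JEP 307, JDK 10)?
public class G1FullGcCheck {

    /** Turns "1.8" into 8 and "11" into 11. */
    static int featureVersion(String specVersion) {
        // Releases up to 8 used the "1.x" scheme; 9 and later report the number directly.
        String v = specVersion.startsWith("1.") ? specVersion.substring(2) : specVersion;
        int dot = v.indexOf('.');
        return Integer.parseInt(dot < 0 ? v : v.substring(0, dot));
    }

    /** JEP 307 was delivered in JDK 10, so anything from 10 on has it. */
    static boolean hasParallelG1FullGc(int feature) {
        return feature >= 10;
    }

    public static void main(String[] args) {
        int feature = featureVersion(System.getProperty("java.specification.version"));
        System.out.println("Java " + feature + ", parallel G1 full GC: "
                + hasParallelG1FullGc(feature));
    }
}
```

Per the JEP, the number of threads used for the full GC follows `-XX:ParallelGCThreads`.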


Thanks, but trying it out is not an option for so many things. It is up to the vendors to decide which JVM to recommend, and I cannot overrule them. As for personal use, I am on 11.



