It works great even for large heap sizes. I moved my ES cluster (running with around 92G heap size) from G1GC to ZGC and saw huge improvements in GC. The best part about ZGC is that you don't need to touch any GC parameters; it autotunes everything.
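If you want to try it, the switch is basically a flag swap in config/jvm.options (or a file under config/jvm.options.d/ on newer Elasticsearch versions). The snippet below is only a sketch: ZGC is production-ready from JDK 15, and on JDK 11-14 it also needs the experimental-options flag.

    # replace the existing GC flags with:
    -Xms92g
    -Xmx92g
    -XX:+UseZGC
    # JDK 11-14 only:
    # -XX:+UnlockExperimentalVMOptions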
Whether G1 or ZGC is the best choice depends on the workload and requirements, but G1 in recent JDK versions also requires virtually no tuning (if your G1 setup has flags other than the maximum heap size, maybe the minimum heap size, and maybe a pause target, try again without them).
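For concreteness, a "no tuning" G1 setup on a recent JDK really is just the heap size, since G1 has been the default collector since JDK 9. The sizes below are placeholders:

    # app.jar is a placeholder; G1 is already the default on JDK 9+
    java -Xms31g -Xmx31g -jar app.jar
    # add -XX:MaxGCPauseMillis=<ms> only if you have an explicit pause goal
    # (the default target is 200ms)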
I'm curious about this choice. The Elasticsearch documentation recommends a maximum heap slightly below 32GB [1].
Is this not a problem anymore with G1GC/ZGC, or are you simply "biting the bullet" and using 92G of heap because you can't afford to scale horizontally?
Heaps "slightly below 32GB" are usually because of the -XX:+UseCompressedOops option, which allows Java to address up to 32GB of memory with a smaller pointer. Between 32-35GB of heap, you're just paying off the savings you would have gotten with compressed object pointers, but if you keep cranking your heap further after that, you'll start getting benefits again.
It doesn't have to be that you can't afford to scale horizontally; it's often more efficient and cheaper to scale vertically first, both in monetary cost and in time/maintenance cost.
On hardware, but not on a cloud setup? We run several hundred big ES nodes on AWS, and I believe we stick to the heap sizing guidelines (though I’ve long wondered if fewer instances with giant heaps might actually work ok, too)
Cloud is trickier to price than real hardware. On real hardware, filling the ram slots is clearly cheaper than buying a second machine, if ram is the only issue. If you need to replace with higher density ram, sometimes it's more cost effective to buy a second machine. Adding more processor sockets to get more ram slots is also sometimes more, sometimes less cost effective than adding more machines. Often, you might need more processing to go with the ram, which can change the balance.
In cloud, with defined instance types, more ram usually comes with more of everything else, and from the pricing listed at https://www.awsprices.com/ for US East, it looks like within an instance type, $/ram is usually consistent. The least expensive (per unit ram) class of instances is x1/x1e, which run from 122 GB to 3,904 GB, so that does lean towards bigger instances being cost effective.
Exceptions I saw (again per unit ram): c1.xlarge is less expensive than c1.medium; c4.xlarge is less expensive than the other c4 types, and the c4 family is more expensive than the others; in the m1 family, m1.medium < m1.large == m1.xlarge < m1.small; m3.medium is more expensive than the other m3 types; p2.16xlarge is more expensive than the other p2 types; t2.small is less expensive than the other t2 types. Many of these differences are a tenth of a penny per hour, though.
How (and how much) did these improvements manifest? For example, did you measure consistently faster response times when running ZGC rather than G1GC? If so, by how much? I’m always looking for a way to improve ES response times for our users.
We mainly capture GC metrics and alert on them. One good thing is that GC-related alerts no longer fire in production at all. Tail latency for API calls from Kibana to ES also improved.
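For anyone wanting to capture the same data, JDK 9+ unified GC logging is enough to feed pause times and collection counts into whatever you alert with; the path and rotation values here are just placeholders:

    -Xlog:gc*:file=logs/gc.log:time,uptime,level,tags:filecount=8,filesize=64m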