> The decision to use Go was deliberate, because we needed something that had lower latency than Java (where garbage collection pauses are an issue) and is more productive for developers than C, while also handling tens of thousands of client connections. Go fits this space well.
This is interesting. There haven't been many instances yet where Go's memory/GC performance has been compared directly and favorably with Java's.
In this case I don't think it's the raw performance they're looking for, but rather freedom from unpredictable GC pauses. That said, I think they should have explained why they didn't use Azul Zing for this; I realize it's expensive, but this seems like the exact use case for it.
We really are looking for raw performance, but not at any cost. I spent a bunch of time, and continue to do so, to eliminate extra work, indirection, and garbage.
Efficiency of runtime and development is a careful balance that Go fits well. Rust is immature. C is difficult for people who haven't been writing it professionally. Java (Zing and other special cases aside) has too much latency, even if it's very fast once it's running thanks to HotSpot optimizations. The list goes on, but I didn't see a reason to choose the others over Go, nor have I found anything yet that beats it for our use case.
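To make "eliminating garbage" a bit more concrete, here is a minimal sketch of one common technique in Go: reusing buffers from a sync.Pool instead of allocating a fresh one per request. This is just an illustration of the general idea, not Netflix's actual code.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool hands out reusable *bytes.Buffer values so that handling a
// request does not allocate (and later garbage-collect) a fresh buffer
// every time.
var bufPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

// handle borrows a buffer, does its work, and returns the buffer to the
// pool instead of letting a per-request allocation become garbage.
func handle(payload []byte) int {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset()
	defer bufPool.Put(buf)

	buf.Write(payload) // stand-in for the real per-request work
	return buf.Len()
}

func main() {
	fmt.Println(handle([]byte("hello, cache")))
}
```

The same idea shows up as preallocated slices, avoiding interface boxing on hot paths, and so on; the allocation rate, not just the collector, determines how much GC work there is in the first place.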
Really, Java just wasn't the language for the job. Go has a lower level and simpler interface, simpler runtime (with less indirection), and better model for concurrency. Yes, it's a new language for us, but it was the right tool for the job. Our garbage collection pauses are mostly under a millisecond, so they aren't really an issue.
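If anyone wants to check pause times for themselves in their own Go process, here's a rough sketch (mine, not from the blog post) that reads the pause history the runtime already keeps; running with GODEBUG=gctrace=1 prints similar per-cycle information to stderr.

```go
package main

import (
	"fmt"
	"runtime"
)

var sink []byte

func main() {
	// Allocate some garbage and force a few collections so there is
	// pause history to look at.
	for i := 0; i < 5; i++ {
		sink = make([]byte, 64<<20)
		runtime.GC()
	}

	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)

	// PauseNs is a circular buffer of recent stop-the-world pause
	// durations in nanoseconds; NumGC counts completed GC cycles.
	fmt.Printf("completed GC cycles: %d\n", ms.NumGC)
	for i := uint32(0); i < ms.NumGC && i < 5; i++ {
		pause := ms.PauseNs[(ms.NumGC-1-i)%uint32(len(ms.PauseNs))]
		fmt.Printf("pause %d: %.3f ms\n", i, float64(pause)/1e6)
	}
}
```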
Zing never really was on my radar. Looking at it now, I shudder to think of the licensing costs, especially since we run tens of thousands of cache instances; we already pay enough just to run these servers. It's also a new Java runtime which very few people know about, and probably nobody at Netflix has used. I would have had to learn it all independently and continue to be on my own from then on. The Go Slack team and the community in general are very helpful and supportive, and I would speculate that the same level of support isn't available for Zing without paying.
Cool, thanks for the insight! I love reading the thinking process behind design decisions.
On the subject of Azul's price: I think they would give you a high-volume discount, but even half the price ($1,750 per server) would be pretty outrageous for your use case.
For my use case I'm going with Apache Ignite right now for tiered off-heap caching, but I'm not sure it would work for you guys. Might be worth a look, though.
I admire the Netflix OSS ecosystem. Thanks for the great work you guys do!
Here's how they word it on the informational page for Zing:
“Zing® is priced on a subscription basis per server (physical or virtual). With per-server pricing, you don’t need to worry about core counts, memory size, or number of instances deployed per server. The single license annual subscription price for Zing is $3500 USD per physical server, with significantly lower prices for higher volumes and longer-term subscriptions. Please contact us to learn about the special pricing available for start-ups and companies with $25 million or less in annual revenue, and for ISVs and manufacturers looking to embed/integrate Zing with their products.”
> I think they should have explained why they didn't use Azul Zing for this; I realize it's expensive, but this seems like the exact use case for it.
Azul Zing is extremely expensive for commodity hardware; that's a massive price tag just for the lower-latency constraint. Their pricing model is a much better fit for vertically scaled services, though they even argue that it can lead to reduced costs on horizontally scaled services, presumably on the assumption that you can use fewer hardware instances at peak while still meeting latency guarantees:
> With Zing, Azul has created the most scalable Java Virtual Machine (JVM) for enterprise workloads and made it available on cost-efficient commodity hardware. With Zing, enterprises can now dramatically simplify Java deployments by using fewer instances while achieving greater response time consistency under load and dramatically lowering operating costs.
Somehow, I don't see a cache solution working well with this pricing model. The underlying operations are presumably simple enough you can get the performance you want with thin, simple software deployed on many commodity machines.
I ask because the PDF you're probably referring to seems to imply the opposite. I only skimmed it, but here's what I found:
"An entire Java heap (large or
small) can be marked consistently without falling behind
and resorting to long garbage collection pauses
because the C4 employs:
• Single pass completion to avoid the possibility of
applications changing references faster than the
mark phase can revisit them
• “Self-healing” traps that ensure a finite workload
and avoid unnecessary overhead
• Parallel marking threads to provide scalability that can
support large applications and large dataset sizes"
Maybe I'm looking at the wrong PDF; can you provide a source for your claim? It would be useful to know if it's true.
I am hesitant to post a direct link since the document says 'Confidential and Proprietary', but here is what it says about memory requirements:
`The Zing components require the following amount of RAM on the ZVM machine:
• 64+ GB RAM recommended
• 16-32 GB RAM minimum, depending on the application memory requirements, for demonstration purposes only. Minimum is not sufficient for performance or scalability production testing of large or memory consuming enterprise applications.
• Increase the memory as needed to accommodate the number and size of your Java application/Zing Virtual Machine (ZVM) instances.
The Pool license server requires the following amount of RAM on the license server machine:
• 8 GB RAM recommended (4 GB minimum)`
The document is 300+ pages, with all the details about tuning and optimizations, which makes me think it is not simple plug-and-play once you have paid for the license.
I think the bigger thing is the deliberate design of Go for parallel processing. I feel like the Go GC is still an issue, although it's getting better every day, but the benefits with regard to "handling tens of thousands of client connections" potentially make the GC easier to swallow than Java's.
But that's just my outside-looking-in take on it.
Java uses about 2-10 times more memory than Go most of the time, so I'd think Java's GC would have to be at least 10 times better/more sophisticated than Go's to compensate. Though I doubt it is really 10 times better.
As a Python programmer, I am intimately familiar with the distinction between concurrency (coroutines) and parallelism, and additionally with the difference between multi-threading and multi-processing within parallelism.
It was my understanding that Go's focus on concurrency also made parallelism easier to approach than in most languages:
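As a toy illustration (my own, with an arbitrary workload): the code below is written purely in terms of concurrent goroutines, and on a multi-core machine it runs in parallel for free, because the runtime schedules goroutines across up to GOMAXPROCS OS threads at once.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// work is a stand-in CPU-bound task; the iteration count is arbitrary.
func work(n int) int {
	total := 0
	for i := 0; i < n; i++ {
		total += i % 7
	}
	return total
}

func main() {
	fmt.Println("cores available to the scheduler:", runtime.GOMAXPROCS(0))

	results := make([]int, 8)
	var wg sync.WaitGroup

	// Each goroutine is just a concurrent unit of work; on a multi-core
	// machine the runtime runs them in parallel with no further changes.
	for i := range results {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			results[i] = work(50000000)
		}(i)
	}
	wg.Wait()

	fmt.Println(results)
}
```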
The link redirects to an https URL that doesn't work (Firefox and Opera).
I'm not using an HTTPS Everywhere addon, so I was very surprised that wget http://techblog.netflix.com/2016/05/application-data-caching... doesn't get redirected. Loading the downloaded file in the browser let me read the article.
I get the same result in Firefox, and in Chrome it says "This site can't be reached" with a specific ERR_CONNECTION_CLOSED error. I've been wondering why this item kept getting voted up when I couldn't even access it. I'm on Time Warner Cable in southern Kentucky, by the way; could this be a network issue?
Edit: I should note that I cannot access this via HTTP either, as I'm getting redirected to HTTPS.
Edit 2: Correction: if I force http:// in Chrome I can access the blog entry. I tried using Private mode in Firefox, and even though I specify http://, the URL gets modified to https:// and I get the insecure connection message. Above I used the word "redirect", but I can see no evidence of an actual redirect in the developer tools. It's also not clear to me whether the Firefox Network tab would even indicate a redirect response.
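One way to settle whether the server itself is redirecting (versus the browser upgrading the scheme on its own) is to fetch the URL without following redirects and look at the raw response. A small sketch in Go, which takes the URL as a command-line argument:

```go
package main

import (
	"fmt"
	"net/http"
	"os"
)

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: checkredirect <url>")
		os.Exit(1)
	}

	// Stop the client from following redirects so we can see the
	// server's first response as-is.
	client := &http.Client{
		CheckRedirect: func(req *http.Request, via []*http.Request) error {
			return http.ErrUseLastResponse
		},
	}

	resp, err := client.Get(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, "request failed:", err)
		os.Exit(1)
	}
	defer resp.Body.Close()

	// A 3xx status plus a Location header means the server really
	// redirected; no redirect here points at the browser rewriting
	// the scheme by itself.
	fmt.Println("status:", resp.Status)
	fmt.Println("Location:", resp.Header.Get("Location"))
}
```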