Because it's more fun to learn a new programming language than deliver features to users. We're VC financed, so no need to make money; better to spend it before people start talking about profitability, because then the good times are over (see Etsy). We can also add Pony to our CVs and move on in a year to the next company, where we'll introduce the next big thing to add to our CVs. Plus 10% more salary! Ka-ching! #LivingTheLife
It might be a bit early to accuse Wallaroo of wasting money on "fun for coders" and ignoring the market's actual needs.
When I read their blog, I get the feeling that they are thinking things through. They seem to target a specific market and focus on what can make their product desirable. And yes, they are definitely experimenting with a few things, but they are well aware of the trade-off: it appears to be a calculated risk, one that might very well pay off in the near future.
Given what I know of Pony, WhatsApp and Erlang, it reminds me of WhatsApp's decision to write their backend in Erlang.
And it is a risk - big whoop, you've written production code in Pony. Good luck doing that again elsewhere, or hiring someone competent in it. I don't think it's quite hip or popular enough to be very useful on your resume (but I could be wrong).
If we keep doing things only because they've been done elsewhere, we stop innovating.
> it reminds me of WhatsApp's decision to write their backend in Erlang.
WhatsApp's decision to use Erlang was based on totally valid points. In fact, Erlang's concurrency model made scaling so easy for them that they only needed around 50 engineers overall to handle 50B messages a day. [0]
You might not know Pony, that doesn't mean it couldn't make sense for someone else.
Your comment and the parent aren't mutually exclusive. It's fairly easy for coders to provide vast amounts of technical justification for resume driven development.
Is a new, immature language like Pony really that desirable in the marketplace? I highly doubt 'resume-driven development' was ever a primary motivation.
If there was anything beyond the strategic interests of the company, it's that developers love a) clean code and the possibilities of a fresh new project, and b) playing with new, modern toys.
The characterisation of JVM GC behavior seems slightly unfair.
Saying that JVMs are 'stop the world' and Pony is 'concurrent' feels like it's ignoring modern JVM GC strategies.
It might be true that you can avoid stop the world collection for an actor-system, but by logical extension would that not be possible on the JVM as well for that particular workload, given a suitably designed actor-system?
It's possible, yes. At the time we started working on Wallaroo, the only real option for concurrent GC on the JVM was Azul, and we didn't want to tie our product and our goals to another company's commercial offering.
My intent was not to be unfair. There's a lot of nuance in the topic that can be hard to cover in a more general blog post. Garbage collection is a fascinating subject, and a great amount of detail is left out of that post. I was going for a broad overview of the general thinking.
I think this statement is a reach. There are plenty of concurrent collectors for the JVM, and there are tons of architectural and coding strategies to mitigate inconvenient GCs. This is not a one-of-a-kind problem; it is well studied and the path is well trodden. Azul is a convenient solution that wouldn't require a whole lot of case study, but it's certainly not the only one. And this reads as if perfectly deterministic latency were the only factor that mattered (in which case, why not use C?).
What it sounds like instead is that very minimal benchmarking or science was performed ahead of time, and then a lot of justification was written afterward. I know that's a reach as well, but this is a well-trodden path, so the answer "Pony is the best possible solution for the interests of our business" just seems like a very strange conclusion.
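To make the "plenty of concurrent collectors" point concrete, here's a minimal sketch assuming a reasonably recent HotSpot JVM; the class name and flag values are just illustrative. Launching with flags like -XX:+UseG1GC -XX:MaxGCPauseMillis=10 selects a mostly-concurrent, pause-target-driven collector rather than a purely stop-the-world one (Azul's C4, and later ZGC/Shenandoah, go further still), and the program below reports which collectors are actually active:

    // Minimal sketch (assumes a HotSpot JVM): report which garbage collectors
    // are active in the running VM. Launch with e.g.
    //   java -XX:+UseG1GC -XX:MaxGCPauseMillis=10 GcReport
    // to pick a mostly-concurrent, pause-target-driven collector.
    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;

    public class GcReport {
        public static void main(String[] args) {
            for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                System.out.printf("%s: %d collections, %d ms total%n",
                        gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
            }
        }
    }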
I see the argument for, and the utility of, using the right tool for the job, even if it's a fairly new language, more than most people do. But I also have a higher tolerance for risk than most people (and have experienced the downside costs of those choices; they are very real).
Choosing newer languages gets maligned far more often than shoehorning the wrong old languages onto problems does.
Besides, someone's got to take risks on (potentially) better technology; as long as they know the risks and fully considered them going in, then by all means. Plus, the longer we continue to use C for all systems development, the longer we'll have preventable security issues.
I've read that blog post and some of your other posts on HN. I get why the JVM, C/C++, and Go were not fits. However, I have not seen a lucid explanation of why you didn't go with Erlang or Elixir.
We looked at Erlang. Several of us are friends with folks who worked at Basho on Riak and we talked with them about our performance goals. They were very skeptical that we could meet them using Erlang. Based on that, we moved on from Erlang.
It's worth posting the long version. I've been following the various Pony blog posts with interest (I'm both a language geek and a distributed systems geek), but I always come away with the notion: "Huh, kinda cool, but why didn't they just use Erlang? It'd be a great fit for this."
So either:
a) Erlang is not a good fit, and I'm wrong. Then I'd really like to know why I'm wrong!
b) Your friends at Basho led you astray. Would also be interesting to know what happened in this case!
Either way, without knowing more details, the short version you just posted is inconsistent with the claim that you guys did serious research into existing language ecosystems before going your own way.
Erlang, while having many virtues, is simply slow.
Once, I reimplemented in Elixir a toy data science tool I had previously built in node. Idiomatic node, idiomatic Elixir, both written for readability. The Elixir was approximately 100 times slower than the node version.
Now Erlang often feels fast, because of the architectures it allows, but when you get down to shuffling bytes around or doing low level math it is currently slow, slow, slow.
Given Wallaroo's speed goals, I would have been really surprised had they used Erlang.
I mean really though you shouldn't be using BEAM languages for scientific and computational tasks. For starters, there isn't a native array type (everything is lists). That's fine, because lists can give you flexibility while preserving immutability guarantees when passing across functions.
If you're doing an n-body simulation, then this benchmark is a good reference for deciding whether or not to use Erlang/Elixir. But if you're building a server that mostly parses JSON inputs over HTTP and spits out more JSON over HTTP, and needs to handle thousands or millions of parallel connections without hiccuping, is an n-body simulation the right benchmark to use as your reference?
It's well known that Erlang is not suited for data science and number crunching. I'm curious why you even bothered; nearly every introductory guide I've read makes this clear.
So that's not a good example of Erlang/Elixir's performance, which is hardly known to be 'slow'. The language and its process/actor model are far faster than many other languages, particularly in the web space.
The authors also mentioned a heavy dependency on the actor model as a performance-optimization strategy and as the optimal code structure, which is why it is likely worth fully exploring for the OP's problem.
I too would love to hear a long-form answer to this. Erlang seems like a good fit for this problem, besides maybe packaging up the client in an easily usable fashion.
"this problem" is very broad and there are aspects of it that Erlang is indeed a very good fit for. There are aspects where it is less so.
I think this is a rather in-depth conversation where HN comments aren't the most productive mechanism. If either or both of you are interested in chatting more on this, my email is sean@wallaroolabs.com. Drop me an email and we can arrange a time to chat.
Have you ever made a decision to not use Pony (or whatever the new sexy tech)?
I've read a few blog posts now and the end result always seems to be Pony.
You even talk up how great the C and Java client libraries are. Well they can't be as great as you say or you would have used them.
The C library seems to perform better, be more featureful, and be better tested. So once again, why? It certainly can't be because your library is going to top the C library in any way. The article even seems to imply you'd be happy being at parity with the C lib.
In regards to the C and Scala/Java client libraries. They are great for what they do and how they do it. However, that doesn't mean they're ideal in every scenario. For example, the Scala/Java client is the most feature rich client and is actively developed in sync with the Kafka brokers. This, however, doesn't make it suitable for embedding in other languages. As a result, the C client was created by the community and is now officially supported by Confluent. That doesn't in any way take away from the quality of the Scala/Java client though.
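To make that concrete, here's a minimal sketch of what consuming with the official Java client (org.apache.kafka:kafka-clients, assuming a recent version) looks like; the broker address, group id, and topic name are placeholders, not anything we actually use. The point is simply that the client is built around JVM types and JVM threading, which is what makes it so feature-rich on the JVM and awkward to embed anywhere else:

    // Minimal consumer sketch using the official Java client. The broker
    // address, group id, and topic name below are illustrative placeholders.
    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class ConsumeSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("group.id", "example-group");
            props.put("key.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");
            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("example-topic"));
                // poll() performs the network I/O on the calling thread.
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> r : records) {
                    System.out.printf("offset=%d key=%s value=%s%n", r.offset(), r.key(), r.value());
                }
            }
        }
    }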
Also, while the C client is more featureful and better tested, there is still the concern regarding the thread pools internal to Pony and librdkafka. We've seen first hand how CPU cache invalidation can impact performance so we are very aware of the potential negatives if the Pony and librdkafka threads ever end up fighting with each other over the same CPU resources and would prefer to avoid that.
Yes, Pony Kafka is currently slower than the C client. But it is also almost completely untuned as of right now. We expect there is a lot of low hanging fruit on that front that will give us significant gains. Yes, we mention in the blog post that we would be happy with being at parity with the C client, but our goal has always been to eventually exceed it, both in terms of performance and features.
I'm coming at this as somebody who often has to rewrite a lot of library code because of performance issues and poor decisions from library writers, often regarding things like garbage collection and hidden resources (like thread pools or an event loop) that cannot be hooked into. I see it all the time. I can no longer count the number of times I've had to rewrite parts of the JDK or networking libraries because of these issues.
Now, this is what I'm hearing from what you are saying:
> 1- We can't use a JVM implementation because we aren't using a JVM language.
Makes sense.
> 2- The C library is okay, but hides its thread pool with no way to access it.
Ugh. Hate that. It's like the people writing these libraries have never had to use them in a real project. The sign of a mediocre library.
Pony's actor model might have to rewrite almost any library used by it when concurrency is involved.
But I think you answered your own question in the title:
> Why we wrote our Kafka Client in Pony
1- Because the C library is mediocre and hides its threads from users making it not very useful for high-performance applications.
2- Because the rest of the system is in Pony. Really, you could write it in C/C++ or even Rust as long as you wrote it in a way that played well with Pony's concurrency model, but why bother with that extra effort, especially if you believe - as you seem to - that Pony's concurrency story is superior.
Once we made the decision to use Pony for Wallaroo, that has driven a lot of our other choices. The Java and C client libraries are excellent. We had architectural concerns about how the thread pool in the clients would interact with our scheduler threads.
We get a large performance improvement from having a single scheduler thread for each CPU. Adding another thread pool that competes for CPU usage would be problematic.
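A rough sketch of the concern, in Java rather than Pony for familiarity (the class and the workload numbers are made up): when a library brings its own internal pool alongside a one-thread-per-core scheduler, the OS ends up with more runnable threads than cores, so it time-slices between them and no scheduler thread gets to keep a core and its caches to itself.

    // Hypothetical illustration: two independent fixed-size pools competing for
    // the same cores. The "scheduler" pool mimics one scheduler thread per CPU;
    // the "library" pool mimics a client library's internal threads.
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    public class PoolContention {
        static long spin(long n) { long x = 0; for (long i = 0; i < n; i++) x += i; return x; }

        public static void main(String[] args) throws InterruptedException {
            int cores = Runtime.getRuntime().availableProcessors();
            ExecutorService scheduler = Executors.newFixedThreadPool(cores);
            ExecutorService library = Executors.newFixedThreadPool(cores);
            long start = System.nanoTime();
            for (int i = 0; i < cores * 4; i++) {
                // With both pools busy there are twice as many runnable threads
                // as cores, so the OS context-switches between them instead of
                // letting each scheduler thread own a core.
                scheduler.submit(() -> spin(200_000_000L));
                library.submit(() -> spin(200_000_000L));
            }
            scheduler.shutdown();
            library.shutdown();
            scheduler.awaitTermination(10, TimeUnit.MINUTES);
            library.awaitTermination(10, TimeUnit.MINUTES);
            System.out.printf("finished in %d ms on %d cores%n",
                    (System.nanoTime() - start) / 1_000_000, cores);
        }
    }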
Our client is for those high-performance use cases: if we can get parity or close to parity with the C client, then we should end up with much better overall performance because we avoid those architectural concerns.
That said, we plan on providing a way for folks who are less concerned with performance to use the C client library.
In the end, it was less about "use Pony" and more about "do this in a way that matches with Wallaroo's architecture".
Sure, but why should it be accomplished without Pony? Languages are optimized for use-cases. This means that some languages are good and some are worse at handling particular use-cases. If Pony is the best choice for their use-case, why would you not choose it? Taking all the risks of a new tech into account, of course.
If no one takes the plunge, how do languages and technologies ever get proven? A startup without bureaucracy, institutional legacy, and technical debt seems like a good place to do it.
I don't know about "hipster" unless you're using it as a synecdoche for "trend following."
There is definitely a predilection in certain parts of the coder community to prefer newness and difference over tried and true. There isn't anything wrong with that necessarily: it's part of how progress is made. However I think it's often taken to extremes in the coder community.
Also, picking the right tool for the job can be a real advantage, which makes it easier to deliver features.
Getting so close to the C implementation (in terms of speed) with Pony is actually insane if you look at the number of guarantees Pony gives you. Next time you dereference a NULL pointer, please remember that it's impossible in Pony. Oh, and next time you spend a week debugging some hairy locking issue, consider that the issue wouldn't happen in Pony at all. EDIT3: removed EDIT1 from here.
Currently, Pony is in direct competition with Go (but with a different concurrency model) and Erlang/Elixir (but natively compiled). People and companies frequently choose Go or Erlang, so I don't really understand why they shouldn't choose Pony if their use-case fits.
EDIT2: And here I am getting downvoted... I wonder, is anything I wrote not true?
>so I don't really understand why they shouldn't choose Pony if their use-case fits.
The difference between Pony and Go or Erlang (or even Elixir) is that the Pony team is still making breaking changes to the language. That means your dev team may need to spend time updating code because of breaking changes in the language. Also, the ecosystem isn't there the way it is for Go or Erlang.
Yeah, but both Go and Elixir (Erlang less so - commercial and internal PLs work a bit differently) were in the same situation at some point: very small ecosystem, small community, lots of changes to the language. Adopting a language at this stage of evolution has a set of very well-known risks, but it has to be done by someone for the language to ever reach maturity. Trying to use it seriously is one of the best ways to contribute to the language.
In any case, if you are aware of the risks and plan to mitigate them - by, for example, employing people capable of debugging and fixing the language's implementation - you're left with some risk and a lot of advantage (if you're lucky and your domain is indeed the one your language is best suited for). It's a gamble, of course, but then nearly every decision (other than buying IBM) is one.
Those are excellent points that we considered when we went with Pony. So far, we feel it has worked out well. A large number of those breaking changes have originated with us at Wallaroo Labs so they've been pretty easy for us to stay on top of.
Actor style concurrency exists for many other language platforms. See Akka for JVM/Scala or Seastar for C++.
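For example, a minimal classic-API Akka actor in Java might look like the sketch below (the actor, message, and system name are made up for illustration); it's just to show that mature actor-model options already exist on the JVM:

    // Minimal Akka (classic API) actor sketch; requires the akka-actor dependency.
    import akka.actor.AbstractActor;
    import akka.actor.ActorRef;
    import akka.actor.ActorSystem;
    import akka.actor.Props;

    public class AkkaSketch {
        // A trivial actor that prints any String message it receives.
        public static class Echo extends AbstractActor {
            @Override
            public Receive createReceive() {
                return receiveBuilder()
                        .match(String.class, msg -> System.out.println("got: " + msg))
                        .build();
            }
        }

        public static void main(String[] args) {
            ActorSystem system = ActorSystem.create("demo");
            ActorRef echo = system.actorOf(Props.create(Echo.class), "echo");
            echo.tell("hello from the JVM actor world", ActorRef.noSender());
            system.terminate();
        }
    }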
I'm somewhat skeptical of GC pauses being a problem in anything that's not actual hard real time, like avionics or manufacturing equipment or similar. What difference will a few hundred milliseconds of pause even make in a distributed async data pipeline? And that's on the higher end of pauses these days.
Sure, I understand pauses happen and there is a performance degradation, I'm asking whether it really matters in a processing framework that isn't controlling medical equipment or airline hydraulics. Is something going to break if there's a small pause? Especially in return for the productivity and safety of using managed runtimes?
> cherry-picked negative aspects of using other languages and irrelevant details
I don't understand. Why are they irrelevant? They're basically Pony's reason for existing... Cheap, efficient, and safe concurrency, coupled with a very high level of type safety, is the main selling point of Pony. It is much better on these counts than many other languages. Maybe I shouldn't have mentioned Python and Ruby; I'll edit the post.
> It's the aggressive tone
I see. Compared with the charming politeness of the OP comment, I must have sounded really rude. I apologize.
I don't know if 5-10% difference in write speed and 75% in read speed is "so close to C implementation." These are both operations that will be happening thousands/millions/+ times per day, so it feels incorrect to say they're close.
These read/write differences will compound to make the rest of the data processing pipeline slower.
> Yes, Pony Kafka is currently slower than the C client. But it is also almost completely untuned as of right now. We expect there is a lot of low hanging fruit on that front that will give us significant gains.
>There is also the secondary concern regarding the thread pools internal to Pony and librdkafka. We've seen first hand how CPU cache invalidation can impact performance so we are very aware of the potential negatives if the Pony and librdkafka threads ever end up fighting with each other over the same CPU resources.
For the first, proof-of-concept, no-optimizations-applied version of a low-level library written from scratch in a high-level language to be anywhere near a production-ready, presumably optimized C implementation is actually very impressive. I'd expect the new implementation to be an order of magnitude (or more) slower than the C one initially, and to only get better with many rounds of optimization over the following months.
So wait, engineering has no value, progress should only come from research departments at big corporations, we should never engage in anything outside of product and ad work, and anyone who applies research is stealing time from their employers.
Maybe, just maybe, we're starting to demand more of career software engineers. Maybe, just maybe, it's time for folks to stop grousing about how value-driven they are and realize that bad engineering is more expensive than good engineering over even medium term timescales.
To me, Pony looks like a mix between Kotlin/Ceylon and Rust, and at the same time it also happens to pick up all the low-hanging fruit of programming language features that most "modern" languages haven't even attempted to pick up yet.
Rust has some flaws. Kotlin has some flaws. Pony appears to have even fewer flaws than either.
Of course I have merely read the tutorial. In practice it could be worse than either.
I often describe Pony and Rust as fellow travelers. They're following roughly the same path, but they have different constraints, and so they've made different choices that are suitable for what they need.
Yep, we don't want to talk about this...and maybe it's unfair this particular example is getting focused on...but yes, it does have many of the hallmarks of the software industry's dirty little secret: we like shiny new stuff, and we've gotten really good at convincing ourselves and others that novelty is not holding an undue influence in our decisions.
I think it starts to get interesting when you substitute "program" for "programming": because it's more fun to write a new program than deliver features to users with an old program. Now fun looks more suspicious.
You know HN is written in Arc, right? I'll just jump to the end: they did it for the features, and having fun just happens.