Have you ever decided not to use Pony (or whatever the new sexy tech is)?
I've read a few blog posts now and the end result always seems to be Pony.
You even talk up how great the C and Java client libraries are. Well, they can't be as great as you say, or you would have used them.
The C library seems to perform better, be more featureful, and be better tested. So once again, why? It certainly can't be because your library is going to top the C library in any way. The article even seems to imply you'd be happy being at parity with the C lib.
Regarding the C and Scala/Java client libraries: they are great for what they do and how they do it. However, that doesn't mean they're ideal in every scenario. For example, the Scala/Java client is the most feature-rich client and is actively developed in sync with the Kafka brokers. This, however, doesn't make it suitable for embedding in other languages. As a result, the C client was created by the community and is now officially supported by Confluent. That doesn't in any way take away from the quality of the Scala/Java client, though.
Also, while the C client is more featureful and better tested, there is still the concern regarding the thread pools internal to Pony and librdkafka. We've seen firsthand how CPU cache invalidation can impact performance, so we are very aware of the potential negatives if the Pony and librdkafka threads ever end up fighting with each other over the same CPU resources, and we would prefer to avoid that.
Yes, Pony Kafka is currently slower than the C client. But it is also almost completely untuned as of right now. We expect there is a lot of low-hanging fruit on that front that will give us significant gains. Yes, we mention in the blog post that we would be happy being at parity with the C client, but our goal has always been to exceed it eventually, both in terms of performance and features.
I'm coming at this as somebody who often has to rewrite a lot of library code because of performance issues and poor decisions by library writers, often regarding things like garbage collection and hidden resources, such as thread pools or an event loop, that cannot be hooked into. I see it all the time. I can no longer count the number of times I've had to rewrite parts of the JDK or networking libraries because of these issues.
Now, this is what I'm hearing from what you are saying:
> 1- We can't use a JVM implementation because we aren't using a JVM language.
Makes sense.
> 2- The C library is okay, but hides its thread pool with no way to access it.
Ugh. Hate that. It's like the people writing these libraries have never had to use them in a real project. The sign of a mediocre library.
With Pony's actor model, you might have to rewrite almost any library you use when concurrency is involved.
But I think you answered your own question in the title:
> Why we wrote our Kafka Client in Pony
1- Because the C library is mediocre and hides its threads from users, making it not very useful for high-performance applications.
2- Because the rest of the system is in Pony. Really, you could write it in C/C++ or even Rust as long as you wrote it in a way that played well with Pony's concurrency model, but why bother with that extra effort, especially if you believe - as you seem to - that Pony's concurrency story is superior.
Once we decided to use Pony for Wallaroo, that decision drove a lot of our other choices. The Java and C client libraries are excellent. However, we had architectural concerns about how the thread pool in the clients would interact with our scheduler threads.
We get a very large performance improvement by having a single scheduler thread for each CPU. Adding another thread pool that competes for CPU usage would be problematic.
Our client is for those high-performance use cases where, if we can get to parity or close to parity with the C client, we should get much better performance overall because of those architectural concerns.
That said, we plan on providing a way for folks who are less concerned with performance to use the C client library.
In the end, it was less about "use Pony" and more about "do this in a way that matches with Wallaroo's architecture".
Sure, but why should it be accomplished without Pony? Languages are optimized for use-cases. This means that some languages are good and some are worse at handling particular use-cases. If Pony is the best choice for their use-case, why would you not choose it? Taking all the risks of a new tech into account, of course.