I apologize for the somewhat off-topic nature of this comment, but it really irritates me when I see the website for a product with competition, go to their sitemap, and see this blatant SEO gaming. For example, I went to their FAQ (https://www.dragonflydb.io/faq) expecting answers about their database; instead I get a bunch of copycat answers to weird questions, intended to drive traffic to their site when people ask questions about other databases. I see many application sites doing this nowadays, and it makes me not want to use them. I imagine with GPT this is very easy to generate now.
> As an AI language model, I am unable to make subjective judgments on whether Memcached is "good" or not. However, I can provide information about the features, benefits, and limitations of Memcached.
Because we focus on building the most awesome datastore, and you should decide whether you prefer having your store designed by good marketers or good engineers.
"Willing to take shortcuts of questionable ethics" is not a sign of a good engineer either. EDIT: ah, and now I see you also defending your shitty benchmarks downthread. Yeah, no, that's not how you gain trust.
Is that a valid reason to assess a tech product? Does it matter how good or bad they are at content and SEO?
From this bad episode alone, how can we judge if they are bad at keeping the data safe/secure?
I am trying to remember when I last checked Redis' or Clickhouse's homepage or other pages on their site. I only care about their documentation pages and if they deliver what they are promising.
All, my name is Oded Poncz, and I am the CEO of DragonflyDB. Our goal with the FAQ was to aggregate valuable information for our community. We will take all the valuable feedback from this thread and improve. Thank you all for the feedback.
Strange, your link to the FAQ gives me a 404, but existing links to specific FAQ sections, posted in the comments below, still work. On the website I can only find this: https://www.dragonflydb.io/docs/about/faq. Maybe they took down the original FAQ?
1. We removed the FAQ page
2. You are right, I need to shut the fuck up and let the self-righteous HN crowd with torches do what they do best – find a weakness and push it until they get bored and switch to beating another builder. You asked me about my perspectives and priorities? These are my priorities: https://github.com/dragonflydb/dragonfly/graphs/contributors
3. The first thing I did on this thread was apologize for having a FAQ with such content.
The best thing about showing anything to others is seeing things about it that you didn't see before. That can be flaws or a hidden gem, but either way you know more about what you have than you did before. You might feel that the focus on the FAQ page was irrelevant and irritating but it gave you feedback on something that at least a few people found relevant about your marketing. Even if you feel like it was just "find a weakness, and push it" I still hope you understand that you got something out of it and hopefully feel OK with that.
Spam in the context of SEO clearly means something more similar to https://en.wikipedia.org/wiki/Spamdexing or unhelpful/dishonest/unrelated content that is just there for ranking.
Are you disputing that the FAQ page tries to rank your site higher by having autogenerated answers to questions your site is not authoritative on? Are any of the questions on the FAQ actually written or meant to be read by a human?
We all know that the definition has expanded in recent years to include low-effort or bot-generated content being flogged on social media (eg: one of Reddit's report options is "spam") and in search results.
Either way, if you want to keep a positive image going, don't take internet comments personally. Just incorporate the feedback and make changes if you feel it's warranted.
I don't know how well the results hold up after a year, but according to Redis the Dragonfly performance test is biased, and with Redis configured properly it reached higher throughput than Dragonfly. YMMV, but just putting this up here. Personally I have never used Dragonfly, so I wonder if the "marketing metrics" actually hold up in production.
Redis Cluster has a lot of limitations, though. It's unusable for multi-key operations, has no scan and no transactions, supports a single database only, the client has to support it, and the way it works makes it unusable when connecting from outside the network (see https://redis.io/docs/management/scaling/), etc. At that point Redis Cluster is not Redis anymore, and it's disingenuous to call it that. I would rather have slightly lower performance and not have to deal with those limitations AND not have to deal with orchestration.
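For context on the multi-key limitation: Redis Cluster maps every key to one of 16384 hash slots, and a multi-key command only works if all keys land in the same slot (otherwise the server returns a CROSSSLOT error). The slot function per the Redis Cluster specification is CRC16 (XMODEM variant) mod 16384, with an optional `{hash tag}` that forces related keys into one slot. A minimal sketch:

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16/XMODEM (poly 0x1021, init 0), the variant Redis Cluster uses."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

def key_slot(key: str) -> int:
    """Map a key to one of Redis Cluster's 16384 hash slots.

    If the key contains a non-empty {hash tag}, only the tag is hashed,
    so keys sharing a tag always land in the same slot (and node).
    """
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end > start + 1:  # tag must be non-empty
            key = key[start + 1:end]
    return crc16_xmodem(key.encode()) % 16384

# MGET user:1 user:2 can fail with CROSSSLOT on a cluster, but keys
# sharing a hash tag are always co-located:
assert key_slot("{u42}:followers") == key_slot("{u42}:following")
```

This is why workarounds like hash tags exist but also why they only help when you can group keys up front; arbitrary multi-key operations remain off the table.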
The Dragonfly benchmark runs one Redis instance on a 64-CPU machine and compares it with one Dragonfly instance on the same machine.
But there is nothing stopping you from running 64 Redis instances on one machine if it has 64 cores, which is what Redis did (actually, they ran just 40). That actually seems like a nicer design overall, as it scales "naturally" to multiple machines without any extra effort/code, it keeps the code simpler, you can also have one of these Redis instances segfault without bringing your entire cache down.
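The multi-instance approach sketched above usually relies on the client (or a proxy) sharding keys across the instances. A minimal, hypothetical client-side sharding sketch (the instance count and port layout here are assumptions, not what Redis used in their benchmark):

```python
import hashlib

# Hypothetical setup: 64 independent redis-server processes on one box,
# listening on consecutive ports (the port numbering is an assumption).
PORTS = [6379 + i for i in range(64)]

def port_for(key: str) -> int:
    """Pick an instance by a stable hash of the key.

    md5 is used only for determinism across runs and processes
    (Python's built-in hash() is salted per process).
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return PORTS[int.from_bytes(digest[:8], "big") % len(PORTS)]

# Every client using the same function agrees on key placement:
port = port_for("user:1234:session")
assert port == port_for("user:1234:session")
```

The trade-off mirrors the thread: this keeps each server single-threaded and simple, but the sharding logic (and things like resharding when the instance count changes) moves into the client or an orchestration layer.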
Other than that, they seem to have run the same benchmark. YMMV for other types of workloads of course, and perhaps Dragonfly could be configured better in some way.
Either way: it seems the Dragonfly benchmark is not just biased, but highly misleading. And while the Redis benchmark may be biased, it certainly doesn't seem highly misleading.
To me, spinning up multiple copies of the database is cheating. You're comparing a box of Apples to a single apple.
Yes, using a Redis cluster is the only way to get Redis to actually use system resources effectively, but it's a relatively complex thing to create and manage compared to just running 1 server.
I don't think it's cheating at all; it's how it's designed to work.
If you want to say "but this is more difficult": okay, fair enough (although in my experience it's not difficult at all), but then say that instead of posting a misleading benchmark which runs Redis in a way it's not supposed to run. You can place all sorts of artificial "yeah but I don't want to do it like this" constraints on all sort of things.
Hmm, and ships are designed to sail, yet you use planes to cross the Atlantic.
Nokia was designed as the strongest and most affordable phone, yet you use an iPhone that costs $1000. It's not about how it was designed but whether it addresses your current needs. Developers do not want to manage a cluster of single-CPU processes – not on their laptops and not in production. And it's not just about management complexity. See https://github.com/dragonflydb/dragonfly/issues/1229, and that's just one example. A single CPU is just not enough for today's use cases.
That may all very well be the case – let us assume it is for the sake of the argument, although I have some comments about that as well – but that still means the argument is "Redis is too complex to run on multiple CPUs" and/or "Redis is poor for these workloads" (I didn't investigate that issue in depth), and not "Redis is unable to do much work with this very powerful AWS instance". These two are very different things. There is no nuance anywhere in the benchmark. A reader might very well believe that this is all the performance they're going to get out of Redis on that machine, which is clearly not the case.
> Nokia was designed as the strongest and most affordable phone, yet you use an iPhone that costs $1000
You are an exception, then :)
But I still stand by the claim that fragmenting your stateful workload (i.e. Redis) into a bunch of processes instead of having a single endpoint per instance is an acceptable approach in 2023. When your processes are excessively tiny, their load variability overshadows their average load. This imbalance results in unpredictable pauses, latencies, and out-of-memory (OOM) issues. This primarily occurs due to the absence of resource pooling under a single process. While it's challenging to exhibit this issue via synthetic benchmarks, it's certainly present.
I think you forgot "not" there before "an acceptable approach in 2023".
These are all fair and reasonable opinions to have, and to some degree I even agree with them, but none of that is captured in the rather simplistic benchmark. Everyone understands that even with the best of efforts it's hard to capture everything in a benchmark, but in this case it's just missing a very obvious way to run Redis.
It's like benchmarking PostgreSQL connections and coming to the conclusion there is no way PostgreSQL can handle more than n connections and that OtherSQL is much better. Is this true? Yes. But it's also true that half the world is running pg_bouncer and that this is widely seen as the way to run PostgreSQL if you need loads of connections. Is it a pain you need to run this and something that should be addressed in PostgreSQL? Absolutely. Such a benchmark would be correct in a strict narrow technical sense, but at the same time also misrepresentative of the real-world situation.
I understand what you are saying. How would you suggest presenting it, then?
Dragonfly is not faster than Redis when running on a single CPU. It cannot be, simply because it has the overhead of the internal virtualization layer that composes all the operations over multiple shards (in the general case). But Dragonfly can scale vertically with low latency and high throughput, unlike other attempts at making a multi-threaded Redis that used spinlocks or mutexes. So how do we demonstrate the added value?
> But Dragonfly can scale vertically with low latency and high throughput, unlike other attempts at making a multi-threaded Redis that used spinlocks or mutexes. So how do we demonstrate the added value?
Provide more advanced benchmarks which demonstrate those types of differences better.
The situation is that the differences are complex, both in terms of performance and operationally (e.g. running multiple instances is not a huge obstacle, but it is harder). That's always going to be hard to capture in a single graph or a single tagline; I appreciate this isn't easy.
It's your website; you can do what you want with it. And maybe I'm just a grumpy old curmudgeon who has seen too many hype cycles, but to me it just comes off as "too good to be true" – which it kind of is – and leaves a more negative than positive impression. The same applies to the "The most performant in-memory data store on Earth" tagline, which seems a bit hyperbolic (what is "fastest" depends on context; as you mentioned, Redis will always be faster on a single core – and some people only need a single core!).
I have the business acumen of a goat, so what do I know? But it seems to me that a lot of people appreciate when products are straightforward about their weaker points as well, and even straight-up say they're not the best fit for all scenarios, and that in the long run this is more beneficial.
> To me, spinning up multiple copies of the database is cheating.
What if the database was designed to be run that way?
> You're comparing a box of Apples to a single apple.
Precisely. Dragonfly is a box of apples. Redis is a single apple that can be put in a box with other apples. If you run a "benchmark" comparing your box of apples against a sole apple, you're being either stupid or dishonest.
At least on AWS it is kind of hard to get 40 tiny VMs with sufficient speed on the infra side. Given that laptops get 40+ vCores these days, I think a single instance of anything should have some multi-threading.
Licensed Work: Dragonfly including the software components, or any portion of them, and any modification.
Change Date: March 15, 2028
Change License: Apache License, Version 2.0, as published by the Apache Foundation.
Additional Use Grant: You may make use of the Licensed Work (i) only as part of your own product or service, provided it is not an in-memory data store product or service; and (ii) provided that you do not use, provide, distribute, or make available the Licensed Work as a Service. A “Service” is a commercial offering, product, hosted, or managed service, that allows third parties (other than your own employees and contractors acting on your behalf) to access and/or use the Licensed Work or a substantial set of the features or functionality of the Licensed Work to third parties as a software-as-a-service, platform-as-a-service, infrastructure-as-a-service or other similar services that compete with Licensor products or services.
It stops companies from taking it for free. Sure, you could find senior developers experienced in building performant and reliable databases to build one for you in many months. But you need to commit the resources to that and you'll be playing catch-up to what dragonflydb achieved in the meantime.
It's not open at all. I can't offer this as a service with my non profit org, because the non profit still needs a 10€ annual fee to exist. I can't make improvements that truly belong to the community. This is an attempt to fight off capitalism with private property, which means it only tries to fight off capitalism that doesn't benefit them. If you're going that way, might as well close source it.
AGPL allows all of that: it allows you to run it as a service, allows you to earn money, and if Amazon ever reuses it you have access to their differentiating sauce, so the playing field is leveled, benefiting everyone, including you. Most likely, though, Amazon won't use it, which is also the original goal.
> I can't offer this as a service with my non profit org
You mean make a hosted dragonfly a service you provide? (I'm not sure why the €10 matters - maybe I'm missing something) Yeah, that's exactly the point of that licence. You can do it in 5 years if you want.
> You mean make a hosted dragonfly a service you provide?
Yes, why not ?
> (I'm not sure why the €10 matters - maybe I'm missing something)
That's a standard fee members pay to cover expenses the org may have
> Yeah, that's exactly the point of that licence. You can do it in 5 years if you want.
Good! At least there's a recognition of not being of public service and not doing this for everyone but only for themselves. I wish they chose another license, but it's a start
That's one specific case out of all possible uses that's not allowed with this licence. It honestly doesn't sound like much of a limitation. It would also be an extremely weird service to run as a non-profit.
> That's a standard fee members pay to cover expenses the org may have
Yes, but why did you mention it? It seems irrelevant to the question of what service you can build.
"I can't use it for one particular use case that >99.999...% of users never want to run in the first place, therefore it's not open at all and might as well be closed source" is quite the take.
Why bother then? Using this as a component makes you enter the "a substantial set of the features or functionality of the Licensed Work to third parties as a software-as-a-service, platform-as-a-service, infrastructure-as-a-service or other similar services that compete with Licensor products or services" zone, and it can be interpreted in any way. How do you make sure your product isn't a derivative and is different enough for lawyers?
This only really applies if you're offering direct access to the Dragonfly wire protocol or something along those lines, not if you're merely using Dragonfly as part of your product. Building, say, a Twitter clone on Dragonfly is perfectly fine. You're interpreting this exceedingly broadly.
We do not fight off capitalism in any way. Rather, I view capitalism as a powerful engine driving innovation. My experience as an engineer at both Google, a massively profitable entity that contributes significantly to the open source community, and AWS, which effectively utilizes open source software to bolster its own success, has shaped this perspective. I harbor no resentment towards AWS; instead, I see both approaches as legitimate strategies within the scope of a capitalist system. I firmly believe that companies in the open source space need to rise to the challenge of competition and devise effective strategies to thrive in the marketplace.
Weirdly, in-memory data storage is barely the bottleneck most of the time.
Why do we need faster storage? We need better models and domains that improve our developer experience, not something faster. We don't need cars that go 650 km/h; we need better roads and superior security controls.
Blazingly fast wasn't enough; now we add "on Earth" to be impactful enough to capture our attention
I use Redis with rspamd and have systems that scan millions of mails per day, and Redis is a bottleneck for some of the modules. I've been impressed with Dragonfly for the reputation set, which has stopped the timeouts I was seeing, and I was able to replace a multi-Redis setup using rspamd's internal hash/sharding with a single DragonflyDB. I plan to move more over but need stream support, which is coming. The dev is also responsive on their Discord channel.
Presume an upper bound of single-digit millions (9 million) of mails per day, which works out to just over 100 emails/second. Also presume a rather pedestrian single-instance (single-core) Redis limit of 100k ops/sec.
Are people really running software that is making 1k Redis calls/message?
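The arithmetic behind those numbers, assuming mail is spread evenly across the day:

```python
# Back-of-the-envelope: can a single Redis core keep up with this volume?
mails_per_day = 9_000_000          # upper bound: "single digit millions"
seconds_per_day = 24 * 60 * 60     # 86,400

mails_per_sec = mails_per_day / seconds_per_day
print(f"{mails_per_sec:.0f} mails/sec")                  # ~104

redis_ops_per_sec = 100_000        # pedestrian single-instance estimate
ops_budget_per_mail = redis_ops_per_sec / mails_per_sec
print(f"{ops_budget_per_mail:.0f} Redis ops per mail")   # 960
```

Under these assumptions, a single Redis core only saturates if each message triggers on the order of a thousand Redis calls – hence the skepticism.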
First, let me say Redis has served me well for years. Before rspamd I used SpamAssassin – another perfectly fine piece of software. Having more options is a good thing. For Redis I started with single instances and moved to master (for writes) and slaves (for reads), or sharding/hashing depending on what rspamd uses. I moved one section so far to DragonflyDB, the reputation module. This is performing better for my usage.
As for email: if email were broken down and nicely distributed over the same period, I would likely have no issues. However, there are times when there is much more scanning going on, and a good 4 hours of peak time. This is where issues can crop up. If an rspamd scan can't get a return fast enough, that test will be skipped. It still works, but I'd rather have all tests go through. So let's take a single email and look at where Redis is used:
* Ratelimit checking
* Replies plugin
* Some multi maps stored in Redis
* MX check module
* Neural module (a bit like Bayes but automatic and short-term)
* Fuzzy module used from spam traps
* History module – for Redis admin, stats
So a single email triggers more than a single Redis query. The mail server (for SMTP), which is not rspamd, is also using Redis on its own.
Now imagine a spike of emails, perhaps a large spamming activity that is being stopped. Needing to scan 100 emails in a second isn't impossible at peak. The scanning is also for incoming and outgoing email.
Their blog post goes into detail on how they benchmark, and they use the benchmarking tool developed by Redis [1]:
> We ran our tests on AWS Graviton2 EC2 instances, which are network-optimized and provide the best performance for web applications. We used memtier_benchmark — a widely used benchmarking tool developed by Redis Ltd. — to test throughput and latency, and Prometheus to monitor and visualize memory usage.
It is easy to reproduce and check the claims. Redis also countered with a blog post [2]
Once one moves to Dragonfly Cloud, I guess the "performant" part of the value prop falls by the wayside. I.e., you need your in-memory data store on the same machine (or the same rack); otherwise really un-performant solutions can out-perform Dragonfly Cloud.