Dragonfly – Performant in-memory data store (dragonflydb.io)
77 points by jlokier on May 21, 2023 | hide | past | favorite | 80 comments


I apologize for the somewhat off-topic nature of this comment, but it really irritates me when I visit the website for a product with competition, go to their sitemap, and see this blatant SEO gaming. For example, I went to their FAQ (https://www.dragonflydb.io/faq) expecting answers about their database; instead I got a bunch of copycat answers to weird questions, intended to drive traffic to their site when people ask questions about other databases. I see many application sites doing this nowadays, and it makes me not want to use them. I imagine with GPT this is very easy to generate now.


> I imagine with GPT this is very easy to generate now

That's exactly what they are doing, but if they had any brains, they'd have a human read the answers before they publish:

https://www.dragonflydb.io/faq/is-memcached-good

> As an AI language model, I am unable to make subjective judgments on whether Memcached is "good" or not. However, I can provide information about the features, benefits, and limitations of Memcached.


This is tragically funny.

If the Dragonfly devs are reading this: stop this SEO gaming nonsense. It makes me trust your DB less and cheapens everything that you stand for.


We are reading this and we will fix it shortly.


Is your definition of fixing to remove the most obvious indicators that you autogenerated spam, or to actually remove the spam?


You removed the specific answer but did not address the underlying problem that this thread is about.


Seems to be fixed


It's... patched up to avoid embarrassment. It will be fixed when they remove the SEO spam.


Wow, this is really embarrassing. I'm a bit shocked that they thought it was OK to autogenerate a FAQ with an AI. It's dishonest and irresponsible.


Why should I trust someone with my data that can't even manage this lmao


Because we focus on building the most awesome datastore, and you should decide whether you prefer having your store designed by good marketers or good engineers.


"Willing to take shortcuts of questionable ethics" is not a sign of a good engineer either. EDIT: ah, and now I see you also defending your shitty benchmarks downthread. Yeah, no, that's not how you gain trust.


Is that a valid reason to assess a tech product? Does it matter how good or bad they are at content and SEO?

From this bad episode alone, how can we judge if they are bad at keeping the data safe/secure?

I am trying to remember when I last checked Redis' or Clickhouse's homepage or other pages on their site. I only care about their documentation pages and if they deliver what they are promising.



All, my name is Oded Poncz, I am the CEO of DragonflyDB. Our goal with the FAQ was to aggregate valuable information for our community. We will take all the valuable feedback from this thread and improve. Thank you all for the feedback.


Is there a timeline for a clustered, HA and horizontally scaled version?

The vertical scaling and performance seem extremely interesting, but for our use cases we need HA.

From a business perspective I’m interested to hear about the plans. Will there be a per core licensing fee of some sort?


HA has been supported since v1.0. Dragonfly is released under BSL 1.1. It's free to use in most cases. https://www.dragonflydb.io/docs/about/faq


It's a bit surprising: the FAQ is all about Redis and other tools, but not about Dragonfly.


Strange, your link to the FAQ gives me a 404, but existing links to specific FAQ sections, posted in the comments below, still work. On the website I can only find this: https://www.dragonflydb.io/docs/about/faq. Maybe they took down the original FAQ?


You are absolutely right, and I apologize for this experience. SEO is part of the game, but it should not be in the FAQ section.


This is the kind of thing that I hope search engines will penalize you for. You're treading a very fine line between "SEO" and outright deception.


They are not treading the line. This is outright autogenerated spam.


This is the definition of spam: https://www.merriam-webster.com/dictionary/spam To the best of my knowledge this page was not sent to anyone.


After being caught behaving badly, a debate about the proper terminology for your lapse in ethical judgment is not a winning move.

It might suggest that your perspectives and priorities are still wonky.

I would prioritize immediately and permanently removing the material and the practice, over posting questionable whataboutism arguments vs. spam.


1. We removed the FAQ page.

2. You are right, I need to shut the fuck up and let the self-righteous HN crowd with torches do what they do best: find a weakness and push it until they get bored and switch to beating another builder. You asked me about my perspectives and priorities? These are my priorities: https://github.com/dragonflydb/dragonfly/graphs/contributors

3. The first thing I did on this thread was apologize for having a FAQ with such content.


The best thing about showing anything to others is seeing things about it that you didn't see before. That can be flaws or a hidden gem, but either way you know more about what you have than you did before. You might feel that the focus on the FAQ page was irrelevant and irritating but it gave you feedback on something that at least a few people found relevant about your marketing. Even if you feel like it was just "find a weakness, and push it" I still hope you understand that you got something out of it and hopefully feel OK with that.


Absolutely. And I am sorry I reacted this way.


Spam in the context of SEO clearly means something more similar to https://en.wikipedia.org/wiki/Spamdexing or unhelpful/dishonest/unrelated content that is just there for ranking.

Are you disputing that the FAQ page tries to rank your site higher by having autogenerated answers to questions your site is not authoritative on? Are any of the questions on the FAQ actually written or meant to be read by a human?


We all know that the definition has expanded in recent years to include low-effort or bot-generated content being flogged on social media (eg: one of Reddit's report options is "spam") and in search results.

Either way, if you want to keep a positive image going, don't take internet comments personally. Just incorporate the feedback and make changes if you feel it's warranted.


That's not Google's definition of search spam, as I suspect you will soon find out.


I don't know how well the results hold up after a year, but according to Redis, the Dragonfly performance test is biased, and with Redis configured properly it reached higher throughput than Dragonfly. YMMV, but just putting this up here. Personally I've never used Dragonfly, so I wonder if the "marketing metrics" actually hold up in production.

https://redis.com/blog/redis-architecture-13-years-later/


Redis Cluster has a lot of limitations, though. It's unusable for multi-key operations, there's no scan, no transactions, a single database only, the client has to support it, and the way it works makes it unusable when connecting to it from outside the network (see https://redis.io/docs/management/scaling/), etc. At that point Redis Cluster is not Redis anymore, and it's disingenuous to call it that. I would rather have slightly lower performance and not have to deal with those limitations AND not have to deal with orchestration.
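For readers unfamiliar with the multi-key limitation mentioned above: Redis Cluster maps each key to one of 16384 hash slots via CRC16, and a multi-key command fails unless all of its keys land in the same slot; hash tags (the `{...}` part of a key) are the standard workaround. A minimal sketch of the slot computation in Python (my own implementation of the CRC-16/XMODEM routine the cluster spec describes, not code from any Redis client):

```python
def crc16(data: bytes) -> int:
    """CRC-16/XMODEM (poly 0x1021, init 0), the checksum Redis Cluster uses."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def key_slot(key: str) -> int:
    """Map a key to its cluster slot, honoring {hash tag} semantics:
    if the key contains a non-empty {...} section, only that part is hashed."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    return crc16(key.encode()) % 16384

# Different keys usually land in different slots, so MGET/transactions on
# them fail; tagging both with {user1000} forces them into the same slot.
same = key_slot("{user1000}.following") == key_slot("{user1000}.followers")
```

So a multi-key operation is only possible if you design your key names around hash tags up front, which is part of why the parent calls clustered Redis "not Redis anymore".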


Yeah, but isn’t the Redis one just as biased?

What might have been interesting would be to test on a range of cores / clusters, and consider the overhead of managing 1VM vs 64VMs etc.


The Dragonfly benchmark runs one Redis instance on a 64-CPU machine and compares it with one Dragonfly instance on the same machine.

But there is nothing stopping you from running 64 Redis instances on one machine if it has 64 cores, which is what Redis did (actually, they ran just 40). That actually seems like a nicer design overall, as it scales "naturally" to multiple machines without any extra effort/code, it keeps the code simpler, you can also have one of these Redis instances segfault without bringing your entire cache down.

Other than that, they seem to have run the same benchmark. YMMV for other types of workloads of course, and perhaps Dragonfly could be configured better in some way.

Either way: it seems the Dragonfly benchmark is not just biased, but highly misleading. And while the Redis benchmark may be biased, it certainly doesn't seem highly misleading.
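The multi-instance setup described above is not much more than a loop. A rough dry-run sketch (it prints the commands rather than running them; core count, ports, and flags are illustrative, not the exact benchmark configuration):

```shell
# One redis-server per core: compute a port per instance and print the
# launch command (drop the `echo "..."` wrapper to actually deploy).
NCORES=4   # e.g. $(nproc) on the benchmark machine
PORTS=""
for i in $(seq 0 $((NCORES - 1))); do
  PORT=$((6379 + i))
  PORTS="$PORTS $PORT"
  # taskset pins each instance to its own CPU; persistence is disabled
  # for a pure-cache workload.
  echo "taskset -c $i redis-server --port $PORT --daemonize yes --save '' --appendonly no"
done
```

Clients then shard keys across ports 6379..6379+N-1 themselves, or use a cluster-aware client, which is exactly the operational overhead the two sides of this thread disagree about.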


To me, spinning up multiple copies of the database is cheating. You're comparing a box of apples to a single apple.

Yes, using a Redis cluster is the only way to get Redis to actually use system resources effectively, but it's a relatively complex thing to create and manage compared to just running one server.


I don't think it's cheating at all; it's how it's designed to work.

If you want to say "but this is more difficult": okay, fair enough (although in my experience it's not difficult at all), but then say that instead of posting a misleading benchmark which runs Redis in a way it's not supposed to run. You can place all sorts of artificial "yeah but I don't want to do it like this" constraints on all sort of things.


Hmm, and ships are designed to sail, yet you use planes to cross the Atlantic. Nokia was designed as the strongest and most affordable phone, yet you use an iPhone that costs $1,000. It's not about how it was designed but whether it addresses your current needs. Developers do not want to manage a cluster of single-CPU processes. Not on their laptops and not in production. And it's not just about management complexity. See https://github.com/dragonflydb/dragonfly/issues/1229 and that's just one example. A single CPU is just not enough for today's use cases.


That may all very well be the case – let us assume it is for the sake of the argument, although I have some comments about that as well – but that still means the argument is "Redis is too complex to run on multiple CPUs" and/or "Redis is poor for these workloads" (I didn't investigate that issue in depth), and not "Redis is unable to do much work with this very powerful AWS instance". These two are very different things. There is no nuance anywhere in the benchmark. A reader might very well believe that this is all the performance they're going to get out of Redis on that machine, which is clearly not the case.

> Nokia was designed as strongest and most affordable phone, yet you use Iphone that costs 1000$

Actually I have a Nokia :-)


You are an exception, then :) But I still stand by the claim that fragmenting your stateful workload (i.e. Redis) into a bunch of processes instead of having a single endpoint per instance is an acceptable approach in 2023. When your processes are excessively tiny, their load variability overshadows their average load. This imbalance results in unpredictable pauses, latencies, and out-of-memory (OOM) issues. This primarily occurs due to the absence of resource pooling under a single process. While it's challenging to exhibit this issue via synthetic benchmarks, it's certainly present.


I think you forgot "not" there before "an acceptable approach in 2023".

These are all fair and reasonable opinions to have, and to some degree I even agree with it, but none of that is captured in the rather simplistic benchmark. Everyone understands that even with the best of efforts it's hard to capture everything in a benchmark, but in this case it's just missing a very obvious way to run Redis.

It's like benchmarking PostgreSQL connections and coming to the conclusion there is no way PostgreSQL can handle more than n connections and that OtherSQL is much better. Is this true? Yes. But it's also true that half the world is running pg_bouncer and that this is widely seen as the way to run PostgreSQL if you need loads of connections. Is it a pain you need to run this and something that should be addressed in PostgreSQL? Absolutely. Such a benchmark would be correct in a strict narrow technical sense, but at the same time also misrepresentative of the real-world situation.


I understand what you are saying. How would you suggest we present it, then? Dragonfly is not faster than Redis when running on a single CPU. It cannot be, just because it has the overhead of the internal virtualization layer that composes all the operations over multiple shards (in the general case). But Dragonfly can scale vertically with low latency and high throughput, unlike other attempts at making a multi-threaded Redis that used spinlocks or mutexes. So how do we demonstrate the added value?


> But Dragonfly can scale vertically with low latency and high throughput unlike other attempts of making multi-threaded Redis that used spinlocks or mutexes. So how do we demonstrate the added value?

Provide more advanced benchmarks that demonstrate those types of differences better.

The situation is that the differences are complex, both in terms of performance and operationally (e.g. running multiple instances is not a huge obstacle, but it is harder). That's always going to be hard to capture in a single graph or a single tagline; I appreciate this isn't easy.

It's your website; you can do what you want with it. And maybe I'm just a grumpy old curmudgeon who has seen too many hype cycles, but to me it just comes off as "too good to be true" – which it kind of is – and leaves a more negative than positive impression. The same applies to "The most performant in-memory data store on Earth" tagline, which seems a bit hyperbolic (what is "fastest" depends, as you mentioned that Redis will always be faster on a single core – some people only need a single core!)

I have the business acumen of a goat, so what do I know? But it seems to me that a lot of people appreciate when products are straight-forward about their weaker points as well, and even straight-up say they're not the best fit for all scenarios, and in that in the long run this is more beneficial.


> To me, spinning up multiple copies of the database is cheating.

What if the database was designed to be run that way?

> You're comparing a box of Apples to a single apple.

Precisely. Dragonfly is a box of apples. Redis is a single apple that can be put in a box with other apples. If you run a "benchmark" comparing your box of apples against a sole apple, you're being either stupid or dishonest.


At least on AWS it is kind of hard to get 40 tiny VMs with sufficient speed on the infra side. Given that laptops get 40+ vCores these days, I think a single instance of anything should have some multithreading.


The comment you replied to explicitly said (so you don't even have to read the redis article, which also clearly says so)

> The Dragonfly benchmark runs one Redis instance on a 64-CPU machine and compares it with one Dragonfly instance on the same machine.

They were not running 40 tiny VMs!


They should have chosen a 1024-core box and really shocked the world.


> and consider the overhead of managing 1VM vs 64VMs

They clearly are not running 64 VMs in the test they are describing.

They compare both databases on one VM of the exact same size, both deployed as their makers recommend to deploy them.


The v1.3.0 version was released 3 days ago.

https://github.com/dragonflydb/dragonfly/releases/tag/v1.3.0

If you want to integrate it, you also need to check the Software License.

https://github.com/dragonflydb/dragonfly/blob/main/LICENSE.m...

Dragonfly Business Source License 1.1

License: BSL 1.1

Licensor: DragonflyDB, Ltd.

Licensed Work: Dragonfly including the software components, or any portion of them, and any modification.

Change Date: March 15, 2028

Change License: Apache License, Version 2.0, as published by the Apache Foundation.

Additional Use Grant: You may make use of the Licensed Work (i) only as part of your own product or service, provided it is not an in-memory data store product or service; and (ii) provided that you do not use, provide, distribute, or make available the Licensed Work as a Service. A “Service” is a commercial offering, product, hosted, or managed service, that allows third parties (other than your own employees and contractors acting on your behalf) to access and/or use the Licensed Work or a substantial set of the features or functionality of the Licensed Work to third parties as a software-as-a-service, platform-as-a-service, infrastructure-as-a-service or other similar services that compete with Licensor products or services.

...


Sounds fair enough to me. Look what AWS did to Elasticsearch (amongst many others).


Does it really stop anything?

Amazon etc can just "take inspiration" from it and develop something from scratch that's almost the same


It stops companies from taking it for free. Sure, you could find senior developers experienced in building performant and reliable databases to build one for you in many months. But you need to commit the resources to that and you'll be playing catch-up to what dragonflydb achieved in the meantime.


And people are still mocking Stallman.


What do you mean specifically? This sounds like a pretty cool way to balance the open/commercial aspect to me.


It's not open at all. I can't offer this as a service with my non-profit org, because the non-profit still needs a €10 annual fee to exist. I can't make improvements that truly belong to the community. This is an attempt to fight off capitalism with private property, which means it only tries to fight off capitalism that doesn't benefit them. If you're going that way, might as well close-source it.

AGPL allows all of that, allows you to run it as a service, allows you to earn money, and if amazon ever reuses it you have access to their differentiating sauce so the playing field is leveled profiting everyone, including you. Most likely though amazon won't use it, which is also the original goal.


> I can't offer this as a service with my non profit org

You mean make a hosted dragonfly a service you provide? (I'm not sure why the €10 matters - maybe I'm missing something) Yeah, that's exactly the point of that licence. You can do it in 5 years if you want.


> You mean make a hosted dragonfly a service you provide?

Yes, why not ?

> (I'm not sure why the €10 matters - maybe I'm missing something)

That's a standard fee members pay to cover expenses the org may have

> Yeah, that's exactly the point of that licence. You can do it in 5 years if you want.

Good! At least there's a recognition of not being of public service and not doing this for everyone but only for themselves. I wish they chose another license, but it's a start


> Yes, why not ?

That's one specific case out of all possible uses that's not allowed with this licence. It honestly doesn't sound like much of a limitation. It would also be an extremely weird service to run as a non-profit.

> That's a standard fee members pay to cover expenses the org may have

Yes, but why did you mention it? It seems irrelevant to the question of what service you can build.


"I can't use it for one particular use case that >99.999...% of users never want to run in the first place, therefore it's not open at all and might as well be closed source" is quite the take.


Why bother then? Using this as a component makes you enter the "a substantial set of the features or functionality of the Licensed Work to third parties as a software-as-a-service, platform-as-a-service, infrastructure-as-a-service or other similar services that compete with Licensor products or services" zone, and it can be interpreted in any way. How do you make sure your product isn't a derivative and is different enough for lawyers?


This only really applies if you're offering direct access to the Dragonfly wire protocol or something along those lines, not if you're merely using Dragonfly as part of your product. Building, say, a Twitter clone on Dragonfly is perfectly fine. You're interpreting this exceedingly broadly.


We do not fight off capitalism in any way. Rather, I view capitalism as a powerful engine driving innovation. My experience as an engineer at both Google, a massively profitable entity that contributes significantly to the open source community, and AWS, which effectively utilizes open source software to bolster its own success, has shaped this perspective. I harbor no resentment towards AWS; instead, I see both approaches as legitimate strategies within the scope of a capitalist system. I firmly believe that companies in the open source space need to rise to the challenge of competition and devise effective strategies to thrive in the marketplace.


Weirdly, in-memory data storage is barely the bottleneck most of the time.

Why do we need faster storage? We need better models and domains that improve our developer experience, not something faster. We don't need cars that go 650 km/h. We need better roads and superior security controls.

Blazingly fast wasn't enough; now we add "on Earth" to be impactful enough to capture our attention


I use Redis with rspamd and have systems that scan millions of mails per day, and Redis is a bottleneck for some of the modules. I've been impressed with Dragonfly for the reputation set, which has stopped the timeouts I was seeing, and I was able to replace a multi-Redis setup using rspamd's internal hash/sharding with a single Dragonfly. I plan to move more over, but I need stream support, which is coming. The dev is also responsive on their Discord channel.


1M req/day is about 12 req/s. If you can't pull even 1k req/s out of it, I'd assume it's being used incorrectly.


A single email doesn't have to mean a single redis query.


Presume an upper bound of single-digit millions (9M/day) for just over 100 emails/second. Also presume a rather pedestrian single-instance (single-core) Redis limit of 100k ops/sec.

Are people really running software that is making 1k Redis calls/message?
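The back-of-the-envelope numbers in this sub-thread are easy to check (assuming a uniform arrival rate, which, as noted downthread, is not realistic at peak):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def per_second(per_day: float) -> float:
    """Average request rate given a daily total."""
    return per_day / SECONDS_PER_DAY

# "1M req/day is about 12 req/s"
one_million = per_second(1_000_000)   # ~11.6 req/s

# "single-digit millions (9M/day) for just over 100 emails/second"
nine_million = per_second(9_000_000)  # ~104 emails/s

# Even at 100 Redis ops per email, the average load stays far below a
# 100k ops/sec single-instance limit.
avg_ops_per_sec = nine_million * 100  # ~10,400 ops/s
```

The catch, as the rest of the thread points out, is the averaging: a 4-hour peak or a spam wave concentrates that load into a much shorter window.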


First, let me say Redis has served me well for years. Before rspamd I used SpamAssassin – another perfectly fine piece of software. Having more options is a good thing. For Redis I started with single instances and moved to a master (for writes) and slaves (for reads), or sharding/hashing, depending on what rspamd uses. I've moved one section so far to Dragonfly, the reputation module. This is performing better for my usage.

As for email, if email were broken down and nicely distributed over the same period, I would likely have no issues. However, there are times when there is much more scanning going on, and a good 4 hours of peak time. This is where issues can crop up. If an rspamd scan can't get a return fast enough, that test will be skipped. It still works, but I'd rather have all tests go through. So let's take a single email and where Redis is used:

* Ratelimit checking

* Replies plugin

* Some multimap entries stored in Redis

* MX check module

* Neural module (a bit like Bayes but automatic and short-term)

* Fuzzy module, fed from spam traps

* History module – for Redis admin, stats

So a single email results in more than a single Redis query per email. The mail server (for SMTP), which is not rspamd, is also using Redis on its own.

Now imagine a spike of emails, perhaps a large spamming activity that is being stopped. Needing to scan 100 emails in a second isn't impossible at peak. The scanning is also for incoming and outgoing email.


Yeah, I appreciate there is a lot going on. But is it really 1k ops/message?

Even at 100 ops/message you should be able to do > 1k messages/sec without breaking a sweat.


Parent said "millions of mails per day", so > 1M req / day I assume.


> Weirdly, in-memory data storage is barely the bottleneck most of the time.

Yeah but sometimes it is and its a way better dev experience to scale up than out.


There was a great HN post comparing Dragonfly with some other databases: https://news.ycombinator.com/item?id=31796311.

I am not sure if the latest version improves performance but those are some bold claims of Dragonfly.


Other related thread on HN (today):

"KeyDB – A Multithreaded Fork of Redis (keydb.dev)"

https://news.ycombinator.com/item?id=36018149


I instantly confused this with DragonFlyBSD.


That is a huge claim, almost screams red flag.


I never trust metrics from a company anymore, see the bloodbath comment thread on the recent Kafka vs Redpanda for proof.


Their blog post goes into detail on how they benchmark, and they use the benchmarking tool developed by Redis [1]:

> We ran our tests on AWS Graviton2 EC2 instances, which are network-optimized and provide the best performance for web applications. We used memtier_benchmark — a widely used benchmarking tool developed by Redis Ltd. — to test throughput and latency, and Prometheus to monitor and visualize memory usage.

It is easy to reproduce and check the claims. Redis also countered with a blog post [2]

[1] - https://www.dragonflydb.io/blog/scaling-performance-redis-vs...

[2] - https://redis.com/blog/redis-architecture-13-years-later/
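For anyone wanting to reproduce the claims, a memtier_benchmark invocation looks roughly like this (flag values are illustrative, not the exact configuration from either blog post; printed as a dry run rather than executed):

```shell
# Illustrative memtier_benchmark run against a local Redis-protocol server:
# 8 worker threads, 25 connections each, 1:10 SET:GET ratio, 256-byte values.
SERVER=127.0.0.1
PORT=6379
CMD="memtier_benchmark -s $SERVER -p $PORT --protocol=redis --threads=8 --clients=25 --ratio=1:10 --data-size=256 --hide-histogram"
echo "$CMD"
```

Since the tool speaks the Redis protocol, the same command can be pointed at either Redis or Dragonfly, which is what makes the dueling blog posts directly comparable in the first place.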


How does performance look on a vCPU machine vs. Redis? Because this looks interesting.

I'm most interested in latency for storing the coordinates of thousands of players in real time. We'll definitely check this out.


Sorry, but I find "Redis API compatible" checked for Redis itself funny. :)


Once one moves to Dragonfly Cloud, I guess the "performant" part of the value prop falls by the wayside; i.e., you need your in-memory data store on the same machine (or at least the same rack), otherwise really unperformant solutions can out-perform Dragonfly Cloud.


> high-performance, low-complexity, and built for scale.

So, exactly like redis?



