> We must choose between the lesser of two evils, which is at-least-once delivery in most cases. This can be used to simulate exactly-once semantics by ensuring idempotency or otherwise eliminating side effects from operations.
There is a third option besides idempotency and eliminating side-effects: give each message a unique ID, use that to keep a record of which messages have been processed, and don't process the same message twice.
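For concreteness, here is a minimal in-memory sketch of that scheme, assuming every message carries a sender-assigned ID. `DedupConsumer` and its methods are made-up names for illustration. A real consumer would also need the seen-ID set to survive restarts.

```python
from typing import Callable, Hashable


class DedupConsumer:
    """Skips any message whose ID has already been processed (in memory only)."""

    def __init__(self, handler: Callable[[str], None]) -> None:
        self.handler = handler
        self.seen: set[Hashable] = set()

    def receive(self, msg_id: Hashable, payload: str) -> bool:
        """Process the message unless its ID was seen before; return True if processed."""
        if msg_id in self.seen:
            return False
        self.handler(payload)
        # The ID is recorded only after the handler returns, so a crash between
        # these two lines would still allow the message to be processed again.
        self.seen.add(msg_id)
        return True


processed: list[str] = []
consumer = DedupConsumer(processed.append)
# At-least-once delivery: the broker redelivers message 1.
for msg_id, payload in [(1, "charge $10"), (2, "charge $5"), (1, "charge $10")]:
    consumer.receive(msg_id, payload)
print(processed)  # ['charge $10', 'charge $5']
```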
But I wouldn't get too worked up over it. The article is basically saying: you can't have exactly-once delivery (unless you take the necessary steps to ensure you have exactly-once delivery).
Well, it's one way of implementing a kind of idempotency. But idempotency in general is more complicated than just deduping messages. For example, "Toggle the power state" is not idempotent because the resulting state depends on the initial state. You might think that "turn the power on" is idempotent, and by itself it is, but in conjunction with "turn the power off" it is not, because the order in which they are processed matters. A truly idempotent message would be something like "Ensure that the number of on-off cycles at time T1 is N, at time T2 is N+1", etc.
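A small sketch of the three cases. It uses a simplified variant of the last example: instead of an on-off cycle count, each message carries the target power state plus a sender-assigned version number (an assumption; any monotonic logical clock would do). All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Device:
    on: bool = False
    version: int = 0  # version of the last applied state-setting message


def toggle(d: Device) -> None:
    # Result depends on the current state, so a duplicate undoes the original.
    d.on = not d.on


def set_power(d: Device, on: bool) -> None:
    # Idempotent in isolation, but a late duplicate can overwrite a newer command.
    d.on = on


def set_power_at(d: Device, version: int, on: bool) -> None:
    # "The power state as of `version` is `on`": stale or repeated messages are
    # ignored, so duplicates and reordering leave the same final state.
    if version > d.version:
        d.on, d.version = on, version


a = Device()
toggle(a)  # one logical toggle...
toggle(a)  # ...delivered twice
print(a.on)  # False: the duplicate cancelled it

b = Device()
for cmd in [True, False, True]:  # "on", "off", then a redelivered "on"
    set_power(b, cmd)
print(b.on)  # True, although the last real command was "off"

c = Device()
for version, cmd in [(1, True), (2, False), (1, True)]:
    set_power_at(c, version, cmd)
print(c.on)  # False
```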
Idempotency is in general more powerful and more complicated than deduping.
Sure, but you said that assigning unique IDs to messages and not executing the same ID twice was a "third option", besides establishing idempotency. I'm saying it's not a "third option", but instead it literally is "establishing idempotency".
I guess we'll just have to agree to disagree about that.
Here is a quote from the original article that supports my position:
"Therefore consumer applications will need to perform deduplication or handle incoming messages in an idempotent manner. ... The way we achieve exactly-once delivery in practice is by faking it. Either the messages themselves should be idempotent, meaning they can be applied more than once without adverse effects, or we remove the need for idempotency through deduplication."
How does this work? This means the consumers have to somehow know the IDs of every message that's been sent and whether each one has been processed.
Either you have a shared database and all consumers are local, in which case why are you passing messages at all, or you have a distributed system somehow. If you have a distributed system, you've got this problem, and going recursive probably won't help much.
It depends what you mean by “processed”. If there’s some particular action you want to take in response to the message, then you put the database next to wherever that action takes place.
If it’s a notification displaying on a phone, the phone holds a DB that tracks what notifications have been shown.
You can’t reliably show a notification on exactly one of a user’s devices. But that’s no biggie; display it exactly once per device, remove it everywhere when acknowledged anywhere.
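A toy model of "once per device, dismissed everywhere on ack", assuming both the notification and the ack reach each device at least once and possibly out of order. The class and method names are invented; this is not any real push-notification API.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    name: str
    shown: set[str] = field(default_factory=set)    # IDs displayed or dismissed here
    visible: set[str] = field(default_factory=set)  # IDs currently on screen

    def on_notification(self, notif_id: str) -> None:
        # The local set turns at-least-once delivery into display-once-per-device.
        if notif_id not in self.shown:
            self.shown.add(notif_id)
            self.visible.add(notif_id)

    def on_ack(self, notif_id: str) -> None:
        # Dismissal is idempotent: discarding an absent ID is a no-op.
        self.visible.discard(notif_id)
        # Marking it as shown stops a late redelivery from reappearing.
        self.shown.add(notif_id)


phone, laptop = Device("phone"), Device("laptop")
for d in (phone, laptop):
    d.on_notification("n1")
    d.on_notification("n1")  # redelivery
print(phone.visible, laptop.visible)  # {'n1'} {'n1'}

# The user acknowledges on the phone; the ack fans out to every device.
for d in (phone, laptop):
    d.on_ack("n1")

tablet = Device("tablet")
tablet.on_ack("n1")           # the ack overtakes the notification
tablet.on_notification("n1")  # late delivery is suppressed
print(phone.visible, laptop.visible, tablet.visible)  # set() set() set()
```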
That sounds like at-least-once delivery with recipient systems tracking known received messages. Which is to say faking exactly-once with the power of idempotency, rather than exactly-once.
This is academic with enough abstraction, but when you're designing a system and implementing the work of processing a message it can be a pretty important distinction. Especially if you're past the point where the list of seen items becomes a chokepoint for synchronization between consumers.
> You can’t reliably show a notification on exactly one of a user’s devices. But that’s no biggie; display it exactly once per device, remove it everywhere when acknowledged anywhere.
Potentially messy. Now you have two distributed system messages - the initial notification and the ack.
I don't understand why "fake" exactly-once delivery is fake. If I make you generate a random UUID in your system, then store the UUID in a centralised place in my system and ignore duplicate messages with that UUID, and you keep retrying until you get an ACK, why is that not exactly-once delivery?
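A single-process simulation of the scheme in that question: the sender generates one UUID per message, reuses it on every retry, and stops when an ACK arrives; the receiver drops repeated UUIDs. The loss rate and the 50-attempt cap are arbitrary assumptions, the cap standing in for "ring each other". Recording and applying can't be split by a crash here, which is exactly the property a real system doesn't get for free.

```python
import random
import uuid


class Receiver:
    def __init__(self) -> None:
        self.seen: set[uuid.UUID] = set()
        self.applied: list[str] = []
        self.duplicates = 0  # retries that were filtered out

    def deliver(self, msg_id: uuid.UUID, payload: str) -> None:
        if msg_id in self.seen:
            self.duplicates += 1
            return
        self.seen.add(msg_id)
        self.applied.append(payload)


def send(receiver: Receiver, payload: str, rng: random.Random,
         loss: float = 0.3, max_attempts: int = 50) -> int:
    """Retry until an ACK gets back; return the number of attempts used."""
    msg_id = uuid.uuid4()  # generated once by the sender, reused on every retry
    for attempt in range(1, max_attempts + 1):
        if rng.random() < loss:
            continue  # request lost: the receiver never saw it
        receiver.deliver(msg_id, payload)
        if rng.random() < loss:
            continue  # ACK lost: the receiver applied it, but the sender retries
        return attempt
    raise TimeoutError("no ACK after max_attempts; escalate out of band")


rng = random.Random(42)  # seeded so the simulation is reproducible
receiver = Receiver()
attempts = [send(receiver, f"order {i}", rng) for i in range(100)]
print(len(receiver.applied))  # 100: every order applied once despite retries
print(receiver.duplicates)    # redeliveries caused by lost ACKs
```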
Is it because the UUID is "only" random with some collision chance? (But then how come a sequential ID from your system wouldn't count?) Is it because my system needs to trust your system? (But why is trust a factor here?) Is it because my system needs a centralised database? (But our two systems are still distributed when you consider them together, right?) Is this a semantic argument over the meaning of "delivery", where I'm not allowed to impose requirements or check a database because the message has already been "delivered"? (But then why are we quibbling over semantics?) Is it because the message broker becomes stateful? (But why is that a constraint?)
I think from looking at the article that this is about delivery within a finite number of retries, but that seems like the kind of problem where in the real world we just ring each other when a message has been retried 50 times over the course of two days.
It's fake because the deduplication isn't a property of the messaging system, it's a property of the system consuming the messages. "Your system" is providing a workaround (deduplication) for at-least-once delivery, not the message system.
I think it's also worth noting that your suggested workaround has actually recreated the problem it's intending to solve. If you mark the UUID as "received" as soon as you get a message, how do you deal with duplicates if the processing fails? If you mark the UUID as "received" when you're done processing, how do you deal with the possibility that you'll receive the message multiple times? This can get hairy very quickly.
We come back to the original argument, which is that at-least-once delivery of idempotent operations is how you represent things that should happen exactly once in a system, and in a saner world this could reasonably be called exactly-once delivery.
Every distributed systems engineer knows what "exactly-once delivery" is asking for, and in plain English it's valid to conflate the two, but for some reason the field has decided to treat the phrase as an annoying semantic pit trap for the unwary. Want to add an ID to your transactions to make them idempotent? Well, even though your transactions are now recorded exactly once, that wasn't technically delivery! Gotcha!
The issue is precision in terminology and resulting understanding. At-least-once delivery with idempotency imposes certain rules on the processing of messages. As you say, every distributed systems engineer knows this.
When you call something exactly-once, people who are perhaps not distributed systems engineers make the reasonable assumption that this means exactly what it says. They will engineer around this reasonable assumption based on a clear technical description and get something hilariously broken in non-obvious ways. This will have happened because jargon ("exactly-once delivery") has been confused for a technical description of a delivery system's properties.
Not everyone in this series of comments is a distributed systems engineer. Never mind everyone using a messaging system.
If you zoom out enough then every system looks like a black box and the only thing that matters is inputs and outputs. Messaging systems are no different in that respect from literally everything else. If you're building a messaging system, there's a world of difference in terms of how other people integrate with your software if you say "exactly-once" vs "at-least-once."
Thinking about it, we’re probably violently agreeing with just some differences in terminology.
I’m arguing that exactly-once delivery is possible if the message receipt happens in a single place, but maybe others don’t see that as a distributed system at all.
> I’m arguing that exactly-once delivery is possible if the message receipt happens in a single place, but maybe others don’t see that as a distributed system at all.
I think that only holds if the sender is also in the same place. Otherwise there's the very real chance of a message getting lost, turning your system from exactly-once into at-most-once. At this point your sender, consumer, and messaging systems are one system, so it's probably reasonable to question if that's a distributed system.
When do you record the message as processed? If you do it before the processing is complete, you have at-most-once delivery (because processing could fail after you've recorded it as processed, and it won't be retried). If you do it after the processing is complete, you have at-least-once delivery (because marking the message as processed could fail, and it will be retried).
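The two orderings can be demonstrated with a simulated crash between the two steps, assuming the seen-ID record and the side effect are stored separately and both survive the crash. `Crash`, `consume` and `run` are hypothetical helpers.

```python
class Crash(Exception):
    """Stands in for the consumer dying between two steps."""


def consume(msg_id: str, seen: set[str], effects: list[str],
            mark_first: bool, crash_between: bool) -> None:
    if msg_id in seen:
        return  # already recorded as processed: skip
    if mark_first:
        seen.add(msg_id)        # record as processed...
        if crash_between:
            raise Crash()
        effects.append(msg_id)  # ...then do the work
    else:
        effects.append(msg_id)  # do the work...
        if crash_between:
            raise Crash()
        seen.add(msg_id)        # ...then record as processed


def run(mark_first: bool) -> list[str]:
    seen: set[str] = set()
    effects: list[str] = []
    try:
        consume("m1", seen, effects, mark_first, crash_between=True)
    except Crash:
        pass  # no ACK was sent, so the broker redelivers m1
    consume("m1", seen, effects, mark_first, crash_between=False)
    return effects


print(run(mark_first=True))   # []: at-most-once, the effect is lost
print(run(mark_first=False))  # ['m1', 'm1']: at-least-once, the effect is duplicated
```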