> Even the most unoptimized database should be able to handle this.
Anybody had any success running a queue on top of... sqlite?
With the way the SQLite file locking mechanisms work, are you basically guaranteed really low concurrency? You can have lots of readers but not really a lot of writers, and in order to pop a job off the queue you need a process spinning, waiting for work, that moves each job's status from "to do" to "in progress" and then to "done" or "error", which is pretty write-heavy?
> An EXCLUSIVE lock is needed in order to write to the database file. Only one EXCLUSIVE lock is allowed on the file and no other locks of any kind are allowed to coexist with an EXCLUSIVE lock. In order to maximize concurrency, SQLite works to minimize the amount of time that EXCLUSIVE locks are held.
> You can avoid locks when reading, if you set database journal mode to Write-Ahead Logging (see: http://www.sqlite.org/wal.html).
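For anyone who wants to try it, turning WAL on is a one-line pragma, and it's persistent per database file. A minimal Python sketch (file name made up):

    import sqlite3

    # Open (or create) the queue database and switch it to write-ahead logging.
    # WAL mode sticks to the file, so this only needs to succeed once.
    conn = sqlite3.connect("queue.db")
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    print(mode)  # "wal" on success; readers no longer block the single writer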
I processed ~3m messages a day on SQLite using a pair of ca. 2006-era Xeons on spinning rust (for redundancy; each could easily handle the load by itself). The queue processor was written in Ruby and ran on the (very slow) 1.8.x series (even then it used about 10% of a single core).
On modern hardware you should be able to trivially handle more than that.
Does this mean you are processing messages on only one machine, since it's Sqlite? Depending on what you are processing, that could take longer than the queue/dequeue.
The queue was managed by a server written in Ruby that spoke Stomp [1], so while for the biggest queue most of the processing did in fact happen on the same machine, that was just because it happened to fit.
SQLite is missing some features like `SELECT FOR UPDATE`, but you can work around some issues with a few extra queries. I wrote litequeue[0] with this specific purpose. I haven't been able to use it a lot, so I don't have real-world numbers of how it scales, but the scaling limits depend on how fast you can insert into the database.
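Not necessarily how litequeue does it, but one common workaround for the missing `SELECT FOR UPDATE` is to claim a job with a single `UPDATE ... RETURNING` statement (needs SQLite 3.35+), which is atomic on its own. Rough sketch, table layout made up:

    import sqlite3

    conn = sqlite3.connect("queue.db", isolation_level=None)  # autocommit
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("""CREATE TABLE IF NOT EXISTS jobs (
        id      INTEGER PRIMARY KEY,
        status  TEXT NOT NULL DEFAULT 'todo',  -- 'todo' | 'in_progress' | 'done' | 'error'
        payload TEXT NOT NULL)""")

    def pop_job(conn):
        # One statement both picks and claims the oldest 'todo' row, so two
        # workers can't grab the same job; no SELECT FOR UPDATE required.
        return conn.execute("""
            UPDATE jobs SET status = 'in_progress'
             WHERE id = (SELECT id FROM jobs WHERE status = 'todo' ORDER BY id LIMIT 1)
            RETURNING id, payload""").fetchone()  # None when the queue is empty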
I don’t know how many messages per second it does but for a podcast crawling side project I have processed hundreds of millions of messages through this little Python wrapper around SQLite. Zero problems. It just keeps running happily.
Why use something as complicated as SQLite? You can use a plain old set of directories and files as a queue, with sane, portable, exclusive, atomic locks, on any filesystem, with concurrent readers/writers. That's how we ran mail servers that handled millions (now billions) of messages a day, 20+ years ago.
Sorry, I don't follow? SQLite also has durability problems on power loss. You can use exactly the same method that SQLite uses to flush data to disk. There is no need for a WAL because the queue operations are atomic, using rename() on POSIX filesystems (including NFS). There's no rollback of course, but you can recover items that are stalled in the queue. With a journaled filesystem, fdatasync() after each atomic operation, and battery-backed RAID, I can't imagine it getting more durable [in comparison to SQLite].
Interestingly, journaled rollback mode in SQLite probably offers worse durability than the file/directory queue method. (https://www.sqlite.org/lockingv3.html#how_to_corrupt) (https://www.sqlite.org/atomiccommit.html#sect_9_0) It depends on the same system-level guarantees about the filesystem and hardware, except it also depends on POSIX advisory locks, which are fallible and implementation-specific; whereas the file/directory queue solely depends on rename() being atomic, which it always is.
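For readers unfamiliar with the pattern: it's essentially maildir-style. Write under a temporary name, fdatasync(), then rename() into the queue directory; dequeue by rename()ing into an "in progress" directory, which only one worker can win. A rough sketch of the idea (not the parent's actual code; directory names invented):

    import os, uuid

    NEW, CLAIMED = "queue/new", "queue/in_progress"
    for d in (NEW, CLAIMED):
        os.makedirs(d, exist_ok=True)

    def enqueue(payload: bytes):
        name = uuid.uuid4().hex
        tmp = os.path.join(NEW, ".tmp." + name)
        with open(tmp, "wb") as f:
            f.write(payload)
            f.flush()
            os.fdatasync(f.fileno())              # flush data before publishing
        os.rename(tmp, os.path.join(NEW, name))   # atomic on POSIX filesystems
        return name

    def dequeue():
        for name in sorted(os.listdir(NEW)):
            if name.startswith(".tmp."):
                continue
            src, dst = os.path.join(NEW, name), os.path.join(CLAIMED, name)
            try:
                os.rename(src, dst)               # only one worker wins this
            except FileNotFoundError:
                continue                          # someone else claimed it first
            with open(dst, "rb") as f:
                return name, f.read()
        return None                               # queue is empty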
> Sorry, I don't follow? SQLite also has durability problems on power loss.
No, it doesn't. Once COMMIT returns, your data is durable.
> You can use exactly the same method that SQLite uses to flush data to disk.
Good luck with that! It took them about ten years to get it right, and the SQLite people are world class. (You can't just copy their code wholesale, since they have one file to worry about and your scheme has tons.)
> With a journaled filesystem, fdatasync() after each atomic operation, and battery-backed RAID, I can't imagine it getting more durable [in comparison to SQLite].
The problem is lack of imagination. :-) See https://danluu.com/deconstruct-files/ (and the linked papers) for an overview of all the things that can go wrong. In particular, if you only ever fdatasync(), your files are not even guaranteed to show up after a power loss.
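Concretely, the gap being pointed at: rename() is atomic, but the new directory entry isn't durable until the directory itself is fsync()ed. The extra step looks roughly like this:

    import os

    def fsync_dir(path):
        # After rename()ing a file into `path`, fsync the directory so the
        # new entry itself survives a power loss, not just the file's data.
        fd = os.open(path, os.O_RDONLY)
        try:
            os.fsync(fd)
        finally:
            os.close(fd)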
That is, without the custom mail server you describe. You can feed every incoming message to a custom command with ".qmail", or forward with ".forward", so we used that for "push" delivery of messages and plain old POP3 for pull. E-mail really does have everything you need to build complex routing topologies.
We ran a mail provider, and so had a heavily customised Qmail setup, and when we needed a queue we figured we might as well use that. Meant we could trivially do things like debugging by cc:'ing messages to a mailbox someone connected a mail client to, for example.
My ISP days were not dissimilar, and had we a TARDIS it might be fun to go back to when both setups were still there and compare notes.
However, I'd hope the SOAP part made it clear I was exaggerating a little for effect :)
... also because the idea of the custom server meaning you could have the main system's qmail handle spaced retries based on SMTP status codes amused me.
I mean SOAP fit right in with the zeitgeist of that era... I seem to remember we were sending XML (shudder), as we had quite a lot of other stuff using XML, but nothing as verbose as SOAP.
Our main CGI (yes...) written in C++ (I can hear you squirm from here) was also loosely inspired by CORBA (stop that screaming) in terms of "sort-of" exposing objects via router that automatically mapped URLs to objects, and which used XML for persistence and an XML based rendering pipeline (not that different from React components, actually, except all C++ and server side).
Yeah, I had quite a bit of XML flying around, plus the inevitable "all our vendors want to communicate via CSV over FTP", and some customer facing perl CGIs plus a daemon or two I'd written that usually spoke XML over TCP sockets.
Plus lots of NFS for the shared filestore that the qmail SMTP nodes and the courier/qmail-pop3d mail receipt nodes mounted.
Plus ... yeah, you can see why I thought we'd not find each others' setups -too- surprising.
So, no, not going to squirm, because I mean, yes, I know, but it all (mostly) worked and the customers weren't unusually unhappier with us than they are with any provider ;)
Yeah, I'm super confused with this. Getting a few thousand per second seems relatively trivial, on an Arduino. Maybe there's something I'm missing here, or is this the abstractions that software lives at these days?
This limits simultaneous writes to the maximum number of open file handles supported by the OS. I don't know what that is, but I don't see how it can compare to multiple multiplexed TCP/IP sockets.
When you’re writing billions of messages per day, I don’t see how a file system scales.
On Linux, /proc/sys/fs/file-max has the maximum number of simultaneous open filehandles supported by the kernel. On my laptop this is about 1.6 million.
But also keep in mind every process has, at minimum, its own executable as an open file. Picking a random python process I currently have running, lsof reports it has 41 open *.so files.
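If you want concrete numbers on your own box, both the kernel-wide ceiling and the per-process limit are easy to check, and the per-process one usually bites first. Linux-specific sketch:

    import resource

    # Kernel-wide ceiling on open file handles (the /proc value mentioned above).
    with open("/proc/sys/fs/file-max") as f:
        print("system-wide:", f.read().strip())

    # Per-process soft/hard limits (ulimit -n), usually the binding constraint.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print("per-process:", soft, hard)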
Yes, but it's also highly unlikely that if you're trying to push transactions per second into that kind of range that you'd be doing it with individual processes per transaction. You'd also be likely to be hitting IO limits long before the number of file descriptors is becoming the issue.
Yep, I've used this approach for file transfers/messaging between two systems. Primitive, but get the lock/process/unlock/move sequence working and it works for that kind of use case.
The p3rl.org/Minion job queue has a sqlite backend and I'm aware of it handling what seemed to me a fairly acceptable amount of throughput and the source to https://metacpan.org/pod/Minion::Backend::SQLite looks pretty comfortable to me.
I've not run it myself in production but I would definitely have heard if it went wrong; I think openSUSE's openQA thing does build worker boxes that way (though of course they'll have fairly slow jobs, so it may just be that the write throughput doesn't matter).
This being HN, I'd like to point out you can steal the SQL from the source for your own uses if you want to, this just happens to be the example full implementation I'm familiar with.
Not to be coy, but it only is if it is. For this application I'd agree, but there are plenty of apps that want queues for in-memory use, and if the thing that holds the memory for the queue dies, chances are the whole ecosystem has.
I once saw WAL turned off to make a sqlite queue perform better.
But that was ... gremlins. More than one person tried to figure out wtf was going on, and eventually it was better business-wise to declare "gremlins"; everybody involved in the incident has been annoyed about not figuring it out since.
A simple implementation of a queue in SQL will need to acquire an exclusive lock on the table anyway. Although it's not necessary to use locking at the level of SQL itself: https://news.ycombinator.com/item?id=27482402
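In SQLite terms that usually just means wrapping the claim in BEGIN IMMEDIATE so the write lock is taken up front. A rough sketch, assuming a jobs(id, status, payload) table and a connection opened with isolation_level=None:

    def pop_job(conn):
        # BEGIN IMMEDIATE grabs the write lock before the SELECT, so two
        # workers can't both read the same 'todo' row and race on the UPDATE.
        conn.execute("BEGIN IMMEDIATE")
        try:
            row = conn.execute("SELECT id, payload FROM jobs WHERE status = 'todo' "
                               "ORDER BY id LIMIT 1").fetchone()
            if row is not None:
                conn.execute("UPDATE jobs SET status = 'in_progress' WHERE id = ?", (row[0],))
            conn.execute("COMMIT")
            return row
        except Exception:
            conn.execute("ROLLBACK")
            raise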
For a client-server RDBMS with writes over the network, sure. For an SQLite database with writes to the local filesystem and an order of magnitude better write performance, not so much.