No, you shouldn't be impressed. It only shows that some caching scheme can deliver data this quickly. Possibly one-byte reads, perhaps the same byte read one million times. It has almost nothing to do with MySQL performance, in fact.
Historically, the overhead of SQL put a low ceiling on the potential performance of simple queries, which is exactly why KV databases appeared in the first place: a scalar-value get in most RDBMSs barely hit 1,000 results per second, and that is being optimistic. That overhead was never optimized away because the classic use of the RDBMS was query conversations in which each interaction was very expensive (e.g., large, complex queries), so optimizing it wasn't important.
But, thanks to competition, most database systems have gotten fantastically more performant. Even over the TDS interface, I can get close to Redis performance with SQL Server now, whereas it was beaten by one if not two orders of magnitude a few versions ago. And in this case MySQL added the ability to use a much simpler, lighter memcached API in front of MySQL, for an obvious reason: the underlying database system is now fast enough for that to be valuable.
I should add that almost everyone who has commented thus far completely misunderstands what this is talking about. It is the memcached API atop MySQL, meaning you can talk to your database server through that API from any client programmed to talk to a memcached server. And the same data you access via memcached (reading or writing) can be accessed through your normal SQL. Which is pretty cool.
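To make the "same data, two APIs" point concrete, here is a toy sketch: a plain dict plays the role of the InnoDB table, and the class names and methods are purely illustrative, not the real plugin's interface.

```python
# Conceptual sketch only: one underlying store, reachable through a
# memcached-style API and a SQL-style API. A dict stands in for the
# InnoDB table; this is NOT the actual MySQL memcached plugin.

class DualAccessTable:
    """One table, visible through two access paths."""

    def __init__(self):
        self.rows = {}  # key column -> value column

    # --- memcached-style path: no SQL parsing, direct key lookup ---
    def get(self, key):
        return self.rows.get(key)

    def set(self, key, value):
        self.rows[key] = value

    # --- SQL-style path: same underlying rows ---
    def select_where_key(self, key):
        return [(k, v) for k, v in self.rows.items() if k == key]

t = DualAccessTable()
t.set("user:42", "alice")             # write via the memcached-style API
print(t.select_where_key("user:42"))  # read the same row "via SQL"
```

The point of the sketch is only that both paths hit the same rows, so a write through one is immediately visible through the other.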
When I see an explicit mention of an API, it implies there's no running memcached daemon.
I dug through the MySQL 5.7 manual several times, but I couldn't find any proof that this is just a memcached API atop MySQL.
But I did find evidence that there's a real memcached daemon atop MySQL.
> Running memcached in the same process space as the MySQL server avoids the network overhead of passing requests back and forth.
The MySQL 5.7 manual clarifies that they're running the memcached daemon inside the database engine, in-process. So an actual memcached instance is serving the data atop the engine.
So, if the dataset is read-only and small enough, it's likely that the whole dataset is being served by memcached. In that case, it cannot be slower than memcached.
That's why I couldn't be impressed. The only thing MySQL is doing is automatically managing the memcached daemon. That is convenient, but not impressive.
Maybe I am wrong. If you think I am wrong, please show me some reference that explicitly mentions this is just an API, and there's no running memcached instance.
Also, regardless of what is really happening under the hood, this benchmark is still meaningless. Databases exist to store data, and all the hard work happens when storing data. Write transactions usually degrade performance a lot. That's why write transactions must be included in a benchmark to show real-world performance.
> Maybe I am wrong. If you think I am wrong, please show me some reference that explicitly mentions this is just an API, and there's no running memcached instance.
Yes, there is a memcached "daemon" in the sense that they run a memcached host within MySQL, which they obviously have to do. That memcached "instance" reads and writes from InnoDB and the shared buffer pool also used by the SQL engine: it does not maintain its own separate memory, and values read or written are atomically consistent with the MySQL instance.
The one and only reason this product exists is that the memcached API is much simpler and more network- and parsing-efficient, allowing much better round-trip times for simple queries.
OK, you're right. MySQL does seem to use the InnoDB buffer pool. I found the mention:
> You do not need to manually load data into memcached at startup. As particular keys are requested by an application, the values are retrieved from the database automatically, and cached in memory using the InnoDB buffer pool.
So the memory cache is provided by the engine itself, not by an external daemon.
Could you elaborate on the SQL Server vs. Redis claim, or throw out some references?
I'm not doubting you, just honestly asking. I've been discussing a possible Redis solution to complement our current SQL Server.
Our problem isn't that big: we need to serve 10k req/sec of a simple query. Can SQL Server do that? Isn't the connection pool the bottleneck at loads of this scale?
> we need to serve 10k req/sec of a simple query, can SQL Server do that
Several years ago, with SQL Server 2008 R2, I achieved 200,000+ simple queries per second (http://bit.ly/IlH2id -- not a rigorous benchmark by any stretch of the imagination, just a "validate before assuming" data point, because you might find your install performs far better than you anticipate) using the standard TDS query interface. This was on pretty beefy hardware, and it is obviously enormously contingent on the data being in memory (which you can force with 2012), but it blew me away and completely undermined an initiative we had to implement AppFabric, Redis, or other solutions.
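In the "validate before assuming" spirit, a throughput check is a few lines of code. The sketch below uses Python's stdlib sqlite3 purely as a self-contained stand-in; to measure your own stack, point the same loop at SQL Server through whatever driver you actually use.

```python
# Rough single-row-lookup throughput harness. sqlite3 (in-memory) is
# only a stand-in so this runs anywhere; swap the connection for your
# real database driver to get a number that means something for you.
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (k INTEGER PRIMARY KEY, v TEXT)")
conn.executemany("INSERT INTO kv VALUES (?, ?)",
                 [(i, f"val{i}") for i in range(10_000)])

n = 50_000
start = time.perf_counter()
for i in range(n):
    cur = conn.execute("SELECT v FROM kv WHERE k = ?", (i % 10_000,))
    row = cur.fetchone()
elapsed = time.perf_counter() - start
qps = n / elapsed
print(f"{qps:,.0f} simple queries/sec")
```

The absolute number is meaningless across engines; the point is that measuring is cheap enough that there's no excuse to guess.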
It's hard for me to tell what the MySQL benchmark was really doing, but the postgres one seems a bit more "normal" (ordinary SQL queries from postgres's ordinary simple benchmark tool).
I would have loved it if they had benchmarked it against doing the same thing in MySQL with SQL, so we could see how much overhead the SQL layer adds for simple selects.
PostgreSQL manages to reach a third of their QPS in Haas's benchmark.
Huh? First off, the memcached API talks to InnoDB directly, not the other way around. This means you can dramatically reduce the latency of queries that previously required you to maintain a bunch of cache, along with the complexity that comes with it, where you incur the overhead of hitting both the memcached server and the database on a cache miss. (For complicated queries you might have to hit many cache servers and then the database multiple times; here only a single connection is needed, even in the worst case where the data isn't in RAM.) It also has the advantage that a cache miss doesn't force the SQL layer to come into play just to fetch an item from disk and have the application stick it into the cache. Your cache is just the page cache in InnoDB, which is much more efficient than application-managed memcached.
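The round-trip argument can be sketched as simple accounting. This toy comparison assumes the classic cache-aside pattern on one side and the in-process memcached API on the other; the hop counts are illustrative, not measured.

```python
# Toy round-trip accounting. With a separate memcached tier
# (cache-aside), a miss costs extra network hops: ask the cache, ask
# the database, write the value back into the cache. With the
# in-process memcached API, a miss is still one round trip, because
# InnoDB's buffer pool *is* the cache and fetches from disk itself.

def cache_aside_round_trips(hit: bool) -> int:
    trips = 1          # ask memcached
    if not hit:
        trips += 1     # miss: ask the database
        trips += 1     # write the value back into memcached
    return trips

def integrated_round_trips(hit: bool) -> int:
    return 1           # one call either way; misses are handled inside InnoDB

print(cache_aside_round_trips(hit=False))   # 3
print(integrated_round_trips(hit=False))    # 1
```

It also removes the consistency gymnastics of keeping two systems in sync, which is the complexity the comment above alludes to.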
To be clear, it's the memcached API on top of InnoDB - not the other way around. This is cool because it allows fast access to data in InnoDB, while exposing the same underlying data through SQL, as well as for a bunch of other reasons.
Right, but if someone showed a similar improvement by adding enough RAM to prevent swapping, would you be impressed? That is what is going on here. They have very successfully proven that RAM is faster than HDD.
They have proven that SQL-based storage is faster than NOSQL databases.
The poster of the top comment doesn't understand what this topic is about.
This is actually a benchmark of InnoDB table access. You can query the table using the usual SQL commands, but if you want faster access, you can use the memcached API.
"Read-only" is also misleading here: this is just a benchmark of the get command.
"They have proven that SQL-based storage is faster than NOSQL databases."
Too broad a statement. 'NoSQL' is an umbrella term, so there are likely several contenders on a raw-QPS metric.
The additional step of parsing the query is obviated by statement caching. You're right, though: it's simply a benchmark of how fast the query path to the InnoDB buffer pool is. The work is in optimizing internal contention.
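The statement-caching point isn't MySQL-specific; any engine with prepared statements amortizes the parse. A sketch using Python's stdlib sqlite3, which keeps an internal statement cache keyed on the exact SQL text, so reusing one query string avoids re-parsing (sqlite3 is a stand-in here, not what the benchmark used):

```python
# Parameterized statements: the SQL text is parsed once and reused,
# so the per-query cost is the lookup itself, not parsing. sqlite3
# caches compiled statements by SQL string, so keeping the query as
# one constant string means subsequent executes skip the parser.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (k INTEGER PRIMARY KEY, v TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(i, str(i)) for i in range(100)])

QUERY = "SELECT v FROM t WHERE k = ?"   # one string -> one cached parse
results = [conn.execute(QUERY, (k,)).fetchone()[0] for k in (1, 2, 3)]
print(results)  # ['1', '2', '3']
```

The memcached API goes one step further: there is no statement at all, just a key, which is why it can undercut even a cached-statement SQL path.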
The MySQL team decided to run with a more standard memcached interface, and MySQL actually embeds memcached to do this (with MySQL as backend storage).
Despite being read-only, the over-1M number highlights a lot of work that's been done on multi-core scalability. This test was on a 48-core machine; that was a distant dream a few years ago on MySQL 5.1.
1M queries doesn't mean much if most of them actually use memcached as the backend. You could run more than 1M queries like "select 1" and it would show the same thing. The magic is memcached.
MySQL 5.7 includes a NoSQL interface, using an integrated memcached daemon that can automatically store data and retrieve it from InnoDB tables, turning the MySQL server into a fast “key-value store” for single-row insert, update, or delete operations. You can still also access the same tables through SQL for convenience, complex queries, bulk operations, application compatibility, and other strengths of traditional database software.
With this NoSQL interface, you use the familiar memcached API to speed up database operations, letting InnoDB handle memory caching using its buffer pool mechanism. Data modified through memcached operations such as ADD, SET, INCR are stored to disk, using the familiar InnoDB mechanisms such as change buffering, the doublewrite buffer, and crash recovery.
==== 8< ====
This is what is being benchmarked here, not pure memcached.
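Concretely, the embedded memcached is pointed at an InnoDB table through the `innodb_memcache.containers` mapping table. The row below mirrors the manual's demo_test example; the schema, table, and column names are illustrative and would be replaced with your own.

```sql
-- Tell the memcached plugin which InnoDB table backs which namespace.
-- Values follow the MySQL 5.7 manual's demo_test example; adapt
-- db_schema/db_table and the column names to your own table.
INSERT INTO innodb_memcache.containers
  (name, db_schema, db_table, key_columns, value_columns,
   flags, cas_column, expire_time_column, unique_idx_name_on_key)
VALUES
  ('aaa', 'test', 'demo_test', 'c1', 'c2', 'c3', 'c4', 'c5', 'PRIMARY');
```

After that, a memcached `get` on a key is a primary-key lookup on `demo_test`, and the same row remains visible to ordinary SQL.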
As far as I can tell, this is just a memcached interface to ordinary InnoDB tables, so yes. 1M queries per second also sounds plausible, since stock PostgreSQL can reach almost 400k queries per second with normal SQL queries.
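For context, the stock PostgreSQL tool behind numbers like that is pgbench; a select-only run (single-row primary-key lookups) looks roughly like the following, with the scale factor and client counts as illustrative placeholders.

```shell
# Initialize a pgbench database at scale factor 100, then run the
# select-only workload (-S): one-row primary-key SELECTs only.
pgbench -i -s 100 bench
pgbench -S -c 32 -j 8 -T 60 bench   # 32 clients, 8 worker threads, 60 s
```

The `-S` mode skips pgbench's default write transactions, which makes it the closest SQL analogue to the read-only memcached benchmark being discussed.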
The storage engines of relational databases really are this fast (for read-only queries on datasets that fit in memory).
Yes, but this doesn't change the fact that the queries use memcached as the backend. If all 1M queries hit the InnoDB tables, then I would be impressed. Also note that it's read-only performance; I think MySQL acts as a bridge here for most of the queries. (Correct me if I'm wrong.)
Not necessarily: InnoDB has optimizations for in-memory tables, and there isn't any write overhead that would bring the greater complexity of the transaction/disk layer into play.
That's only true when there are write operations. If the dataset is purely read-only, it's always consistent, and I believe that is the case in this article.
I think I can safely claim that (with the right team) I could scale any database to match that rate. The real problem with scaling is that until you hit a limit, you generally don't know where the limit is, and at the limit you need the system to do something reasonable (in this case, I'd say drop read and/or write requests that couldn't be served consistently). I also wonder whether you couldn't perform the same feat with memcached alone (wink wink).
Systems that behave erratically when you hit a limit should be avoided, and I often wonder whether people publish articles like this because they themselves are amazed it was possible. With a well-behaved system there's no need for fanfare as you grow; when you do hit a limit, you engineer a reasonable solution.
As an aside, it's pretty amazing (to someone whose first computer had 2K of RAM) how big you can scale an individual server these days. The competition I'd like to see is another team accomplishing the same feat with half the memory, or half the number of cores, or a result set twice as large, and then publishing how they did it. A little friendly competition might result in some amazing discoveries.
Hmmm… I don't see any meaningful information. Should I be impressed? Am I missing something?